[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2784

    [CARBONDATA-2987] Data mismatch after compaction with measure sort columns

    Problem: Data mismatch after compaction with measure sort columns.
   
    Root cause: In the compaction flow (DictionaryBasedResultCollector), ColumnPageWrapper does not handle the inverted index mapping. As a result the row ID is wrong, and a row of a no-dictionary dimension column gets its data from other rows.
    Hence the data mismatch.
     
    Solution: Handle the inverted index mapping for the DictionaryBasedResultCollector flow in ColumnPageWrapper.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed? No
     
     - [ ] Any backward compatibility impacted? No
     
     - [ ] Document update required? NA
   
     - [ ] Testing done
         Done. Updated UT.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2784.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2784
   
----
commit 63888753f72d7c6b4d993b4e31f3c5a8b7d449f8
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-09-28T10:57:55Z

    [CARBONDATA-2987] Data mismatch after compaction with measure sort columns

----


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Does this issue also exist in older versions such as 1.3 and 1.4?


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    @jackylk : No, this issue was introduced only in this version (1.5), after the adaptive encoding changes for primitive types.
   
    Previously, ColumnPageWrapper was used only for complex columns; in this version it is also used for no-dictionary columns. In the vector reader case (DictionaryBasedVectorResultCollector), the inverted index was handled for no-dictionary columns, but for DictionaryBasedResultCollector it was not. Hence the issue.
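
To illustrate the kind of mismatch described above, here is a minimal Java sketch of an inverted index over a sorted column page. It is a simplified model, not CarbonData code: the class `InvertedIndexSketch` and all of its methods are hypothetical names, and the sketch assumes that values are stored in sorted order, that `invertedIndex[sortedPos]` holds the original row ID, and that a reverse array maps a row ID back to its sorted position.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified illustration only: all names here are hypothetical, not CarbonData APIs.
class InvertedIndexSketch {

  // invertedIndex[sortedPos] = original row ID of the value stored at sortedPos.
  static int[] buildInvertedIndex(int[] values) {
    Integer[] order = new Integer[values.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort row IDs by the value each row holds.
    Arrays.sort(order, Comparator.comparingInt(i -> values[i]));
    int[] invertedIndex = new int[values.length];
    for (int pos = 0; pos < order.length; pos++) {
      invertedIndex[pos] = order[pos];
    }
    return invertedIndex;
  }

  // reverseIndex[rowId] = position of that row's value inside the sorted page.
  static int[] buildReverseIndex(int[] invertedIndex) {
    int[] reverseIndex = new int[invertedIndex.length];
    for (int pos = 0; pos < invertedIndex.length; pos++) {
      reverseIndex[invertedIndex[pos]] = pos;
    }
    return reverseIndex;
  }

  // The page as it would be stored: values laid out in sorted order.
  static int[] buildSortedPage(int[] values, int[] invertedIndex) {
    int[] page = new int[values.length];
    for (int pos = 0; pos < page.length; pos++) {
      page[pos] = values[invertedIndex[pos]];
    }
    return page;
  }

  // Correct read: translate the row ID to a sorted position first.
  static int readRow(int[] sortedPage, int[] reverseIndex, int rowId) {
    return sortedPage[reverseIndex[rowId]];
  }

  public static void main(String[] args) {
    int[] values = {30, 10, 20};                      // values in original row order
    int[] invertedIndex = buildInvertedIndex(values); // {1, 2, 0}
    int[] reverseIndex = buildReverseIndex(invertedIndex); // {2, 0, 1}
    int[] page = buildSortedPage(values, invertedIndex);   // {10, 20, 30}

    // With the mapping, row 0 gets its own value back.
    System.out.println(readRow(page, reverseIndex, 0)); // prints 30
    // Without the mapping, row 0 silently receives row 1's value.
    System.out.println(page[0]);                        // prints 10
  }
}
```

In this model, skipping the `reverseIndex` lookup does not fail; it returns a valid value that belongs to a different row, which is consistent with the bug surfacing as a data mismatch rather than an exception.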


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/630/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/822/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8891/



---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2784#discussion_r221258211
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
    @@ -302,8 +311,19 @@ public boolean isExplicitSorted() {
     
       @Override
       public int compareTo(int rowId, byte[] compareValue) {
    -    byte[] chunkData = this.getChunkData((int) rowId);
    -    return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, compareValue);
    +    // rowId is the inverted index, but the null bitset is based on actual data
    +    int nullBitSetRowId = rowId;
    +    if (isExplicitSorted()) {
    +      nullBitSetRowId = getInvertedReverseIndex(rowId);
    +    }
    +    byte[] nullBitSet = getNullBitSet(nullBitSetRowId, columnPage.getColumnSpec().getColumnType());
    --- End diff --
   
    Why do we need to handle the inverted index for the null bitset? I think it is not required.


---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2784#discussion_r221261712
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
    @@ -302,8 +311,19 @@ public boolean isExplicitSorted() {
     
       @Override
       public int compareTo(int rowId, byte[] compareValue) {
    -    byte[] chunkData = this.getChunkData((int) rowId);
    -    return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, compareValue);
    +    // rowId is the inverted index, but the null bitset is based on actual data
    +    int nullBitSetRowId = rowId;
    +    if (isExplicitSorted()) {
    +      nullBitSetRowId = getInvertedReverseIndex(rowId);
    +    }
    +    byte[] nullBitSet = getNullBitSet(nullBitSetRowId, columnPage.getColumnSpec().getColumnType());
    --- End diff --
   
    Because this rowId is not the original row ID, the inverted index handling is required.
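
The point in this exchange can be sketched in Java as follows. This is an illustrative model that only mirrors the shape of the diff above: the class `NullBitSetLookupSketch`, its fields, and its constructor are hypothetical, and the sketch assumes (without verifying against CarbonData) that the null bitset is indexed in a different row ordering from the ID the caller passes in, with a reverse index translating between the two.

```java
import java.util.BitSet;

// Illustrative model only: names and layout are assumptions, not CarbonData's implementation.
class NullBitSetLookupSketch {

  // Translates the caller's row ID into the ordering used by nullBits.
  // null means the page is not explicitly sorted, so no translation is needed.
  private final int[] reverseIndex;

  // One bit per row, set when the row's value is null.
  private final BitSet nullBits;

  NullBitSetLookupSketch(int[] reverseIndex, BitSet nullBits) {
    this.reverseIndex = reverseIndex;
    this.nullBits = nullBits;
  }

  boolean isExplicitSorted() {
    return reverseIndex != null;
  }

  // Mirrors the diff: translate the row ID only when the page is explicitly sorted.
  boolean isNull(int rowId) {
    int nullBitSetRowId = rowId;
    if (isExplicitSorted()) {
      nullBitSetRowId = reverseIndex[rowId];
    }
    return nullBits.get(nullBitSetRowId);
  }

  public static void main(String[] args) {
    BitSet nullBits = new BitSet(3);
    nullBits.set(2); // the entry at index 2 is null

    NullBitSetLookupSketch sorted =
        new NullBitSetLookupSketch(new int[] {2, 0, 1}, nullBits);
    NullBitSetLookupSketch unsorted = new NullBitSetLookupSketch(null, nullBits);

    System.out.println(sorted.isNull(0));   // true: ID 0 translates to index 2
    System.out.println(unsorted.isNull(0)); // false: ID 0 is used as-is
  }
}
```

Under these assumptions, using the untranslated ID on an explicitly sorted page would consult the null flag of a different row, which is the same class of error as the data mismatch this pull request addresses.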


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8899/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/640/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/832/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    @ravipesala : The PR is ready, please check.


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    retest this please


---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/673/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/869/



---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2784
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8937/



---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2784


---