[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2664

    [CARBONDATA-2895] Fix Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.

    **probelm:** Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.
   
    **scenario:**
    a) Configure batchsort but give batch size more than UnsafeMemoryManager.INSTANCE.getUsableMemory().
    b) Load data that is greater than batch size. Observe that unsafeMemoryManager save to disk happened as it cannot process one batch.  
    c) so load happens in 2 batch.
    d) When query the results. There result data rows is more than expected data rows.
   
   
    **root cause:**
    For each batch, createSortDataRows() will be called.
    Files saved to disk during sorting of previous batch was considered for this batch.
   
   
    **solution:**
    Files saved to disk during sorting of previous batch ,should not be considered for this batch.
    Hence use batchID as rangeID field of sorttempfiles.
    So getFilesToMergeSort() will select files of only this batch.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted? NA
     
     - [ ] Document update required? NA
   
     - [ ] Testing done. done
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.  NA
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata master_new

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2664
   
----
commit bad70a37508a2bad650aae2b150eecef59449a30
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-08-27T15:25:03Z

    [CARBONDATA-2895] Fix Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.
   
    probelm: Query result mismatch with Batch-sort in save to disk (sort
    temp files) scenario.
   
    scenario:
    a) Configure batchsort but give batch size more than
    UnsafeMemoryManager.INSTANCE.getUsableMemory().
    b) Load data that is greater than batch size. Observe that
    unsafeMemoryManager save to disk happened as it cannot process one
    batch.
    c) so load happens in 2 batch.
    d) When query the results. There result data rows is more than expected
    data rows.
   
    root cause:
   
    For each batch, createSortDataRows() will be called.
    Files saved to disk during sorting of previous batch was considered for
    this batch.
   
    solution:
    Files saved to disk during sorting of previous batch ,should not be
    considered for this batch.
    Hence use batchID as rangeID field of sorttempfiles.
    So getFilesToMergeSort() will select files of only this batch.

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    @jackylk , @ravipesala : Please review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6428/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8095/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/33/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2664#discussion_r213575941
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java ---
    @@ -62,12 +62,15 @@
     
       private AtomicLong rowCounter;
     
    +  private AtomicInteger batchId;
    --- End diff --
   
    Please add comment for this variable


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2664#discussion_r213576004
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java ---
    @@ -62,12 +62,15 @@
     
       private AtomicLong rowCounter;
     
    +  private AtomicInteger batchId;
    +
    --- End diff --
   
    please add test case to verify the query result


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2664#discussion_r213593256
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java ---
    @@ -62,12 +62,15 @@
     
       private AtomicLong rowCounter;
     
    +  private AtomicInteger batchId;
    +
    --- End diff --
   
    This issue scenario happens only in spill to disk scenario.
    spill to disk happens only when batch size is more than CCC.IN_MEMORY_STORAGE_FOR_SORTED_DATA_IN_MB.
   
    If I set this property lesser number. It will be overwritten by CarbonProperties.validateSortMemorySizeInMB() .
   
    so only way to reproduce issue is with huge data. .But adding a testcase for huge data will slowdown PR builder.
   
    Hence not added.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2664: [CARBONDATA-2895] Fix Query result mismatch w...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2664#discussion_r213594537
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java ---
    @@ -62,12 +62,15 @@
     
       private AtomicLong rowCounter;
     
    +  private AtomicInteger batchId;
    --- End diff --
   
    done.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6453/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/69/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8140/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/84/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8155/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Hi @ajantha-bhat , after this change, have you tried data loading with sort_column_bounds and batch_sort? Does it work fine?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    @xuchuanyin : Sort_column_bounds is used only for localsort. This configuration is not used for Batch_sort. So I can fill batch ID in range field.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    @ajantha-bhat I could't understand what is mismatch, query result for table loading using batch_sort and query result for table loading using local_sort is not matched? If so, please modify the PR title accordingly


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2664: [CARBONDATA-2895] Fix Query result mismatch with Bat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2664
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8176/



---
12