[jira] [Created] (CARBONDATA-2895) [Batch-sort]Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.


Akash R Nilugal (Jira)
Ajantha Bhat created CARBONDATA-2895:
----------------------------------------

             Summary: [Batch-sort]Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.
                 Key: CARBONDATA-2895
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2895
             Project: CarbonData
          Issue Type: Bug
            Reporter: Ajantha Bhat
            Assignee: Ajantha Bhat


problem: Query results are incorrect with batch sort when sort data is spilled to disk (sort temp files).

scenario:
a) Configure batch sort with a batch size larger than UnsafeMemoryManager.INSTANCE.getUsableMemory().
b) Load data that is larger than the batch size. Observe that the unsafe memory manager saves to disk (sort temp files) because it cannot hold one batch in memory.
c) The load therefore happens in 2 batches.
d) Query the table. The result contains more rows than expected.


root cause:

createSortDataRows() is called for each batch.
Files saved to disk while sorting the previous batch were also considered for the current batch, so rows from the previous batch were merged again and showed up as extra rows.


solution:
Files saved to disk while sorting a previous batch should not be considered for the current batch.
Hence use the batch ID as the rangeID field of the sort temp files,
so that getFilesToMergeSort() selects only the files of the current batch.
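
The idea behind the fix can be sketched as follows. This is only an illustration, not the CarbonData code: the class name, the helper methods, and the file naming scheme (<table>_<rangeId>_<sequence>.sorttemp) are all hypothetical, chosen to show how tagging each temp file with the batch ID lets the merge step ignore files left over from earlier batches.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and file layout are assumptions.
class BatchSortTempFileSketch {

    // Hypothetical naming scheme: <table>_<rangeId>_<sequence>.sorttemp,
    // where rangeId is set to the batch ID.
    static String tempFileName(String table, int rangeId, int sequence) {
        return table + "_" + rangeId + "_" + sequence + ".sorttemp";
    }

    // Returns only the temp files whose rangeId matches the given batch.
    // The trailing "_" in the prefix keeps batch 1 from matching batch 10.
    static List<String> filesToMerge(List<String> allFiles, String table, int rangeId) {
        String prefix = table + "_" + rangeId + "_";
        List<String> selected = new ArrayList<>();
        for (String name : allFiles) {
            if (name.startsWith(prefix)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> onDisk = new ArrayList<>();
        // Batch 0 spilled two files, batch 1 spilled one.
        onDisk.add(tempFileName("t1", 0, 0));
        onDisk.add(tempFileName("t1", 0, 1));
        onDisk.add(tempFileName("t1", 1, 0));

        // Merging batch 1 must not pick up the files of batch 0.
        System.out.println(filesToMerge(onDisk, "t1", 1));
    }
}
```

If every batch used the same rangeID, the prefix filter above would match the files of all batches, which corresponds to the duplicate rows described in the scenario.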



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)