Ajantha Bhat created CARBONDATA-2895:
----------------------------------------
Summary: [Batch-sort]Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.
Key: CARBONDATA-2895
URL: https://issues.apache.org/jira/browse/CARBONDATA-2895
Project: CarbonData
Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat
problem: Query result mismatch with batch sort in the save-to-disk (sort temp files) scenario.
scenario:
a) Configure batch sort, but set a batch size larger than UnsafeMemoryManager.INSTANCE.getUsableMemory().
b) Load more data than the batch size. Observe that sort temp files are saved to disk, because UnsafeMemoryManager cannot hold one whole batch in memory.
c) The load therefore happens in 2 batches.
d) Query the results: the result contains more rows than expected.
root cause:
createSortDataRows() is called for each batch.
Sort temp files saved to disk while sorting the previous batch were also considered for the current batch, so rows of the previous batch were merged and loaded again.
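The root cause can be illustrated with a small, self-contained model. This is a hypothetical sketch, not CarbonData code: the class, the SortTempFile type, the shared temp list and the SHARED_RANGE_ID constant are all assumptions made for illustration, and the idea that every batch previously tagged its files with the same rangeId is an assumption inferred from the proposed solution. Each batch spills in chunks that fit in memory, then merges every file whose rangeId matches, which also picks up the previous batch's files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the bug; names are illustrative and do not reproduce CarbonData's classes.
class BatchSortBugSketch {

    // Assumption: before the fix, every batch tagged its sort temp files with the same rangeId.
    static final int SHARED_RANGE_ID = 0;

    // A spilled sort temp file: the rangeId it was tagged with and the number of rows it holds.
    static final class SortTempFile {
        final int rangeId;
        final int rows;

        SortTempFile(int rangeId, int rows) {
            this.rangeId = rangeId;
            this.rows = rows;
        }
    }

    // Temp location shared by all batches of one load (assumption).
    static final List<SortTempFile> tempDir = new ArrayList<>();

    // Selects files for the final merge by rangeId, which cannot tell batches apart here.
    static List<SortTempFile> getFilesToMergeSort(int rangeId) {
        List<SortTempFile> selected = new ArrayList<>();
        for (SortTempFile f : tempDir) {
            if (f.rangeId == rangeId) {
                selected.add(f);
            }
        }
        return selected;
    }

    // Models sorting one batch: spill in chunks of memoryRows, then merge the selected files.
    // batchId is ignored when tagging files, which is the bug being modelled.
    static int sortBatch(int batchId, int batchRows, int memoryRows) {
        for (int written = 0; written < batchRows; written += memoryRows) {
            tempDir.add(new SortTempFile(SHARED_RANGE_ID, Math.min(memoryRows, batchRows - written)));
        }
        int merged = 0;
        for (SortTempFile f : getFilesToMergeSort(SHARED_RANGE_ID)) {
            merged += f.rows;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two batches of 1000 rows each; memory holds only 400 rows, so each batch spills 3 files.
        int loaded = sortBatch(0, 1000, 400) + sortBatch(1, 1000, 400);
        System.out.println("input rows = 2000, loaded rows = " + loaded); // loaded rows = 3000
    }
}
```

In this model the second batch re-merges the three files spilled by the first batch, so 3000 rows are produced from 2000 input rows, matching the "more rows than expected" symptom in step d).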
solution:
Sort temp files saved to disk while sorting a previous batch should not be considered for the current batch.
Hence, use the batchID as the rangeID field of the sort temp files,
so that getFilesToMergeSort() selects only the files of the current batch.
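The effect of the fix can be sketched with a hypothetical model. The class, the SortTempFile type and the shared temp list are illustrative assumptions, not CarbonData's actual classes or file layout; the only idea taken from the report is that each sort temp file carries the batchID in its rangeID field and that getFilesToMergeSort() filters on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the fix; names are illustrative and do not reproduce CarbonData's classes.
class BatchSortFixSketch {

    // A spilled sort temp file; under the fix, rangeId carries the id of the batch that wrote it.
    static final class SortTempFile {
        final int rangeId;
        final int rows;

        SortTempFile(int rangeId, int rows) {
            this.rangeId = rangeId;
            this.rows = rows;
        }
    }

    // Temp location shared by all batches of one load (assumption).
    static final List<SortTempFile> tempDir = new ArrayList<>();

    // Selects only the files whose rangeId equals the given batchId.
    static List<SortTempFile> getFilesToMergeSort(int batchId) {
        List<SortTempFile> selected = new ArrayList<>();
        for (SortTempFile f : tempDir) {
            if (f.rangeId == batchId) {
                selected.add(f);
            }
        }
        return selected;
    }

    // Models sorting one batch: spill in chunks of memoryRows tagged with batchId, then merge.
    static int sortBatch(int batchId, int batchRows, int memoryRows) {
        for (int written = 0; written < batchRows; written += memoryRows) {
            tempDir.add(new SortTempFile(batchId, Math.min(memoryRows, batchRows - written)));
        }
        int merged = 0;
        for (SortTempFile f : getFilesToMergeSort(batchId)) {
            merged += f.rows;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two batches of 1000 rows each; memory holds only 400 rows, so each batch spills 3 files.
        int loaded = sortBatch(0, 1000, 400) + sortBatch(1, 1000, 400);
        System.out.println("input rows = 2000, loaded rows = " + loaded); // loaded rows = 2000
    }
}
```

Because files from earlier batches remain in the shared location but carry a different rangeId, the filter skips them, and each batch merges exactly the rows it spilled.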
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)