kevinjmh opened a new pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603

### Why is this PR needed?
In local sort, multiple threads are used for each partition, but they all add rows to the same object under a lock. The sort and write operations run only after that. For better performance, we want the sort and write (sort temp file) operations to run in parallel.

### What changes were proposed in this PR?
Remove the object lock when adding rows to UnsafeSortDataRows. Keep the object lock in UnsafeIntermediateMerger to collect the results of all threads.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]

With regards, Apache Git Services
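The idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ThreadLocalSorter` and `SharedMerger` are hypothetical stand-ins for UnsafeSortDataRows and UnsafeIntermediateMerger, and plain `int` rows replace real row objects. Each thread fills and sorts its own buffer without locking, and only the hand-off of finished runs is synchronized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and types are illustrative, not CarbonData classes.
class LocalSortSketch {

  /** Shared by all threads; only the collection of results is synchronized. */
  static class SharedMerger {
    private final List<int[]> sortedRuns = new ArrayList<>();

    synchronized void addSortedRun(int[] run) {
      sortedRuns.add(run);
    }

    synchronized int runCount() {
      return sortedRuns.size();
    }
  }

  /** Owned by exactly one thread, so addRow needs no lock. */
  static class ThreadLocalSorter {
    private final SharedMerger merger;
    private final int[] buffer;
    private int count;

    ThreadLocalSorter(SharedMerger merger, int bufferSize) {
      this.merger = merger;
      this.buffer = new int[bufferSize];
    }

    void addRow(int row) {
      if (count == buffer.length) {
        flush();
      }
      buffer[count++] = row;
    }

    /** Sorts the buffered rows and hands the sorted run to the shared merger. */
    void flush() {
      if (count == 0) {
        return;
      }
      int[] run = Arrays.copyOf(buffer, count);
      Arrays.sort(run);
      merger.addSortedRun(run);
      count = 0;
    }
  }

  /** Sorts every partition on its own thread; returns the number of sorted runs produced. */
  static int sortInParallel(int[][] partitions, int bufferSize) {
    SharedMerger merger = new SharedMerger();
    Thread[] workers = new Thread[partitions.length];
    for (int i = 0; i < partitions.length; i++) {
      final int[] input = partitions[i];
      workers[i] = new Thread(() -> {
        ThreadLocalSorter sorter = new ThreadLocalSorter(merger, bufferSize);
        for (int row : input) {
          sorter.addRow(row);
        }
        sorter.flush();
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return merger.runCount();
  }
}
```

In this sketch the only contended section is `addSortedRun`, which is short compared with sorting a full buffer.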
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-582212686
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/152/
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-582224187
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1855/
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-582278617
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/154/
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-582296656
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1857/
kevinjmh closed pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603
kevinjmh opened a new pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-584120774
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/213/
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-584147862
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1915/
ajantha-bhat commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r377424844

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortIntermediateFileMerger.java
##########
@@ -101,6 +87,10 @@ private void startIntermediateMerging(File[] intermediateFiles) {
         + '_' + parameters.getRangeId() + '_' + System.nanoTime()
         + CarbonCommonConstants.MERGERD_EXTENSION);
     IntermediateFileMerger merger = new IntermediateFileMerger(parameters, intermediateFiles, file);
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Sumitting request for intermediate merging no of files: "

Review comment: Typo: "Sumitting" should be "Submitting". Also change "no of files" to "number of files".
ajantha-bhat commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-584457380

@kevinjmh: Good finding.

Now each thread can work on its own sortDataRows, which will improve the intermediate sorting time. But the final merge sort may slow down, as the data is less sorted than with the previous code. However, as you say, the overall sorting time may improve.

Can we measure this for both unsafe and safe sort and add a table? If possible, measure the total, intermediate, and final sort time for both.
kevinjmh commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-584460957

> @kevinjmh: Good finding.
>
> Now each thread can work on its own sortDataRows, which will improve the intermediate sorting time. But the final merge sort may slow down, as the data is less sorted than with the previous code. However, as you say, the overall sorting time may improve.
>
> Can we measure this for both unsafe and safe sort and add a table? If possible, measure the total, intermediate, and final sort time for both.

I want to mention that in this PR the threads all use the same IntermediateMerger, so intermediate merging is still controlled by the setting `carbon.sort.intermediate.files.limit` as before, which defaults to 20. So the final sort is not affected.
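To illustrate the point about the shared merger, here is a minimal sketch, assuming a simplified merger that only tracks file names. `IntermediateMergeTrigger` and its methods are hypothetical; the constructor argument plays the role of `carbon.sort.intermediate.files.limit`. However many threads add sort temp files, a merge batch is handed off only when the shared list reaches the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of threshold-triggered intermediate merging.
class IntermediateMergeTrigger {
  private final int filesLimit; // stands in for carbon.sort.intermediate.files.limit
  private final Object lock = new Object();
  private List<String> pending = new ArrayList<>();
  private final List<List<String>> submittedBatches = new ArrayList<>();

  IntermediateMergeTrigger(int filesLimit) {
    this.filesLimit = filesLimit;
  }

  /** Called by any sorter thread after it has written a sort temp file. */
  void addFileToMerge(String sortTempFile) {
    synchronized (lock) {
      pending.add(sortTempFile);
      if (pending.size() >= filesLimit) {
        // Hand the full batch over for merging and start a fresh list.
        submittedBatches.add(pending);
        pending = new ArrayList<>();
      }
    }
  }

  int submittedBatchCount() {
    synchronized (lock) {
      return submittedBatches.size();
    }
  }

  int pendingCount() {
    synchronized (lock) {
      return pending.size();
    }
  }
}
```

With a limit of 20, adding 45 files in this sketch yields two merge batches and leaves five files pending, regardless of which thread produced them.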
ajantha-bhat commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r377443375

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterImpl.java
##########
@@ -80,29 +80,25 @@ public void initialize(SortParameters sortParameters) {
   public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
       throws CarbonDataLoadingException {
     int inMemoryChunkSizeInMB = CarbonProperties.getInstance().getSortMemoryChunkSizeInMB();
-    UnsafeSortDataRows sortDataRow =
-        new UnsafeSortDataRows(sortParameters, unsafeIntermediateFileMerger, inMemoryChunkSizeInMB);
+    UnsafeSortDataRows[] sortDataRows = new UnsafeSortDataRows[iterators.length];

Review comment: Have you observed the memory footprint with the new changes? Now that each thread uses its own unsafeSortDataRows, each thread will allocate the default 64 MB or the configured value. So local sort needs more memory after these changes!
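The memory concern can be put in numbers with a small back-of-the-envelope sketch. It takes the reviewer's figure of 64 MB per sorter at face value; the thread count of 8 is purely an assumed example, and `SortMemoryEstimate` is a hypothetical helper, not project code.

```java
// Hypothetical estimate of in-memory sort buffer usage.
class SortMemoryEstimate {

  /** Total buffer memory in MB if every sorter allocates one chunk. */
  static long bufferMemoryMB(int sorterCount, int chunkSizeMB) {
    return (long) sorterCount * chunkSizeMB;
  }

  public static void main(String[] args) {
    int chunkSizeMB = 64; // default mentioned in the review comment
    int threads = 8;      // assumed example value
    System.out.println("one shared sorter: " + bufferMemoryMB(1, chunkSizeMB) + " MB");
    System.out.println("one sorter per thread: " + bufferMemoryMB(threads, chunkSizeMB) + " MB");
  }
}
```

Under these assumptions the buffer memory grows linearly with the number of threads, which is the trade-off being questioned.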
kevinjmh commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r377460855

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterImpl.java
##########
@@ -80,29 +80,25 @@ public void initialize(SortParameters sortParameters) {
   public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
       throws CarbonDataLoadingException {
     int inMemoryChunkSizeInMB = CarbonProperties.getInstance().getSortMemoryChunkSizeInMB();
-    UnsafeSortDataRows sortDataRow =
-        new UnsafeSortDataRows(sortParameters, unsafeIntermediateFileMerger, inMemoryChunkSizeInMB);
+    UnsafeSortDataRows[] sortDataRows = new UnsafeSortDataRows[iterators.length];

Review comment: Isn't it worth it? If we are afraid of too many threads, we can limit the thread pool size and allocate the memory only when a thread can actually run.
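One way to read the suggestion in this reply is sketched below, assuming a fixed-size `ExecutorService` and a buffer that is allocated inside the task rather than up front. `BoundedSortPool` and its parameters are hypothetical. Because a task allocates its buffer only once a pool thread picks it up, the number of live buffers never exceeds the pool size, however many partitions are queued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: bounded pool with lazily allocated sort buffers.
class BoundedSortPool {

  /** Runs one sort task per partition and returns the peak number of buffers alive at once. */
  static int peakLiveBuffers(int partitions, int poolSize, int bufferSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    AtomicInteger live = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int p = 0; p < partitions; p++) {
        futures.add(pool.submit(() -> {
          // Allocate only when the task actually starts running.
          int[] buffer = new int[bufferSize];
          int now = live.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Arrays.sort(buffer); // stand-in for filling and sorting real rows
          } finally {
            live.decrementAndGet();
          }
        }));
      }
      for (Future<?> future : futures) {
        future.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }
}
```

The design choice here is to trade some parallelism for a predictable upper bound on buffer memory (pool size times buffer size).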
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378832145

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortIntermediateFileMerger.java
##########
@@ -72,20 +67,11 @@ public void addFileToMerge(File sortTempFile) {
     // intermediate merging of sort temp files will be triggered
     synchronized (lockObject) {
       procFiles.add(sortTempFile);
-    }
-  }
-
-  public void startMergingIfPossible() {
-    File[] fileList;
-    if (procFiles.size() >= parameters.getNumberOfIntermediateFileToBeMerged()) {
-      synchronized (lockObject) {
-        fileList = procFiles.toArray(new File[procFiles.size()]);
-        this.procFiles = new ArrayList<File>();
+      if (procFiles.size() >= parameters.getNumberOfIntermediateFileToBeMerged()) {
+        File[] fileList = procFiles.toArray(new File[procFiles.size()]);
+        this.procFiles = new ArrayList<>();
+        startIntermediateMerging(fileList);

Review comment: You can directly pass (File[]) procFiles.toArray()
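A note on the cast suggested in this review comment: for a `java.util.ArrayList`, the no-argument `toArray()` returns an `Object[]`, so casting the result to `File[]` compiles but fails at runtime with a `ClassCastException`. The small check below demonstrates this; `ToArrayCheck` is a hypothetical helper, and the typed overload `toArray(new File[0])` is shown as the safe alternative to the explicit sizing used in the diff.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing why (File[]) list.toArray() is unsafe for ArrayList.
class ToArrayCheck {

  /** Returns true if casting the untyped toArray() result to File[] throws. */
  static boolean castOfUntypedToArrayFails(List<File> files) {
    try {
      File[] unused = (File[]) files.toArray();
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  /** The typed overload returns a real File[]. */
  static File[] typedCopy(List<File> files) {
    return files.toArray(new File[0]);
  }

  public static void main(String[] args) {
    List<File> files = new ArrayList<>();
    files.add(new File("a.sorttemp"));
    System.out.println("cast fails: " + castOfUntypedToArrayFails(files));
    System.out.println("typed copy length: " + typedCopy(files).length);
  }
}
```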
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378824300

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortDataRows.java
##########
@@ -133,62 +94,74 @@ public void addRow(Object[] row) throws CarbonSortKeyAndGroupByException {
       if (LOGGER.isDebugEnabled()) {
         LOGGER.debug("************ Writing to temp file ********** ");
       }
-      intermediateFileMerger.startMergingIfPossible();
       Object[][] recordHolderListLocal = recordHolderList;
-      try {
-        semaphore.acquire();
-        dataSorterAndWriterExecutorService.execute(new DataSorterAndWriter(recordHolderListLocal));
-      } catch (InterruptedException e) {
-        LOGGER.error("exception occurred while trying to acquire a semaphore lock: ", e);
-        throw new CarbonSortKeyAndGroupByException(e);
-      }
+      handlePreviousPage(recordHolderListLocal);
       // create the new holder Array
       this.recordHolderList = new Object[this.sortBufferSize][];
       this.entryCount = 0;
     }
     recordHolderList[entryCount++] = row;
   }
 
-  /**

Review comment: I suggest not removing comments. If they do not give any meaningful information, it is better to improve them. Please update the comments instead of removing them, in all places.
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378832453

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortDataRows.java
##########
@@ -133,62 +94,74 @@ public void addRow(Object[] row) throws CarbonSortKeyAndGroupByException {
       if (LOGGER.isDebugEnabled()) {
         LOGGER.debug("************ Writing to temp file ********** ");
       }
-      intermediateFileMerger.startMergingIfPossible();
       Object[][] recordHolderListLocal = recordHolderList;
-      try {
-        semaphore.acquire();
-        dataSorterAndWriterExecutorService.execute(new DataSorterAndWriter(recordHolderListLocal));
-      } catch (InterruptedException e) {
-        LOGGER.error("exception occurred while trying to acquire a semaphore lock: ", e);
-        throw new CarbonSortKeyAndGroupByException(e);
-      }
+      handlePreviousPage(recordHolderListLocal);
       // create the new holder Array
       this.recordHolderList = new Object[this.sortBufferSize][];
       this.entryCount = 0;
     }
     recordHolderList[entryCount++] = row;
   }
 
-  /**
-   * This method will be used to add new row
-   *
-   * @param rowBatch new rowBatch
-   * @throws CarbonSortKeyAndGroupByException problem while writing
-   */
   public void addRowBatch(Object[][] rowBatch, int size) throws CarbonSortKeyAndGroupByException {
     // if record holder list size is equal to sort buffer size then it will
     // sort the list and then write current list data to file
-    synchronized (addRowsLock) {
-      int sizeLeft = 0;
-      if (entryCount + size >= sortBufferSize) {
-        if (LOGGER.isDebugEnabled()) {
-          LOGGER.debug("************ Writing to temp file ********** ");
-        }
-        intermediateFileMerger.startMergingIfPossible();
-        Object[][] recordHolderListLocal = recordHolderList;
-        sizeLeft = sortBufferSize - entryCount;
-        if (sizeLeft > 0) {
-          System.arraycopy(rowBatch, 0, recordHolderListLocal, entryCount, sizeLeft);
-        }
-        try {
-          semaphore.acquire();
-          dataSorterAndWriterExecutorService
-              .execute(new DataSorterAndWriter(recordHolderListLocal));
-        } catch (Exception e) {
-          LOGGER.error(
-              "exception occurred while trying to acquire a semaphore lock: " + e.getMessage(), e);
-          throw new CarbonSortKeyAndGroupByException(e);
-        }
-        // create the new holder Array
-        this.recordHolderList = new Object[this.sortBufferSize][];
-        this.entryCount = 0;
-        size = size - sizeLeft;
-        if (size == 0) {
-          return;
-        }
+    int sizeLeft = 0;
+    if (entryCount + size >= sortBufferSize) {
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("************ Writing to temp file ********** ");
       }
-      System.arraycopy(rowBatch, sizeLeft, recordHolderList, entryCount, size);
-      entryCount += size;
+      Object[][] recordHolderListLocal = recordHolderList;
+      sizeLeft = sortBufferSize - entryCount;
+      if (sizeLeft > 0) {
+        System.arraycopy(rowBatch, 0, recordHolderListLocal, entryCount, sizeLeft);
+      }
+      handlePreviousPage(recordHolderListLocal);
+      // create the new holder Array
+      this.recordHolderList = new Object[this.sortBufferSize][];
+      this.entryCount = 0;
+      size = size - sizeLeft;
+      if (size == 0) {
+        return;
+      }
+    }
+    System.arraycopy(rowBatch, sizeLeft, recordHolderList, entryCount, size);
+    entryCount += size;
+  }
+
+  /**
+   * sort and write data
+   * @param recordHolderArray
+   */
+  private void handlePreviousPage(Object[][] recordHolderArray)
+      throws CarbonSortKeyAndGroupByException {
+    try {
+      long startTime = System.currentTimeMillis();
+      if (parameters.getNumberOfNoDictSortColumns() > 0) {
+        Arrays.sort(recordHolderArray,
+            new NewRowComparator(parameters.getNoDictionarySortColumn(),
+                parameters.getNoDictDataType()));
+      } else {
+        Arrays.sort(recordHolderArray,
+            new NewRowComparatorForNormalDims(parameters.getNumberOfSortColumns()));
+      }
+
+      // create a new file and choose folder randomly every time
+      String[] tmpFileLocation = parameters.getTempFileLocation();
+      String locationChosen = tmpFileLocation[new Random().nextInt(tmpFileLocation.length)];
+      File sortTempFile = new File(
+          locationChosen + File.separator + parameters.getTableName()
+              + '_' + parameters.getRangeId() + '_' + System.nanoTime()

Review comment: Do not hardcode the '_' separator; use the underscore constant from CarbonCommonConstants.
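The request to avoid the hardcoded '_' could look roughly like the sketch below. `UNDERSCORE` and `SORT_TEMP_FILE_EXT` are declared locally as hypothetical stand-ins, with assumed values, for whatever shared constants the project provides; `SortTempFileName` is not a CarbonData class.

```java
import java.io.File;

// Hypothetical sketch: the constants and their values are assumptions for illustration.
class SortTempFileName {
  static final String UNDERSCORE = "_";
  static final String SORT_TEMP_FILE_EXT = ".sorttemp";

  /** Builds <location>/<table>_<rangeId>_<uniqueSuffix><extension>. */
  static String build(String location, String tableName, int rangeId, long uniqueSuffix) {
    return location + File.separator + tableName
        + UNDERSCORE + rangeId
        + UNDERSCORE + uniqueSuffix
        + SORT_TEMP_FILE_EXT;
  }

  public static void main(String[] args) {
    // In the diff the unique suffix comes from System.nanoTime().
    System.out.println(build("tmp", "t1", 3, System.nanoTime()));
  }
}
```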
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378823627

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortDataRows.java
##########
@@ -133,62 +94,74 @@ public void addRow(Object[] row) throws CarbonSortKeyAndGroupByException {
       if (LOGGER.isDebugEnabled()) {
         LOGGER.debug("************ Writing to temp file ********** ");
       }
-      intermediateFileMerger.startMergingIfPossible();
       Object[][] recordHolderListLocal = recordHolderList;
-      try {
-        semaphore.acquire();
-        dataSorterAndWriterExecutorService.execute(new DataSorterAndWriter(recordHolderListLocal));
-      } catch (InterruptedException e) {
-        LOGGER.error("exception occurred while trying to acquire a semaphore lock: ", e);
-        throw new CarbonSortKeyAndGroupByException(e);
-      }
+      handlePreviousPage(recordHolderListLocal);
       // create the new holder Array
       this.recordHolderList = new Object[this.sortBufferSize][];
       this.entryCount = 0;
     }
     recordHolderList[entryCount++] = row;
   }
 
-  /**
-   * This method will be used to add new row
-   *
-   * @param rowBatch new rowBatch
-   * @throws CarbonSortKeyAndGroupByException problem while writing
-   */
   public void addRowBatch(Object[][] rowBatch, int size) throws CarbonSortKeyAndGroupByException {
     // if record holder list size is equal to sort buffer size then it will
     // sort the list and then write current list data to file
-    synchronized (addRowsLock) {
-      int sizeLeft = 0;
-      if (entryCount + size >= sortBufferSize) {
-        if (LOGGER.isDebugEnabled()) {
-          LOGGER.debug("************ Writing to temp file ********** ");
-        }
-        intermediateFileMerger.startMergingIfPossible();
-        Object[][] recordHolderListLocal = recordHolderList;
-        sizeLeft = sortBufferSize - entryCount;
-        if (sizeLeft > 0) {
-          System.arraycopy(rowBatch, 0, recordHolderListLocal, entryCount, sizeLeft);
-        }
-        try {
-          semaphore.acquire();
-          dataSorterAndWriterExecutorService
-              .execute(new DataSorterAndWriter(recordHolderListLocal));
-        } catch (Exception e) {
-          LOGGER.error(
-              "exception occurred while trying to acquire a semaphore lock: " + e.getMessage(), e);
-          throw new CarbonSortKeyAndGroupByException(e);
-        }
-        // create the new holder Array
-        this.recordHolderList = new Object[this.sortBufferSize][];
-        this.entryCount = 0;
-        size = size - sizeLeft;
-        if (size == 0) {
-          return;
-        }
+    int sizeLeft = 0;
+    if (entryCount + size >= sortBufferSize) {

Review comment: I think the name of the size parameter is confusing, and it would be better to rename it. It is an index that is passed, right? Maybe we can name it sortDataRowIndex.
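For context on the naming question: in the quoted diff, `size` is used as the length argument of `System.arraycopy` and is added to `entryCount`, which suggests it behaves as a count of rows in the batch. The sketch below shows that reading with a hypothetical `BatchBuffer` class and a parameter named `rowCount`. It is a variation rather than the PR's code: it loops so that batches larger than the buffer are also handled, and it only counts flushed pages instead of sorting and writing them.

```java
// Hypothetical sketch of filling a fixed-size page from row batches.
class BatchBuffer {
  private final int capacity;
  private Object[][] buffer;
  private int entryCount;
  private int flushedPages;

  BatchBuffer(int capacity) {
    this.capacity = capacity;
    this.buffer = new Object[capacity][];
  }

  /** Copies the first rowCount rows of rowBatch, flushing a page each time the buffer fills. */
  void addRowBatch(Object[][] rowBatch, int rowCount) {
    int offset = 0;
    while (rowCount > 0) {
      int toCopy = Math.min(capacity - entryCount, rowCount);
      System.arraycopy(rowBatch, offset, buffer, entryCount, toCopy);
      entryCount += toCopy;
      offset += toCopy;
      rowCount -= toCopy;
      if (entryCount == capacity) {
        // A real implementation would sort and write the page here.
        flushedPages++;
        buffer = new Object[capacity][];
        entryCount = 0;
      }
    }
  }

  int entryCount() {
    return entryCount;
  }

  int flushedPages() {
    return flushedPages;
  }
}
```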
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378871132

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/ParallelReadMergeSorterImpl.java
##########
@@ -154,11 +155,13 @@ public void close() {
   /**
    * Below method will be used to process data to next step
    */
-  private boolean processRowToNextStep(SortDataRows sortDataRows, SortParameters parameters)
+  private boolean processRowToNextStep(SortDataRows[] sortDataRows, SortParameters parameters)

Review comment: I think there are duplicate code fragments in the unsafe and safe implementations. Please check, and it would be better to refactor.
akashrn5 commented on a change in pull request #3603: [CARBONDATA-3679] Optimize local sort performance
URL: https://github.com/apache/carbondata/pull/3603#discussion_r378832595

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortDataRows.java
##########
@@ -133,62 +94,74 @@ public void addRow(Object[] row) throws CarbonSortKeyAndGroupByException {
       if (LOGGER.isDebugEnabled()) {
         LOGGER.debug("************ Writing to temp file ********** ");
       }
-      intermediateFileMerger.startMergingIfPossible();
       Object[][] recordHolderListLocal = recordHolderList;
-      try {
-        semaphore.acquire();
-        dataSorterAndWriterExecutorService.execute(new DataSorterAndWriter(recordHolderListLocal));
-      } catch (InterruptedException e) {
-        LOGGER.error("exception occurred while trying to acquire a semaphore lock: ", e);
-        throw new CarbonSortKeyAndGroupByException(e);
-      }
+      handlePreviousPage(recordHolderListLocal);
       // create the new holder Array
       this.recordHolderList = new Object[this.sortBufferSize][];
       this.entryCount = 0;
     }
     recordHolderList[entryCount++] = row;
   }
 
-  /**
-   * This method will be used to add new row
-   *
-   * @param rowBatch new rowBatch
-   * @throws CarbonSortKeyAndGroupByException problem while writing
-   */
   public void addRowBatch(Object[][] rowBatch, int size) throws CarbonSortKeyAndGroupByException {
     // if record holder list size is equal to sort buffer size then it will
     // sort the list and then write current list data to file
-    synchronized (addRowsLock) {
-      int sizeLeft = 0;
-      if (entryCount + size >= sortBufferSize) {
-        if (LOGGER.isDebugEnabled()) {
-          LOGGER.debug("************ Writing to temp file ********** ");
-        }
-        intermediateFileMerger.startMergingIfPossible();
-        Object[][] recordHolderListLocal = recordHolderList;
-        sizeLeft = sortBufferSize - entryCount;
-        if (sizeLeft > 0) {
-          System.arraycopy(rowBatch, 0, recordHolderListLocal, entryCount, sizeLeft);
-        }
-        try {
-          semaphore.acquire();
-          dataSorterAndWriterExecutorService
-              .execute(new DataSorterAndWriter(recordHolderListLocal));
-        } catch (Exception e) {
-          LOGGER.error(
-              "exception occurred while trying to acquire a semaphore lock: " + e.getMessage(), e);
-          throw new CarbonSortKeyAndGroupByException(e);
-        }
-        // create the new holder Array
-        this.recordHolderList = new Object[this.sortBufferSize][];
-        this.entryCount = 0;
-        size = size - sizeLeft;
-        if (size == 0) {
-          return;
-        }
+    int sizeLeft = 0;
+    if (entryCount + size >= sortBufferSize) {
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("************ Writing to temp file ********** ");
       }
-      System.arraycopy(rowBatch, sizeLeft, recordHolderList, entryCount, size);
-      entryCount += size;
+      Object[][] recordHolderListLocal = recordHolderList;
+      sizeLeft = sortBufferSize - entryCount;
+      if (sizeLeft > 0) {
+        System.arraycopy(rowBatch, 0, recordHolderListLocal, entryCount, sizeLeft);
+      }
+      handlePreviousPage(recordHolderListLocal);
+      // create the new holder Array
+      this.recordHolderList = new Object[this.sortBufferSize][];
+      this.entryCount = 0;
+      size = size - sizeLeft;
+      if (size == 0) {
+        return;
+      }
+    }
+    System.arraycopy(rowBatch, sizeLeft, recordHolderList, entryCount, size);
+    entryCount += size;
+  }
+
+  /**
+   * sort and write data
+   * @param recordHolderArray
+   */
+  private void handlePreviousPage(Object[][] recordHolderArray)
+      throws CarbonSortKeyAndGroupByException {
+    try {
+      long startTime = System.currentTimeMillis();
+      if (parameters.getNumberOfNoDictSortColumns() > 0) {
+        Arrays.sort(recordHolderArray,
+            new NewRowComparator(parameters.getNoDictionarySortColumn(),
+                parameters.getNoDictDataType()));
+      } else {
+        Arrays.sort(recordHolderArray,
+            new NewRowComparatorForNormalDims(parameters.getNumberOfSortColumns()));
+      }
+
+      // create a new file and choose folder randomly every time
+      String[] tmpFileLocation = parameters.getTempFileLocation();
+      String locationChosen = tmpFileLocation[new Random().nextInt(tmpFileLocation.length)];
+      File sortTempFile = new File(
+          locationChosen + File.separator + parameters.getTableName()
+              + '_' + parameters.getRangeId() + '_' + System.nanoTime()
+              + CarbonCommonConstants.SORT_TEMP_FILE_EXT);
+      writeDataToFile(recordHolderArray, recordHolderArray.length, sortTempFile);
+      // add sort temp filename to and arrayList. When the list size reaches 20 then

Review comment: Correct the comment.