shunlean opened a new pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847

### Why is this PR needed?
Currently, the write (sort temp file) operation can run only after the sort has finished. For better performance, we want the writeDataToFile and SortDataRows operations to run in parallel.

### What changes were proposed in this PR?
In (Unsafe)SortDataRows, we add new threads that run the file write operation. In one test case, the parallel operation reduced the time by about 10%.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
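The overlap described in the PR can be sketched in plain Java. This is only an illustration under assumptions, not the PR's code: `ParallelSpillSketch`, `writeBatch`, and `sortAndSpill` are hypothetical names, batches are plain `int[]` arrays rather than row pages, and the "file" is an in-memory list. Sorting stays on the calling thread while each finished batch is handed to a fixed thread pool for writing, so the write of one batch can overlap the sort of the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: sort on the caller thread, write on pool threads. */
final class ParallelSpillSketch {

  /** Stand-in for writing a sorted page to a sort temp file; returns rows written. */
  static int writeBatch(int[] sortedBatch, List<int[]> sink) {
    synchronized (sink) {
      sink.add(sortedBatch);
    }
    return sortedBatch.length;
  }

  /** Sorts every batch, submits its write to the pool, and returns total rows written. */
  static int sortAndSpill(List<int[]> batches, List<int[]> sink, int writerThreads) {
    ExecutorService writeService = Executors.newFixedThreadPool(writerThreads);
    List<Future<Integer>> pending = new ArrayList<>();
    try {
      for (int[] batch : batches) {
        int[] sorted = batch.clone();
        Arrays.sort(sorted); // sorting stays on the calling thread
        Callable<Integer> write = () -> writeBatch(sorted, sink);
        pending.add(writeService.submit(write)); // write overlaps the next sort
      }
      int total = 0;
      for (Future<Integer> f : pending) {
        total += f.get(); // wait for each write; a failed write surfaces here
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for writes", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A write task failed", e.getCause());
    } finally {
      writeService.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> sink = new ArrayList<>();
    List<int[]> batches = Arrays.asList(new int[] {3, 1, 2}, new int[] {9, 8});
    System.out.println("rows written: " + sortAndSpill(batches, sink, 2));
  }
}
```

Waiting on each `Future` before returning is one way to make sure no write is still in flight, and no failure is lost, when the caller moves on to merging.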
CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659300018

Can one of the admins verify this patch?

ajantha-bhat commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307713

Add to whitelist

ajantha-bhat commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307892

retest this please

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659309836

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3402/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659311646

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1661/

Zhangshunyu commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659810429

Please check the build failure info.

Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193818

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
##########
@@ -200,25 +203,44 @@ public void startSorting() {
    * @param file file
    * @throws CarbonSortKeyAndGroupByException
    */
-  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file)
-      throws CarbonSortKeyAndGroupByException {
-    DataOutputStream stream = null;
-    try {
-      // open stream
-      stream = FileFactory.getDataOutputStream(file.getPath(),
-          parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
-      int actualSize = rowPage.getBuffer().getActualSize();
-      // write number of entries to the file
-      stream.writeInt(actualSize);
-      for (int i = 0; i < actualSize; i++) {
-        rowPage.writeRow(
-            rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file) {
+    writeService.submit(new WriteThread(rowPage, file));
+  }
+
+  public class WriteThread implements Runnable {
+    private File file;
+    private UnsafeCarbonRowPage rowPage;
+
+    public WriteThread(UnsafeCarbonRowPage rowPage, File file) {
+      this.rowPage = rowPage;
+      this.file = file;
+
+    }
+
+    @Override
+    public void run() {
+      DataOutputStream stream = null;
+      try {
+        // open stream
+        stream = FileFactory.getDataOutputStream(this.file.getPath(),
+            parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
+        int actualSize = rowPage.getBuffer().getActualSize();
+        // write number of entries to the file
+        stream.writeInt(actualSize);
+        for (int i = 0; i < actualSize; i++) {
+          rowPage.writeRow(
+              rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+        }
+        // add sort temp filename to and arrayList. When the list size reaches 20 then
+        // intermediate merging of sort temp files will be triggered
+        unsafeInMemoryIntermediateFileMerger.addFileToMerge(file);
+      } catch (IOException | MemoryException e) {
+        e.printStackTrace();

Review comment:
Use log4j instead of printStackTrace.
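A minimal sketch of what this review comment asks for: log the failure with the exception attached instead of calling `printStackTrace()`. To stay self-contained it uses `java.util.logging` as a stand-in for log4j, which is a swap made only for this example; `LoggedWriteTaskSketch`, `PageWriter`, and `WriteTask` are hypothetical names and not part of CarbonData.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: a write task that logs failures instead of printing them. */
final class LoggedWriteTaskSketch {

  // java.util.logging is used here only as a stand-in for the project's log4j logger.
  static final Logger LOGGER = Logger.getLogger(LoggedWriteTaskSketch.class.getName());

  /** Stand-in for the body of the write, which may fail with an IOException. */
  interface PageWriter {
    void write() throws IOException;
  }

  static final class WriteTask implements Runnable {
    private final String fileName;
    private final PageWriter writer;

    WriteTask(String fileName, PageWriter writer) {
      this.fileName = fileName;
      this.writer = writer;
    }

    @Override
    public void run() {
      try {
        writer.write();
      } catch (IOException e) {
        // Passing the throwable keeps the stack trace and routes it through the logger.
        LOGGER.log(Level.SEVERE, "Failed to write sort temp file " + fileName, e);
      }
    }
  }

  public static void main(String[] args) {
    new WriteTask("example.sorttemp", () -> {
      throw new IOException("disk full");
    }).run();
  }
}
```

Logging alone still leaves the submitting thread unaware of the failure. Whether the task should also report the error back to the sorter is a separate design decision that the review comment does not address.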
Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193999

##########
File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java
##########
@@ -37,6 +40,13 @@ import org.apache.log4j.Logger;
 public class SortParameters implements Serializable {
+
+  private ExecutorService writeService = Executors.newFixedThreadPool(5,

Review comment:
Suggest making the core pool size of the thread pool configurable instead of hard-coding it.
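One possible shape for this suggestion, as a sketch only: read the pool size from a property and fall back to the value 5 from the diff when the property is missing or invalid. The property key `example.sort.temp.write.threads` and the class `WritePoolConfigSketch` are hypothetical; they are not existing CarbonData options or classes.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: size the write pool from configuration with a safe default. */
final class WritePoolConfigSketch {

  // Hypothetical key used only for this example, not a real CarbonData property.
  static final String WRITE_THREADS_KEY = "example.sort.temp.write.threads";

  // Matches the value that is hard-coded in the diff above.
  static final int DEFAULT_WRITE_THREADS = 5;

  /** Returns the configured thread count, or the default if it is absent or not positive. */
  static int resolveWriteThreads(Properties props) {
    String raw = props.getProperty(WRITE_THREADS_KEY);
    if (raw == null) {
      return DEFAULT_WRITE_THREADS;
    }
    try {
      int threads = Integer.parseInt(raw.trim());
      return threads > 0 ? threads : DEFAULT_WRITE_THREADS;
    } catch (NumberFormatException e) {
      return DEFAULT_WRITE_THREADS;
    }
  }

  /** Builds the fixed-size pool from the resolved thread count. */
  static ExecutorService newWriteService(Properties props) {
    return Executors.newFixedThreadPool(resolveWriteThreads(props));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(WRITE_THREADS_KEY, "8");
    ExecutorService pool = newWriteService(props);
    System.out.println("write threads: " + resolveWriteThreads(props));
    pool.shutdown();
  }
}
```

Falling back to the default on bad input keeps a typo in configuration from failing the load, at the cost of silently ignoring it. Rejecting invalid values outright would be an equally reasonable choice.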
shunlean commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660848771

retest this please

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660857142

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1693/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660858429

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3435/

shunlean commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660882567

retest this please

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660909683

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1695/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-660910195

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3437/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-661617787

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3447/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-661618161

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1705/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-662324632

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1718/

CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-662325092

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3460/

shunlean commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-662356431

retest this please