maheshrajus opened a new pull request #3912: URL: https://github.com/apache/carbondata/pull/3912 ### Why is this PR needed? [WIP] Global sort partitions should be determined dynamically ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687139428 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2235/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687140085 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3975/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-688912501 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4004/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-688914915 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2264/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-689101595 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4012/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-689105224 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2274/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-705339174 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-705381449 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4333/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-705383450 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2583/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-705381449 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-705339174 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r503160924 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ########## @@ -227,9 +235,19 @@ object DataLoadProcessBuilderOnSpark { // 2. sort var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) - if (numPartitions <= 0) { + + // if numPartitions user does not specify then dynamically calculate + if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(rdd)/1000000 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/rdd.getNumPartitions Review comment: @maheshrajus can you please explain the logic used to determine the partitions in the comment. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r510558896 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ########## @@ -143,10 +143,18 @@ object DataLoadProcessBuilderOnSpark { var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) - if (numPartitions <= 0) { - numPartitions = convertRDD.partitions.length + + // if numPartitions user does not specify and not specified in config then dynamically calculate + if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(inputRDD)/1000000 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/inputRDD.getNumPartitions Review comment: it should be numberOfParitions = totalSize/partitionSize ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-716994724 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4696/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-716994984 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2939/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-717141636 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2947/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-717143281 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4704/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r503187128 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ########## @@ -143,10 +143,18 @@ object DataLoadProcessBuilderOnSpark { var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) - if (numPartitions <= 0) { - numPartitions = convertRDD.partitions.length + + // if numPartitions user does not specify and not specified in config then dynamically calculate + if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(inputRDD)/1000000 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/inputRDD.getNumPartitions Review comment: i think here, you should try to get the partitions based on the total load size and the partition size and not the partition number.Please check again and handle correctly. @QiangCai please have a look once ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r512579906 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ########## @@ -143,10 +143,18 @@ object DataLoadProcessBuilderOnSpark { var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) - if (numPartitions <= 0) { - numPartitions = convertRDD.partitions.length + + // if numPartitions user does not specify and not specified in config then dynamically calculate + if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(inputRDD)/1000000 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/inputRDD.getNumPartitions Review comment: @akashrn5 you reviewed old code, now i pushed again the new code and testing is in progress ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |