marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620
…-threads

Why is this PR needed?
When pruning with multiple threads, a bug heavily hampers pruning performance. When pruning yields no blocklets that match the query filter, the getExtendblocklet function is still triggered to fetch the extended blocklet metadata. Given an empty blocklet list as input, this function is expected to return an empty extended blocklet list directly, but a bug currently adds the overhead of a meaningless HashSet add operation. In addition, when pruning with multiple threads, the getExtendblocklet function is triggered once per blocklet; this should be avoided by triggering it once per segment.

What changes were proposed in this PR?
1) If the input to the getExtendblocklet function is an empty blocklet list, return an empty extended blocklet list directly.
2) Trigger the getExtendblocklet function once per segment instead of once per blocklet.

Does this PR introduce any user interface change?
No.

Is any new testcase added?
Yes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]

With regards, Apache Git Services
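The two changes described in the PR can be illustrated with a minimal, self-contained sketch. It is not the CarbonData implementation: the class `ExtendedBlockletSketch`, its methods, and the string-based blocklet ids are hypothetical stand-ins chosen only to show an early return on empty input and one conversion call per segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are CarbonData APIs. */
final class ExtendedBlockletSketch {

  /** Counts how often the conversion does real work. */
  static int conversionCalls = 0;

  /** Converts the pruned blocklet ids of one segment into "extended" descriptions. */
  static List<String> toExtended(String segmentId, List<String> blockletIds) {
    // Change 1: nothing survived pruning, so return before any set-up work.
    if (blockletIds.isEmpty()) {
      return Collections.emptyList();
    }
    conversionCalls++;
    // Stand-in for the per-call overhead the PR mentions (a set insertion).
    Set<String> identifiers = new HashSet<>();
    identifiers.add(segmentId);
    List<String> extended = new ArrayList<>();
    for (String id : blockletIds) {
      extended.add(segmentId + "/" + id);
    }
    return extended;
  }

  /** Change 2: one conversion call per segment rather than one per blocklet. */
  static List<String> convertPerSegment(Map<String, List<String>> prunedBySegment) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : prunedBySegment.entrySet()) {
      result.addAll(toExtended(entry.getKey(), entry.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> pruned = new LinkedHashMap<>();
    pruned.put("segment0", Arrays.asList("b0", "b1"));
    pruned.put("segment1", Collections.emptyList());
    System.out.println(convertPerSegment(pruned));
    System.out.println("conversion calls doing work: " + conversionCalls);
  }
}
```

In this sketch the segment with no surviving blocklets costs only an `isEmpty()` check, and the segment with two blocklets triggers a single conversion call.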
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586338942
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/293/
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586339554
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1997/
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586341516
retest this please
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586351917
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/294/
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586385984
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1998/
ajantha-bhat commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r379987630
##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -138,8 +138,12 @@ public CarbonTable getTable() {
       }
     }
     int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
+    int carbonDriverPruningMultiThreadEnableFilesCount =
+        Integer.parseInt(CarbonProperties.getInstance().getProperty(
+            CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT,

Review comment:
Need to update the documentation for the newly added property.
ajantha-bhat commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r379987838
##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -138,8 +138,12 @@ public CarbonTable getTable() {
       }
     }
     int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
+    int carbonDriverPruningMultiThreadEnableFilesCount =

Review comment:
Need to add validation for this carbon property: if someone configures a negative value, the default value should be used.
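The validation requested in this review comment might look roughly like the sketch below. It uses plain `java.util.Properties` rather than `CarbonProperties`, and the key `KEY`, the constant `DEFAULT_FILES_COUNT`, and the method `readFilesCountThreshold` are hypothetical names with illustrative values, not taken from the PR. Falling back to the default on non-numeric input is an extra assumption beyond what the reviewer asked for.

```java
import java.util.Properties;

/** Hypothetical sketch of validating a numeric property with a default fallback. */
final class PruningPropertySketch {

  // Illustrative key and default only; not the real CarbonData constants.
  static final String KEY = "example.driver.pruning.multi.thread.enable.files.count";
  static final int DEFAULT_FILES_COUNT = 100000;

  /** Returns the configured threshold, or the default if it is missing, malformed, or negative. */
  static int readFilesCountThreshold(Properties props) {
    String raw = props.getProperty(KEY);
    if (raw == null) {
      return DEFAULT_FILES_COUNT;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      // A negative file count makes no sense as a threshold, so use the default.
      return value < 0 ? DEFAULT_FILES_COUNT : value;
    } catch (NumberFormatException e) {
      // Assumption: unparsable input is treated like an invalid value.
      return DEFAULT_FILES_COUNT;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(KEY, "-5");
    System.out.println(readFilesCountThreshold(props));
  }
}
```

Zero is accepted here as a valid threshold; whether the real property should allow it is not stated in the thread.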
ajantha-bhat commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r379988045
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";

Review comment:
Use small case for variable names.
ajantha-bhat commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r379988045
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";

Review comment:
Use camel case for variable names.
ajantha-bhat commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r379988354
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";

Review comment:
It still won't prune with multiple threads, as the other conditions may not be satisfied.
ajantha-bhat commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586821309
Good finding. It avoids unnecessary creation of `TableBlockIndexUniqueIdentifierWrapper` when the pruned blocklet list is empty.
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r382530139
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";
+  val MULTI_THREAD_DISABLE_FILES_COUNT
+    = CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT;
+
+  def perpareCarbonProperty(propertyName:String,
+    propertyValue:String): Unit ={
+    val properties = CarbonProperties.getInstance()
+    properties.removeProperty(propertyName)

Review comment:
removeProperty may not be required, as addProperty on the next line will update the key with the new value if it is already present.
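The reviewer's point rests on overwrite-on-set semantics. The sketch below demonstrates that behaviour with the standard `java.util.Properties.setProperty`, used here only as an analogue; that `CarbonProperties.addProperty` behaves the same way is the reviewer's statement, not something this example verifies. The class `OverwritePropertySketch` and method `setTwice` are hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch: setting an existing key replaces its value, so no prior remove is needed. */
final class OverwritePropertySketch {

  /** Sets the same key twice and returns the value that is finally stored. */
  static String setTwice(String key, String first, String second) {
    Properties props = new Properties();
    props.setProperty(key, first);
    // No remove() in between: the second call replaces the earlier value.
    props.setProperty(key, second);
    return props.getProperty(key);
  }

  public static void main(String[] args) {
    System.out.println(setTwice("example.files.count", "1", "100000"));
  }
}
```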
marchpure commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r384321352
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";
+  val MULTI_THREAD_DISABLE_FILES_COUNT
+    = CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT;
+
+  def perpareCarbonProperty(propertyName:String,
+    propertyValue:String): Unit ={
+    val properties = CarbonProperties.getInstance()
+    properties.removeProperty(propertyName)

Review comment:
resolved
marchpure commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r384321402
##########
File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -18,16 +18,30 @@ package org.apache.carbondata.spark.testsuite.blockprune
 import java.io.DataOutputStream
+import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.spark.sql.Row
 import org.scalatest.BeforeAndAfterAll
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.spark.sql.test.util.QueryTest
 /**
  * This class contains test cases for block prune query
  */
 class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
   val outputPath = s"$resourcesPath/block_prune_test.csv"
+  val MULTI_THREAD_ENABLE_FILES_COUNT = "1";

Review comment:
resolved
marchpure commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r384321422
##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -138,8 +138,12 @@ public CarbonTable getTable() {
       }
     }
     int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
+    int carbonDriverPruningMultiThreadEnableFilesCount =

Review comment:
resolved
marchpure commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r384321440
##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -138,8 +138,12 @@ public CarbonTable getTable() {
       }
     }
     int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
+    int carbonDriverPruningMultiThreadEnableFilesCount =
+        Integer.parseInt(CarbonProperties.getInstance().getProperty(
+            CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT,

Review comment:
resolved
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591288619
retest this please
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591295234
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/487/