Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387524406

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -59,11 +72,28 @@ class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
         }
       }
     }
+  }
+  test("test block prune without filter") {
     sql("DROP TABLE IF EXISTS blockprune")
+    sql(
+      """
+        CREATE TABLE IF NOT EXISTS blockprune (name string, id int)
+        STORED AS carbondata
+      """)
+    sql(
+      s"LOAD DATA LOCAL INPATH '$outputPath' INTO table blockprune options('FILEHEADER'='name,id')"
+    )
+    checkAnswer(
+      sql(
+        """
+          select * from blockprune limit 1

Review comment:
The SQL can be moved to a single line.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387524827

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -91,6 +121,46 @@ class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
       Seq(Row("b", 240001)))
   }
+  test("test block prune multi threads") {

Review comment:
Add a test case for a negative value of the configured property (CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT).
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387524957

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/blockprune/BlockPruneQueryTestCase.scala
##########
@@ -91,6 +121,46 @@ class BlockPruneQueryTestCase extends QueryTest with BeforeAndAfterAll {
       Seq(Row("b", 240001)))
   }
+  test("test block prune multi threads") {
+    sql("DROP TABLE IF EXISTS blockprune")
+
+    perpareCarbonProperty(CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT,
+      enableMultiThreadFilesCount)
+
+    sql(

Review comment:
The SQL can be moved to a single line.
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387530433

##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -302,19 +323,29 @@ public Void call() throws IOException {
             segmentPropertiesFetcher.getSegmentPropertiesFromDataMap(dataMapList.get(0));
         Segment segment = segmentDataMapGroup.getSegment();
         if (filter.isResolvedOnSegment(segmentProperties)) {
+          FilterExecuter filterExecuter = FilterUtil
+              .getFilterExecuterTree(filter.getResolver(), segmentProperties,
+                  null, table.getMinMaxCacheColumns(segmentProperties),
+                  false);
           for (int i = segmentDataMapGroup.getFromIndex(); i <= segmentDataMapGroup.getToIndex(); i++) {
             List<Blocklet> dmPruneBlocklets = dataMapList.get(i).prune(
-                filter.getResolver(), segmentProperties, partitions);
+                filter.getResolver(), segmentProperties, partitions, filterExecuter);
             pruneBlocklets.addAll(addSegmentId(
                 blockletDetailsFetcher.getExtendedBlocklets(dmPruneBlocklets, segment), segment));
           }
         } else {
+          FilterExecuter filterExecuter = FilterUtil
+              .getFilterExecuterTree(new DataMapFilter(segmentProperties, table,
+                  filter.getNewCopyOfExpression()).getResolver(), segmentProperties,
+                  null, table.getMinMaxCacheColumns(segmentProperties),
+                  false);
           for (int i = segmentDataMapGroup.getFromIndex(); i <= segmentDataMapGroup.getToIndex(); i++) {
             List<Blocklet> dmPruneBlocklets = dataMapList.get(i).prune(
-                filter.getNewCopyOfExpression(), segmentProperties, partitions, table);
+                filter.getNewCopyOfExpression(), segmentProperties, partitions, table,

Review comment:
Avoid deserializing `filter.getNewCopyOfExpression()` again: it is already done at line 341, so the same copy can be reused.
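A minimal sketch of the reuse this comment asks for, assuming the deserialized expression is not mutated by the prune calls. `PruneSketch`, `pruneAll`, and the `Supplier` standing in for `filter.getNewCopyOfExpression()` are hypothetical; plain strings stand in for CarbonData's expression and `FilterExecuter` types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; strings stand in for CarbonData's Expression and FilterExecuter. */
final class PruneSketch {
  private PruneSketch() {
  }

  /**
   * Prunes every data map in a segment group using a single expression copy.
   *
   * @param dataMaps     names of the data maps in the group
   * @param copySupplier stand-in for filter.getNewCopyOfExpression(), which deserializes
   */
  static List<String> pruneAll(List<String> dataMaps, Supplier<String> copySupplier) {
    // Deserialize once, before the loop.
    String expression = copySupplier.get();
    // Build the executer from that same copy instead of deserializing a second time.
    String executer = "executer[" + expression + "]";
    List<String> pruned = new ArrayList<>();
    for (String dataMap : dataMaps) {
      // Every prune call receives the already-deserialized expression and executer.
      pruned.add(dataMap + ": " + expression + " / " + executer);
    }
    return pruned;
  }
}
```

With three data maps, the supplier is called once rather than once for the executer plus once per data map.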
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387531359

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -35,13 +38,24 @@ import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
 import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.log4j.Logger;
+
 /**
  * This class contains all the details about the restructuring information of
  * the block. This will be used during query execution to handle restructure
  * information
  */
 public class SegmentProperties {
+  private static final Logger LOG =
+      LogServiceFactory.getLogService(SegmentProperties.class.getName());
+
+  private static final int dimensionsFingerPrinterShift = 1;

Review comment:
Please add descriptions for the dimension, measure, and complex fingerprinter variables.
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387534591

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -147,6 +173,27 @@ private void fillBlockToDimensionOrdinalMapping() {
     }
   }
+  /**
+   * compare the segmentproperties based on fingerprinter
+   */
+  @Override
+  public boolean equals(Object obj) {
+    if (!(obj instanceof SegmentProperties)) {
+      return false;
+    }
+    if (this.getNumberOfColumns() != ((SegmentProperties) obj).getNumberOfColumns()) {
+      return false;
+    }
+    return getFingerprinter() != Long.MIN_VALUE &&
+        ((SegmentProperties) obj).getFingerprinter() != Long.MIN_VALUE &&

Review comment:
Assign `(SegmentProperties) obj` to a variable and reuse it to avoid the repeated cast overhead.
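A sketch of an `equals` that casts once, written against a hypothetical, simplified `FingerprintedProperties` class rather than the real `SegmentProperties`. The quoted hunk is cut off after the second `Long.MIN_VALUE` check, so the final fingerprint comparison here is an assumption. The `this == obj` shortcut (which keeps `equals` reflexive even when the fingerprint is unset) and the matching `hashCode` are also additions of this sketch, not taken from the PR.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for SegmentProperties. */
final class FingerprintedProperties {
  /** Marks a fingerprint that was never computed. */
  static final long UNSET_FINGERPRINT = Long.MIN_VALUE;

  private final int numberOfColumns;
  private final long fingerprint;

  FingerprintedProperties(int numberOfColumns, long fingerprint) {
    this.numberOfColumns = numberOfColumns;
    this.fingerprint = fingerprint;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof FingerprintedProperties)) {
      return false;
    }
    // Cast once and reuse the typed reference for every comparison below.
    FingerprintedProperties other = (FingerprintedProperties) obj;
    if (numberOfColumns != other.numberOfColumns) {
      return false;
    }
    // Two distinct objects with an unset fingerprint never compare equal.
    return fingerprint != UNSET_FINGERPRINT
        && other.fingerprint != UNSET_FINGERPRINT
        && fingerprint == other.fingerprint;
  }

  @Override
  public int hashCode() {
    // Equal objects share both fields, so they also share this hash.
    return Objects.hash(numberOfColumns, fingerprint);
  }
}
```

Overriding `equals` without `hashCode` breaks hash-based collections, which is why the sketch pairs them.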
niuge01 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387546800

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -35,13 +38,24 @@ import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
 import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.log4j.Logger;
+
 /**
  * This class contains all the details about the restructuring information of
  * the block. This will be used during query execution to handle restructure
  * information
  */
 public class SegmentProperties {
+  private static final Logger LOG =
+      LogServiceFactory.getLogService(SegmentProperties.class.getName());
+
+  private static final int dimensionsFingerPrinterShift = 1;
+
+  private static final int measuresFingerPrinterShift = 2;

Review comment:
```suggestion
  private static final int MEASURES_FINGER_PRINTER_SHIFT = 2;
```
niuge01 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387547012

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -35,13 +38,24 @@ import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
 import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.log4j.Logger;
+
 /**
  * This class contains all the details about the restructuring information of
  * the block. This will be used during query execution to handle restructure
  * information
  */
 public class SegmentProperties {
+  private static final Logger LOG =
+      LogServiceFactory.getLogService(SegmentProperties.class.getName());
+
+  private static final int dimensionsFingerPrinterShift = 1;
+
+  private static final int measuresFingerPrinterShift = 2;
+
+  private static final int complexFingerPrinterShift = 3;

Review comment:
```suggestion
  private static final int COMPLEX_FINGER_PRINTER_SHIFT = 3;
```
niuge01 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387546514

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -35,13 +38,24 @@ import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
 import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.log4j.Logger;
+
 /**
  * This class contains all the details about the restructuring information of
  * the block. This will be used during query execution to handle restructure
  * information
  */
 public class SegmentProperties {
+  private static final Logger LOG =
+      LogServiceFactory.getLogService(SegmentProperties.class.getName());
+
+  private static final int dimensionsFingerPrinterShift = 1;

Review comment:
```suggestion
  private static final int DIMENSIONS_FINGER_PRINTER_SHIFT = 1;
```
niuge01 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387556468

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
##########
@@ -686,6 +685,11 @@ public long getRowCount(Segment segment, List<PartitionSpec> partitions) {
       boolean[] minMaxFlag = getMinMaxFlag(row, BLOCK_MIN_MAX_FLAG);
       String fileName = getFileNameWithFilePath(row, filePath);
       short blockletId = getBlockletId(row);
+      if (!validateSegmentProperties(segmentProperties)) {

Review comment:
Are the segment properties validated and the filter executer rebuilt on every iteration of the loop?
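If `validateSegmentProperties` and the executer construction depend only on the segment properties and not on the current row (an assumption, since the body of the `if` is cut off in the hunk), both decisions can be made once before the loop. The generic `HoistSketch.prune` below is hypothetical: `sharedExecuter` stands in for an executer passed in by the caller, `matchesShared` for the validation, and `buildExecuter` for rebuilding an executer when the properties do not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of hoisting loop-invariant work out of a per-row pruning loop. */
final class HoistSketch {
  private HoistSketch() {
  }

  static <S, R> List<R> prune(S segmentProperties, List<R> rows, Predicate<R> sharedExecuter,
      Predicate<S> matchesShared, Function<S, Predicate<R>> buildExecuter) {
    // Neither the validation nor the executer depends on the row, so decide once here.
    Predicate<R> executer = matchesShared.test(segmentProperties)
        ? sharedExecuter
        : buildExecuter.apply(segmentProperties);
    List<R> hits = new ArrayList<>();
    for (R row : rows) {
      // Only the per-row filter check stays inside the loop.
      if (executer.test(row)) {
        hits.add(row);
      }
    }
    return hits;
  }
}
```

The validation runs once per call instead of once per row; the executer is built at most once, and only when the shared one cannot be used.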
niuge01 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387568150

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -1843,6 +1846,33 @@ public static int getNumOfThreadsForPruning() {
     return numOfThreadsForPruning;
   }
+  /**
+   * This method validates the driverPruningMultiThreadEnableFilesCount
+   */
+  public static int getDriverPruningMultiThreadEnableFilesCount() {
+    int driverPruningMultiThreadEnableFilesCount = Integer.parseInt(CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT,
+            CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT));
+    try {
+      if (driverPruningMultiThreadEnableFilesCount <= 0) {
+        LOGGER.info("The driver prunning multithread enable files count value \"" +
+            driverPruningMultiThreadEnableFilesCount +
+            "\" is invalid. Using the default value \"" +
+            CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT);
+        driverPruningMultiThreadEnableFilesCount = Integer.parseInt(CarbonCommonConstants
+            .CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT);

Review comment:
CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT_DEFAULT can be defined directly as an int field.
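A sketch of this suggestion, with the default held as an `int` constant so the fallback path needs no `Integer.parseInt`. In the quoted hunk the configured value is parsed before the `try` block; this sketch parses inside the `try`, so a non-numeric value falls back to the default just like zero or a negative number. The key string, the default of 100000, and the use of `java.util.Properties` as the property source are hypothetical stand-ins for `CarbonProperties` and `CarbonCommonConstants`, and the log call is omitted.

```java
import java.util.Properties;

/** Hypothetical sketch; not CarbonData's CarbonProperties. */
final class PruningConfigSketch {
  /** Hypothetical key standing in for CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT. */
  static final String FILES_COUNT_KEY = "example.driver.pruning.multithread.files.count";

  /** Default kept as an int, so the fallback path needs no parsing (value is hypothetical). */
  static final int FILES_COUNT_DEFAULT = 100000;

  private PruningConfigSketch() {
  }

  /** Returns the configured count, or the default when missing, non-numeric, or not positive. */
  static int filesCount(Properties properties) {
    String configured = properties.getProperty(FILES_COUNT_KEY);
    if (configured == null) {
      return FILES_COUNT_DEFAULT;
    }
    try {
      int value = Integer.parseInt(configured.trim());
      if (value > 0) {
        return value;
      }
    } catch (NumberFormatException e) {
      // Non-numeric input: fall through to the default below.
    }
    return FILES_COUNT_DEFAULT;
  }
}
```

For example, with the key set to `-5`, `0`, or `abc` the method returns 100000; with ` 250 ` it returns 250.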
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594450405 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/604/
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594453116 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2311/
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594537665 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2320/
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594539725 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/613/
jackylk commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-595014875 LGTM
niuge01 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-595015808 LGTM
niuge01 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-595017433 @marchpure please rebase and merge the commits.
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-595019138 retest this please
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance
URL: https://github.com/apache/carbondata/pull/3620#discussion_r388074374

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
##########
@@ -147,6 +186,28 @@ private void fillBlockToDimensionOrdinalMapping() {
     }
   }
+  /**
+   * compare the segmentproperties based on fingerprinter
+   */
+  @Override
+  public boolean equals(Object obj) {
+    if (!(obj instanceof SegmentProperties)) {
+      return false;
+    }
+    SegmentProperties another = (SegmentProperties) obj;

Review comment:
Please change the variable name to `segmentProperties`.