akashrn5 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394315724

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
##########
@@ -186,6 +217,106 @@ private void getTableBlockUniqueIdentifierWrappers(List<PartitionSpec> partition
     }
   }

+  /**
+   * Using blockLevel minmax values, identify if segment has to be added for further pruning and to
+   * load segment index info to cache
+   * @param segment to be identified if needed for loading block datamaps
+   * @param segmentMetaDataInfo list of block level min max values
+   * @param filter filter expression
+   * @param identifiers tableBlockIndexUniqueIdentifiers
+   * @param tableBlockIndexUniqueIdentifierWrappers to add tableBlockIndexUniqueIdentifiers
+   */
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentMinMax(Segment segment,
+      SegmentMetaDataInfo segmentMetaDataInfo, DataMapFilter filter,
+      Set<TableBlockIndexUniqueIdentifier> identifiers,
+      List<TableBlockIndexUniqueIdentifierWrapper> tableBlockIndexUniqueIdentifierWrappers) {
+    boolean isScanRequired = false;
+    Map<String, SegmentColumnMetaDataInfo> segmentColumnMetaDataInfoMap =
+        segmentMetaDataInfo.getSegmentColumnMetaDataInfoMap();
+    int length = segmentColumnMetaDataInfoMap.size();
+    // Add columnSchemas based on the columns present in segment
+    List<ColumnSchema> columnSchemas = new ArrayList<>();
+    byte[][] min = new byte[length][];
+    byte[][] max = new byte[length][];
+    boolean[] minMaxFlag = new boolean[length];
+    int i = 0;
+
+    // get current columnSchema list for the table
+    Map<String, ColumnSchema> tableColumnSchemas =
+        this.getCarbonTable().getTableInfo().getFactTable().getListOfColumns().stream()
+            .collect(Collectors.toMap(ColumnSchema::getColumnUniqueId, ColumnSchema::clone));
+
+    // fill min,max and columnSchema values
+    for (Map.Entry<String, SegmentColumnMetaDataInfo> columnMetaData :
+        segmentColumnMetaDataInfoMap.entrySet()) {
+      ColumnSchema columnSchema = tableColumnSchemas.get(columnMetaData.getKey());
+      if (null != columnSchema) {
+        // get segment sort column and column drift info
+        boolean isSortColumnInBlock = columnMetaData.getValue().isSortColumn();

Review comment:
please rename

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
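
For readers following the thread, the segment-level pruning idea described in the Javadoc of the quoted hunk can be illustrated with a minimal Java sketch. It assumes an equality filter on a single column and an unsigned lexicographic ordering of the stored min/max bytes. The class and method names (`SegmentMinMaxPruningSketch`, `compareUnsigned`, `isScanRequired`) are hypothetical and are not taken from the PR, which works through `DataMapFilter` and the column schemas instead.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not a CarbonData class. */
final class SegmentMinMaxPruningSketch {

  private SegmentMinMaxPruningSketch() { }

  /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    // All shared bytes are equal, so the shorter array sorts first.
    return Integer.compare(a.length, b.length);
  }

  /**
   * For an equality filter, a segment can only contain a match when
   * min <= filterValue <= max; otherwise its index need not be loaded.
   */
  static boolean isScanRequired(byte[] segmentMin, byte[] segmentMax, byte[] filterValue) {
    return compareUnsigned(filterValue, segmentMin) >= 0
        && compareUnsigned(filterValue, segmentMax) <= 0;
  }

  public static void main(String[] args) {
    byte[] min = "apple".getBytes(StandardCharsets.UTF_8);
    byte[] max = "cherry".getBytes(StandardCharsets.UTF_8);
    // Inside the [min, max] range: the segment has to be scanned.
    System.out.println(isScanRequired(min, max, "banana".getBytes(StandardCharsets.UTF_8)));
    // Outside the range: the segment can be skipped.
    System.out.println(isScanRequired(min, max, "zebra".getBytes(StandardCharsets.UTF_8)));
  }
}
```

A segment whose range excludes the filter value is skipped before any of its block indexes are loaded, which is where the driver memory saving named in the PR title would come from.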

akashrn5 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394329499

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala
##########
@@ -26,12 +26,13 @@ import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
 import org.apache.spark.TaskContext
-import org.apache.spark.util.LongAccumulator
+import org.apache.spark.util.{CollectionAccumulator, LongAccumulator}

Review comment:
revert

akashrn5 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394334509

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -248,21 +258,31 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
     } else {
       // Get the segment files each updated segment in case of IUD compaction
       if (compactionType == CompactionType.IUD_UPDDEL_DELTA) {
-        val segmentFilesList = loadsToMerge.asScala.map{seg =>
+        val segmentFilesList = loadsToMerge.asScala.map { seg =>
+          val segmentMetaDataInfo = new SegmentFileStore(carbonLoadModel.getTablePath,
+            seg.getSegmentFile).getSegmentFile.getSegmentMetaDataInfo
           val file = SegmentFileStore.writeSegmentFile(
             carbonTable,
             seg.getLoadName,
-            carbonLoadModel.getFactTimeStamp.toString)
+            carbonLoadModel.getFactTimeStamp.toString,
+            segmentMetaDataInfo)
           new Segment(seg.getLoadName, file)
         }.filter(_.getSegmentFileName != null).asJava
         segmentFilesForIUDCompact = new util.ArrayList[Segment](segmentFilesList)
       } else {
+        // get segmentMetadata info from accumulator
+        val segmentMetaDataInfo = CarbonDataRDDFactory.getSegmentMetaDataInfoFromAccumulator(

Review comment:
Move `getSegmentMetaDataInfoFromAccumulator` to carbonLoaderUtil, as it is used in all load flows.

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837395

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
##########
@@ -203,6 +203,7 @@ object SecondaryIndexUtil {
           seg.getLoadName,
           segmentIdToLoadStartTimeMapping(seg.getLoadName).toString,
           carbonLoadModel.getFactTimeStamp.toString,
+          null,

Review comment:
created

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837435

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/InsertTaskCompletionListener.scala
##########
@@ -20,21 +20,32 @@ package org.apache.carbondata.spark.rdd
 import org.apache.spark.TaskContext
 import org.apache.spark.sql.carbondata.execution.datasources.tasklisteners.CarbonLoadTaskCompletionListener
 import org.apache.spark.sql.execution.command.ExecutionErrors
+import org.apache.spark.util.CollectionAccumulator
+import org.apache.carbondata.core.segmentmeta.SegmentMetaDataInfo
 import org.apache.carbondata.core.util.{DataTypeUtil, ThreadLocalTaskInfo}
 import org.apache.carbondata.processing.loading.{DataLoadExecutor, FailureCauses}
 import org.apache.carbondata.spark.util.CommonUtil
 class InsertTaskCompletionListener(dataLoadExecutor: DataLoadExecutor,
-    executorErrors: ExecutionErrors)
+    executorErrors: ExecutionErrors,
+    segmentMetaDataAccumulator: CollectionAccumulator[Map[String, SegmentMetaDataInfo]],
+    tableName: String,
+    segmentId: String)
   extends CarbonLoadTaskCompletionListener {
   override def onTaskCompletion(context: TaskContext): Unit = {
     try {
-      dataLoadExecutor.close()
+      // fill segment level minMax to accumulator
+      CarbonDataRDDFactory.fillSegmentMetaDataInfoToAccumulator(tableName,

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837462

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -248,21 +258,31 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
     } else {
       // Get the segment files each updated segment in case of IUD compaction
       if (compactionType == CompactionType.IUD_UPDDEL_DELTA) {
-        val segmentFilesList = loadsToMerge.asScala.map{seg =>
+        val segmentFilesList = loadsToMerge.asScala.map { seg =>
+          val segmentMetaDataInfo = new SegmentFileStore(carbonLoadModel.getTablePath,
+            seg.getSegmentFile).getSegmentFile.getSegmentMetaDataInfo
           val file = SegmentFileStore.writeSegmentFile(
             carbonTable,
             seg.getLoadName,
-            carbonLoadModel.getFactTimeStamp.toString)
+            carbonLoadModel.getFactTimeStamp.toString,
+            segmentMetaDataInfo)
           new Segment(seg.getLoadName, file)
         }.filter(_.getSegmentFileName != null).asJava
         segmentFilesForIUDCompact = new util.ArrayList[Segment](segmentFilesList)
       } else {
+        // get segmentMetadata info from accumulator
+        val segmentMetaDataInfo = CarbonDataRDDFactory.getSegmentMetaDataInfoFromAccumulator(

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837484

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala
##########
@@ -26,12 +26,13 @@ import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
 import org.apache.spark.TaskContext
-import org.apache.spark.util.LongAccumulator
+import org.apache.spark.util.{CollectionAccumulator, LongAccumulator}

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837511

##########
File path: core/src/main/java/org/apache/carbondata/core/segmentmeta/SegmentMetaDataInfoStats.java
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.segmentmeta;
+
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.ByteUtil;
+
+/**
+ * Holds segment level meta data information such as min,max, sortColumn info for the
+ * corresponding table
+ */
+public class SegmentMetaDataInfoStats {
+
+  private SegmentMetaDataInfoStats() {
+    tableSegmentMetaDataInfoMap = new LinkedHashMap<>();
+  }
+
+  public static synchronized SegmentMetaDataInfoStats getInstance() {
+    if (null == segmentMetaDataInfoStats) {
+      segmentMetaDataInfoStats = new SegmentMetaDataInfoStats();
+      return segmentMetaDataInfoStats;
+    } else {
+      return segmentMetaDataInfoStats;
+    }
+  }
+
+  private Map<String, Map<String, BlockColumnMetaDataInfo>> tableSegmentMetaDataInfoMap;
+
+  private static SegmentMetaDataInfoStats segmentMetaDataInfoStats;
+
+  /**
+   * Prepare of map with key as column-id and value as SegmentColumnMetaDataInfo using the
+   * tableSegmentMetaDataInfoMap
+   *
+   * @param tableName get corresponding tableName from map
+   * @param segmentId get corresponding segment Id from map
+   * @return segmentMetaDataInfo for the corresponding segment
+   */
+  public synchronized SegmentMetaDataInfo getTableSegmentMetaDataInfo(String tableName,
+      String segmentId) {
+    Map<String, SegmentColumnMetaDataInfo> segmentColumnMetaDataInfoMap = new LinkedHashMap<>();
+    Map<String, BlockColumnMetaDataInfo> segmentMetaDataInfoMap =
+        this.tableSegmentMetaDataInfoMap.get(tableName);
+    if (null != segmentMetaDataInfoMap && !segmentMetaDataInfoMap.isEmpty()
+        && null != segmentMetaDataInfoMap.get(segmentId)) {
+      BlockColumnMetaDataInfo blockColumnMetaDataInfo = segmentMetaDataInfoMap.get(segmentId);
+      System.out.println("Column Schemas Size: " + blockColumnMetaDataInfo.getColumnSchemas().size() + " Min size: " + blockColumnMetaDataInfo.getMin().length);
+      for (int i = 0; i < blockColumnMetaDataInfo.getColumnSchemas().size(); i++) {
+        org.apache.carbondata.format.ColumnSchema columnSchema =
+            blockColumnMetaDataInfo.getColumnSchemas().get(i);
+        boolean isSortColumn = false;
+        boolean isColumnDrift = false;
+        if (null != columnSchema.columnProperties && !columnSchema.columnProperties.isEmpty()) {
+          if (null != columnSchema.columnProperties.get(CarbonCommonConstants.SORT_COLUMNS)) {
+            isSortColumn = true;
+          }
+          if (null != columnSchema.columnProperties.get(CarbonCommonConstants.COLUMN_DRIFT)) {
+            isColumnDrift = true;
+          }
+        }
+        segmentColumnMetaDataInfoMap.put(columnSchema.column_id,
+            new SegmentColumnMetaDataInfo(isSortColumn, blockColumnMetaDataInfo.getMin()[i],
+                blockColumnMetaDataInfo.getMax()[i], isColumnDrift));
+      }
+    }
+    return new SegmentMetaDataInfo(segmentColumnMetaDataInfoMap);
+  }
+
+  public synchronized void setBlockMetaDataInfo(String tableName, String segmentId,
+      BlockColumnMetaDataInfo currentBlockColumnMetaInfo) {
+    // check if tableName is present in tableSegmentMetaDataInfoMap
+    if (!this.tableSegmentMetaDataInfoMap.isEmpty() && null != this.tableSegmentMetaDataInfoMap
+        .get(tableName) && !this.tableSegmentMetaDataInfoMap.get(tableName).isEmpty()

Review comment:
done
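
The quoted class keeps per-column min/max values for each table and segment, but the hunk is cut off before the body of `setBlockMetaDataInfo`. One way block-level values could be folded into a segment-level range is sketched below. This is an illustration under stated assumptions, not the PR's logic: it assumes every block reports the same columns in the same order and that values compare as unsigned bytes. `SegmentMinMaxMergeSketch` and `mergeBlock` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not a CarbonData class. */
final class SegmentMinMaxMergeSketch {

  private SegmentMinMaxMergeSketch() { }

  /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /**
   * Widens the segment-level range in place so that it also covers one block.
   * Index i of every array refers to the same column (an assumption of this sketch).
   */
  static void mergeBlock(byte[][] segmentMin, byte[][] segmentMax,
      byte[][] blockMin, byte[][] blockMax) {
    for (int col = 0; col < segmentMin.length; col++) {
      if (compareUnsigned(blockMin[col], segmentMin[col]) < 0) {
        segmentMin[col] = blockMin[col];
      }
      if (compareUnsigned(blockMax[col], segmentMax[col]) > 0) {
        segmentMax[col] = blockMax[col];
      }
    }
  }

  public static void main(String[] args) {
    // Two columns; the segment range starts out as the first block's range.
    byte[][] segmentMin = { {3}, {5} };
    byte[][] segmentMax = { {7}, {6} };
    // A second block lowers the min of column 0 and raises the max of column 1.
    mergeBlock(segmentMin, segmentMax, new byte[][] { {1}, {5} }, new byte[][] { {4}, {9} });
    System.out.println(Arrays.deepToString(segmentMin));
    System.out.println(Arrays.deepToString(segmentMax));
  }
}
```

Because the merged range can only widen, pruning on it stays safe: a segment is never skipped when one of its blocks could contain a match.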

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837533

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java
##########
@@ -600,4 +604,20 @@ public boolean isIndexColumn() {
   public void setIndexColumn(boolean indexColumn) {
     this.indexColumn = indexColumn;
   }
+
+  public ColumnSchema clone() {
+    try {

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837562

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
##########
@@ -186,6 +217,106 @@ private void getTableBlockUniqueIdentifierWrappers(List<PartitionSpec> partition
     }
   }

+  /**
+   * Using blockLevel minmax values, identify if segment has to be added for further pruning and to
+   * load segment index info to cache
+   * @param segment to be identified if needed for loading block datamaps
+   * @param segmentMetaDataInfo list of block level min max values
+   * @param filter filter expression
+   * @param identifiers tableBlockIndexUniqueIdentifiers
+   * @param tableBlockIndexUniqueIdentifierWrappers to add tableBlockIndexUniqueIdentifiers
+   */
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentMinMax(Segment segment,
+      SegmentMetaDataInfo segmentMetaDataInfo, DataMapFilter filter,
+      Set<TableBlockIndexUniqueIdentifier> identifiers,
+      List<TableBlockIndexUniqueIdentifierWrapper> tableBlockIndexUniqueIdentifierWrappers) {
+    boolean isScanRequired = false;
+    Map<String, SegmentColumnMetaDataInfo> segmentColumnMetaDataInfoMap =
+        segmentMetaDataInfo.getSegmentColumnMetaDataInfoMap();
+    int length = segmentColumnMetaDataInfoMap.size();
+    // Add columnSchemas based on the columns present in segment
+    List<ColumnSchema> columnSchemas = new ArrayList<>();
+    byte[][] min = new byte[length][];
+    byte[][] max = new byte[length][];
+    boolean[] minMaxFlag = new boolean[length];
+    int i = 0;
+
+    // get current columnSchema list for the table
+    Map<String, ColumnSchema> tableColumnSchemas =
+        this.getCarbonTable().getTableInfo().getFactTable().getListOfColumns().stream()
+            .collect(Collectors.toMap(ColumnSchema::getColumnUniqueId, ColumnSchema::clone));
+
+    // fill min,max and columnSchema values
+    for (Map.Entry<String, SegmentColumnMetaDataInfo> columnMetaData :
+        segmentColumnMetaDataInfoMap.entrySet()) {
+      ColumnSchema columnSchema = tableColumnSchemas.get(columnMetaData.getKey());
+      if (null != columnSchema) {
+        // get segment sort column and column drift info
+        boolean isSortColumnInBlock = columnMetaData.getValue().isSortColumn();

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837594

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
##########
@@ -186,6 +217,106 @@ private void getTableBlockUniqueIdentifierWrappers(List<PartitionSpec> partition
     }
   }

+  /**
+   * Using blockLevel minmax values, identify if segment has to be added for further pruning and to
+   * load segment index info to cache
+   * @param segment to be identified if needed for loading block datamaps
+   * @param segmentMetaDataInfo list of block level min max values
+   * @param filter filter expression
+   * @param identifiers tableBlockIndexUniqueIdentifiers
+   * @param tableBlockIndexUniqueIdentifierWrappers to add tableBlockIndexUniqueIdentifiers
+   */
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentMinMax(Segment segment,
+      SegmentMetaDataInfo segmentMetaDataInfo, DataMapFilter filter,
+      Set<TableBlockIndexUniqueIdentifier> identifiers,
+      List<TableBlockIndexUniqueIdentifierWrapper> tableBlockIndexUniqueIdentifierWrappers) {
+    boolean isScanRequired = false;
+    Map<String, SegmentColumnMetaDataInfo> segmentColumnMetaDataInfoMap =
+        segmentMetaDataInfo.getSegmentColumnMetaDataInfoMap();
+    int length = segmentColumnMetaDataInfoMap.size();
+    // Add columnSchemas based on the columns present in segment
+    List<ColumnSchema> columnSchemas = new ArrayList<>();
+    byte[][] min = new byte[length][];
+    byte[][] max = new byte[length][];
+    boolean[] minMaxFlag = new boolean[length];
+    int i = 0;
+
+    // get current columnSchema list for the table
+    Map<String, ColumnSchema> tableColumnSchemas =
+        this.getCarbonTable().getTableInfo().getFactTable().getListOfColumns().stream()
+            .collect(Collectors.toMap(ColumnSchema::getColumnUniqueId, ColumnSchema::clone));
+
+    // fill min,max and columnSchema values
+    for (Map.Entry<String, SegmentColumnMetaDataInfo> columnMetaData :
+        segmentColumnMetaDataInfoMap.entrySet()) {
+      ColumnSchema columnSchema = tableColumnSchemas.get(columnMetaData.getKey());
+      if (null != columnSchema) {
+        // get segment sort column and column drift info
+        boolean isSortColumnInBlock = columnMetaData.getValue().isSortColumn();
+        boolean isColumnDriftInBlock = columnMetaData.getValue().isColumnDrift();
+        if (null != columnSchema.getColumnProperties()) {
+          // get current sort column and column drift info
+          String isSortColumn =
+              columnSchema.getColumnProperties().get(CarbonCommonConstants.SORT_COLUMNS);
+          String isColumnDrift =
+              columnSchema.getColumnProperties().get(CarbonCommonConstants.COLUMN_DRIFT);
+          if (null != isSortColumn) {
+            if (isSortColumn.equalsIgnoreCase("true") && !isSortColumnInBlock) {
+              modifyColumnSchemaForSortColumn(columnSchema, isColumnDriftInBlock, isColumnDrift);

Review comment:
done

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r394837640

##########
File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
##########
@@ -135,7 +136,7 @@ public CarbonTable getTable() {
     int datamapsCount = 0;
     // In case if filter has matched partitions, then update the segments with datamap's

Review comment:
done

CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-601085824

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/804/

CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-601089118

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2509/

akashrn5 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-601307328

LGTM

ajantha-bhat commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602036907

> @Indhumathi27: make the default value of that carbon property false and, in the query flow, throw a runtime exception if the table is a transactional table and segment minmax is not set, so that CI can catch all the missed scenarios.

Is it verified?
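
The fail-fast check suggested in this comment could look roughly like the Java sketch below. It is a hedged illustration only: `SegmentMinMaxCheckSketch` and `validateSegmentMinMax` are hypothetical names, the `Object` parameter stands in for whatever segment metadata the query flow actually reads, and the thread does not show where in the query flow such a check was placed, or whether it was added at all.

```java
/** Hypothetical illustration only; not a CarbonData class. */
final class SegmentMinMaxCheckSketch {

  private SegmentMinMaxCheckSketch() { }

  /**
   * Throws a runtime exception when a transactional table has a segment without
   * min/max metadata, so that a missed load flow fails loudly in CI instead of
   * silently falling back to loading every index.
   */
  static void validateSegmentMinMax(boolean isTransactionalTable, String segmentId,
      Object segmentMetaDataInfo) {
    if (isTransactionalTable && segmentMetaDataInfo == null) {
      throw new IllegalStateException(
          "Segment min/max is not set for segment " + segmentId);
    }
  }

  public static void main(String[] args) {
    // Non-transactional tables are not checked in this sketch.
    validateSegmentMinMax(false, "0", null);
    try {
      validateSegmentMinMax(true, "0", null);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Such a check is useful mainly as a temporary test aid: once CI passes with it enabled, every load path has been shown to write the segment metadata.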

ajantha-bhat commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r396045433

##########
File path: docs/configuration-parameters.md
##########
@@ -146,6 +146,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.query.prefetch.enable | true | By default this property is true, so prefetch is used in query to read next blocklet asynchronously in other thread while processing current blocklet in main thread. This can help to reduce CPU idle time. Setting this property false will disable this prefetch feature in query. |
 | carbon.query.stage.input.enable | false | Stage input files are data files written by external applications (such as Flink), but have not been loaded into carbon table. Enabling this configuration makes query to include these files, thus makes query on latest data. However, since these files are not indexed, query maybe slower as full scan is required for these files. |
 | carbon.driver.pruning.multi.thread.enable.files.count | 100000 | To prune in multi-thread when total number of segment files for a query increases beyond the configured value. |
+| carbon.load.all.indexes.to.cache | true | Setting this configuration to false, will prune and load only matched segment indexes to cache using segment metadata information such as columnid and it's minmax values, which decreases the usage of driver memory. |

Review comment:
This is renamed now, right? Please update.

Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r396066291

##########
File path: docs/configuration-parameters.md
##########
@@ -146,6 +146,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.query.prefetch.enable | true | By default this property is true, so prefetch is used in query to read next blocklet asynchronously in other thread while processing current blocklet in main thread. This can help to reduce CPU idle time. Setting this property false will disable this prefetch feature in query. |
 | carbon.query.stage.input.enable | false | Stage input files are data files written by external applications (such as Flink), but have not been loaded into carbon table. Enabling this configuration makes query to include these files, thus makes query on latest data. However, since these files are not indexed, query maybe slower as full scan is required for these files. |
 | carbon.driver.pruning.multi.thread.enable.files.count | 100000 | To prune in multi-thread when total number of segment files for a query increases beyond the configured value. |
+| carbon.load.all.indexes.to.cache | true | Setting this configuration to false, will prune and load only matched segment indexes to cache using segment metadata information such as columnid and it's minmax values, which decreases the usage of driver memory. |

Review comment:
updated

CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602176659

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2536/