Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471242699

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
Already added a new method, getObjectDataBasedOnDataType. This boolean is still required because, in the fillData() method, the complex children's getDataBasedOnDataType will be called.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [hidden email]
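To make the point about the boolean concrete: the one-argument method delegates to the two-argument one, and the flag has to travel down to the children. The following is only a minimal sketch of that delegation pattern, not the code in this PR. The class `ArrayReaderSketch`, the buffer layout, the `encode` helper, and the meaning given to `getBytesData` (raw bytes versus decoded strings) are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a complex-type reader; this is not CarbonData's ArrayQueryType.
class ArrayReaderSketch {

  // Existing callers keep the one-argument form, which delegates with the default flag.
  static List<Object> read(ByteBuffer buffer) {
    return read(buffer, false);
  }

  // Assumed layout for this sketch: int count, then (int length, bytes) per child.
  // The flag is applied per child, so each one is returned either raw or decoded.
  static List<Object> read(ByteBuffer buffer, boolean getBytesData) {
    int count = buffer.getInt();
    List<Object> children = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] raw = new byte[buffer.getInt()];
      buffer.get(raw);
      children.add(getBytesData ? raw : new String(raw, StandardCharsets.UTF_8));
    }
    return children;
  }

  // Helper that builds a buffer in the assumed layout.
  static ByteBuffer encode(String... values) {
    int size = Integer.BYTES;
    for (String v : values) {
      size += Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(size);
    buffer.putInt(values.length);
    for (String v : values) {
      byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
      buffer.putInt(bytes.length).put(bytes);
    }
    buffer.flip();
    return buffer;
  }
}
```

Calling `ArrayReaderSketch.read(buffer)` behaves as before, while `ArrayReaderSketch.read(buffer, true)` hands back the children as byte arrays.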
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471254176

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
Why not call getObjectDataBasedOnDataType?

----------------------------------------------------------------
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365424

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
   @Override
   public void addChildren(GenericQueryType children) {
-    if (this.getName().equals(children.getParentName())) {
+    if (null == this.getName() || this.getName().equals(children.getParentName())) {

Review comment:
Removed this check.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
    * property which defines the insert stage flow
    */
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

Review comment:
Handled.

----------------------------------------------------------------
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365673 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java ########## @@ -221,4 +221,9 @@ public int numberOfNodes() { public List<TableBlockInfo> getBlockInfos() { Review comment: removed getBlockInfos method ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365947 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java ########## @@ -41,39 +44,62 @@ * map that contains the mapping of block id to the valid blocklets in that block which contain * the data as per the applied filter */ - private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + private final Map<String, Set<String>> blockIdToBlockletIdMapping; + + /** + * checks if implicit filter exceeds complex filter threshold + */ + private boolean isThresholdReached; public ImplicitExpression(List<Expression> implicitFilterList) { + final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName()); Review comment: moved ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r471366000 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java ########## @@ -41,39 +44,62 @@ * map that contains the mapping of block id to the valid blocklets in that block which contain * the data as per the applied filter */ - private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + private final Map<String, Set<String>> blockIdToBlockletIdMapping; + + /** + * checks if implicit filter exceeds complex filter threshold + */ + private boolean isThresholdReached; public ImplicitExpression(List<Expression> implicitFilterList) { + final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName()); // initialize map with half the size of filter list as one block id can contain // multiple blocklets blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); for (Expression value : implicitFilterList) { String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); addBlockEntry(blockletPath); } + int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI(); + isThresholdReached = implicitFilterList.size() > complexFilterThreshold; + if (isThresholdReached) { + LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: " + + complexFilterThreshold); + } } - public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) { + public ImplicitExpression(Map<String, Set<String>> blockIdToBlockletIdMapping) { this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; } private void addBlockEntry(String blockletPath) { Review comment: handled ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
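The constructor in the hunk above does two things: it groups blocklet paths under their block id, and it records whether the filter list is larger than the configured threshold. A minimal sketch of that logic follows. It is not the PR's `ImplicitExpression`: the class name `ImplicitFilterSketch`, the `blockId/blockletId` path format, and passing the threshold as a constructor argument are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; names and path format are assumptions, not CarbonData APIs.
class ImplicitFilterSketch {
  private final Map<String, Set<String>> blockIdToBlockletIds;
  private final boolean thresholdReached;

  ImplicitFilterSketch(List<String> blockletPaths, int threshold) {
    // Half the list size as initial capacity, since one block can hold several blocklets.
    blockIdToBlockletIds = new HashMap<>(blockletPaths.size() / 2);
    for (String path : blockletPaths) {
      int split = path.lastIndexOf('/');
      if (split < 0) {
        throw new IllegalArgumentException("Expected blockId/blockletId but got: " + path);
      }
      blockIdToBlockletIds
          .computeIfAbsent(path.substring(0, split), key -> new HashSet<>())
          .add(path.substring(split + 1));
    }
    // Same comparison as in the hunk: strictly greater than the threshold.
    thresholdReached = blockletPaths.size() > threshold;
  }

  boolean isThresholdReached() {
    return thresholdReached;
  }

  Set<String> blockletsOf(String blockId) {
    return blockIdToBlockletIds.getOrDefault(blockId, Collections.emptySet());
  }
}
```

With three paths and a threshold of 2, `isThresholdReached()` is true; with a threshold of 3 it is false, because the comparison is strict.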
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674787907 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2004/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674854838 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2005/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674861462 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3746/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r471459506 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java ########## @@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder) @Override public Object getDataBasedOnDataType(ByteBuffer dataBuffer) { - Object[] data = fillData(dataBuffer); + return getDataBasedOnDataType(dataBuffer, false); + } + + @Override + public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) { Review comment: handled ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473599181

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       databaseName: String, tableName: String, indexTableName: String,
       absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
     var schemaOrdinal = -1
-    var allColumns = indexModel.columnNames.map { indexCol =>
-      val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+    val complexDimensions = carbonTable.getAllDimensions.asScala
+      .filter(dim => dim.getDataType.isComplexType &&
+                     indexModel.columnNames.asJava.contains(dim.getColName))
+    if (complexDimensions.size > 1) {
+      throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+    }
+    var allColumns = List[ColumnSchema]()

Review comment:
Consider the scenario where one SI table contains (complex, primitive1, primitive2): we need to maintain the same order. With this change the order becomes (primitive1, primitive2, complex), which is wrong. So I suggest keeping the user-specified order. You can refer to the code below:

    var allColumns = List[ColumnSchema]()
    indexModel.columnNames.foreach { indexCol =>
      val dimension = carbonTable.getDimensionByName(tableName, indexCol)
      val colSchema = dimension.getColumnSchema
      schemaOrdinal += 1
      allColumns = allColumns :+ cloneColumnSchema(colSchema, schemaOrdinal)
    }
    complexDimensions.foreach { complexDim =>
      if (complexDim.getNumberOfChild > 0) {
        if (complexDim.getListOfChildDimensions.asScala
          .exists(col => DataTypes.isArrayType(col.getDataType))) {
          throw new ErrorMessage("SI creation with nested array complex type is not supported yet");
        }
      }
    }

----------------------------------------------------------------
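The ordering concern can be reproduced outside Spark. The sketch below is a simplified Java analogue of the suggestion, assuming a plain name-to-flag map in place of the table's dimensions; `IndexColumnOrderSketch`, `Column`, and `buildIndexColumns` are hypothetical names and do not exist in CarbonData. It iterates over the user-specified column list once, so the output order and ordinals follow the user's order regardless of which column is complex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of keeping the user-specified column order.
class IndexColumnOrderSketch {

  static final class Column {
    final String name;
    final boolean complex;
    final int ordinal;

    Column(String name, boolean complex, int ordinal) {
      this.name = name;
      this.complex = complex;
      this.ordinal = ordinal;
    }
  }

  // isComplexByName stands in for the table's dimension metadata (an assumption).
  static List<Column> buildIndexColumns(List<String> indexColumnNames,
      Map<String, Boolean> isComplexByName) {
    long complexCount = indexColumnNames.stream()
        .filter(name -> Boolean.TRUE.equals(isComplexByName.get(name)))
        .count();
    if (complexCount > 1) {
      throw new IllegalArgumentException(
          "SI creation with more than one complex type is not supported yet");
    }
    List<Column> columns = new ArrayList<>();
    int schemaOrdinal = -1;
    // Walk the names in the order the user gave them; do not split by type.
    for (String name : indexColumnNames) {
      Boolean complex = isComplexByName.get(name);
      if (complex == null) {
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      schemaOrdinal += 1;
      columns.add(new Column(name, complex, schemaOrdinal));
    }
    return columns;
  }
}
```

For an index declared as (complex, primitive1, primitive2), the complex column keeps ordinal 0 instead of being moved to the end.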
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600323

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
In the community, we concluded that there is no need for a row-level position reference. So why is this required?

----------------------------------------------------------------
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600574

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
I think the above two properties are also not needed if there is no row-level position reference.

----------------------------------------------------------------
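For readers following the property discussion, this is roughly how a threshold property with a string default tends to be consumed. It is a minimal sketch using `java.util.Properties` instead of CarbonData's `CarbonProperties`. Only the key and default strings come from the hunk above; the class `ThresholdConfigSketch` and the fallback to the default on missing, non-numeric, or non-positive values are assumptions, and the PR's own getter may validate differently.

```java
import java.util.Properties;

// Hypothetical illustration; not the getter added in this PR.
class ThresholdConfigSketch {
  static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
  static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";

  static int complexFilterThreshold(Properties properties) {
    int fallback = Integer.parseInt(SI_COMPLEX_FILTER_THRESHOLD_DEFAULT);
    String raw = properties.getProperty(SI_COMPLEX_FILTER_THRESHOLD,
        SI_COMPLEX_FILTER_THRESHOLD_DEFAULT);
    try {
      int value = Integer.parseInt(raw.trim());
      // Assumption: a non-positive threshold is treated as invalid.
      return value > 0 ? value : fallback;
    } catch (NumberFormatException e) {
      // Assumption: an unparsable value silently falls back to the default.
      return fallback;
    }
  }
}
```

If the row-level position reference is dropped as suggested in the review, a getter like this would be removed together with the constants.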
ajantha-bhat commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r473601787 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2400,6 +2400,23 @@ private CarbonCommonConstants() { */ public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false"; + /** + * Until the threshold for complex filter is reached, row id will be set to the bitset in + * implicit filter during secondary index pruning + */ + public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold"; + + /** + * Maximum value for complex filter threshold + */ + public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000"; + + /** + * Property to decide if position id till row level or not + */ + public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX = Review comment: CC: @kunal642 , @QiangCai ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ajantha-bhat commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r473620842 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2400,6 +2400,23 @@ private CarbonCommonConstants() { */ public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false"; + /** + * Until the threshold for complex filter is reached, row id will be set to the bitset in + * implicit filter during secondary index pruning + */ + public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold"; + + /** + * Maximum value for complex filter threshold + */ + public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000"; + + /** + * Property to decide if position id till row level or not + */ + public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX = Review comment: cc: @kunal642 , @QiangCai , @ravipesala ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r473873670 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala ########## @@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand( databaseName: String, tableName: String, indexTableName: String, absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = { var schemaOrdinal = -1 - var allColumns = indexModel.columnNames.map { indexCol => - val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema + val complexDimensions = carbonTable.getAllDimensions.asScala + .filter(dim => dim.getDataType.isComplexType && + indexModel.columnNames.asJava.contains(dim.getColName)) + if (complexDimensions.size > 1) { + throw new ErrorMessage("SI creation with more than one complex type is not supported yet"); + } + var allColumns = List[ColumnSchema]() Review comment: handled ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677642207 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3810/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677647032 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2069/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708011 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2077/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |