kunal642 opened a new pull request #3902: URL: https://github.com/apache/carbondata/pull/3902 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680791844 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2135/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680807986 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3876/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681626492 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3883/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681629618 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2142/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-682990756 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2166/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-682991866 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3907/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683137391 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3910/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683138098 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2169/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683396671 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2177/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683398875 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3918/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683420549 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2179/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683420751 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3920/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683477956 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2183/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683478447 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3924/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683729131 @akashrn5 @kumarvishal09 @QiangCai @ravipesala @ajantha-bhat Please review ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#discussion_r481694795 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ########## @@ -373,6 +375,146 @@ object CarbonFilters { val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession) getPartitions(partitionFilters, sparkSession, carbonTable) } + + def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = { + val column = filter.references.map(carbonTable.getColumnByName) + if (column.isEmpty) { + -1 + } else { + if (column.head.isDimension) { + column.head.getOrdinal + } else { + column.head.getOrdinal + carbonTable.getAllDimensions.size() + } + } + } + + def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = { + filter match { + case sources.And(left, right) => + collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table) + case sources.Or(left, right) => collectSimilarExpressions(left, table) ++ + collectSimilarExpressions(right, table) + case others => Seq((others, getStorageOrdinal(others, table))) + } + } + + /** + * This method will reorder the filter based on the Storage Ordinal of the column references. + * + * Example1: + * And And + * Or And => Or And + * col3 col1 col2 col1 col1 col3 col1 col2 + * + * **Mixed expression filter reordered locally, but wont be reordered globally.** + * + * Example2: + * And And + * And And => And And + * col3 col1 col2 col1 col1 col1 col2 col3 Review comment: right side after optimization when it becomes (col1 And col1), better to modify filter to just col1 ? ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ########## @@ -373,6 +375,146 @@ object CarbonFilters { val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession) getPartitions(partitionFilters, sparkSession, carbonTable) } + + def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = { + val column = filter.references.map(carbonTable.getColumnByName) + if (column.isEmpty) { + -1 + } else { + if (column.head.isDimension) { + column.head.getOrdinal + } else { + column.head.getOrdinal + carbonTable.getAllDimensions.size() + } + } + } + + def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = { + filter match { + case sources.And(left, right) => + collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table) + case sources.Or(left, right) => collectSimilarExpressions(left, table) ++ + collectSimilarExpressions(right, table) + case others => Seq((others, getStorageOrdinal(others, table))) + } + } + + /** + * This method will reorder the filter based on the Storage Ordinal of the column references. + * + * Example1: + * And And + * Or And => Or And + * col3 col1 col2 col1 col1 col3 col1 col2 + * + * **Mixed expression filter reordered locally, but wont be reordered globally.** + * + * Example2: + * And And + * And And => And And + * col3 col1 col2 col1 col1 col1 col2 col3 + * + * Or Or + * Or Or => Or Or + * col3 col1 col2 col1 col1 col1 col2 col3 + * + * **Similar expression filters are reordered globally** + * + * @param filter the filter expression to be reordered + * @return The reordered filter with the current ordinal + */ + def reorderFilter(filter: Filter, table: CarbonTable): (Filter, Int) = { + val filterMap = mutable.HashMap[String, List[(Filter, Int)]]() + def sortFilter(filter: Filter): (Filter, Int) = { Review comment: Agree with david. I too felt it is not easy to read ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ########## @@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation( projects: Seq[NamedExpression], filters: Array[Filter], partitions: Seq[PartitionSpec]): RDD[InternalRow] = { - val filterExpression: Option[Expression] = filters.flatMap { filter => + val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2) Review comment: better to do this at down layer (common code) so that presto and hive queries also can make use of this ? ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ########## @@ -373,6 +375,146 @@ object CarbonFilters { val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession) getPartitions(partitionFilters, sparkSession, carbonTable) } + + def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = { + val column = filter.references.map(carbonTable.getColumnByName) + if (column.isEmpty) { + -1 + } else { + if (column.head.isDimension) { + column.head.getOrdinal + } else { + column.head.getOrdinal + carbonTable.getAllDimensions.size() + } + } + } + + def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = { + filter match { + case sources.And(left, right) => + collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table) + case sources.Or(left, right) => collectSimilarExpressions(left, table) ++ + collectSimilarExpressions(right, table) + case others => Seq((others, getStorageOrdinal(others, table))) + } + } + + /** + * This method will reorder the filter based on the Storage Ordinal of the column references. Review comment: In case of multiple AND filters, we need to first read the column that gives less output, so that this output can be sent to other filter by bitset pipeline. So, always changing to storage order may not be good idea. [because somecase the user can know the cardinality and provide efficient filter order] @kumarvishal09, @QiangCai, @ravipesala : what is your opinion on this ? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-685310097 @kunal : Also please add some report in the description on how much was the performance with and without this change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat edited a comment on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-685310097 @kunal642 : Also please add some report in the description on how much was the performance with and without this change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#discussion_r481737995 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ########## @@ -373,6 +375,146 @@ object CarbonFilters { val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession) getPartitions(partitionFilters, sparkSession, carbonTable) } + + def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = { + val column = filter.references.map(carbonTable.getColumnByName) + if (column.isEmpty) { + -1 + } else { + if (column.head.isDimension) { + column.head.getOrdinal + } else { + column.head.getOrdinal + carbonTable.getAllDimensions.size() + } + } + } + + def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = { + filter match { + case sources.And(left, right) => + collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table) + case sources.Or(left, right) => collectSimilarExpressions(left, table) ++ + collectSimilarExpressions(right, table) + case others => Seq((others, getStorageOrdinal(others, table))) + } + } + + /** + * This method will reorder the filter based on the Storage Ordinal of the column references. Review comment: I feel if user specify that don't change filter order (because he knows the cardinality) by a carbon property. we should not change the order. Else we can change ! ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |