[GitHub] [carbondata] kunal642 opened a new pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

classic Classic list List threaded Threaded
55 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 opened a new pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox

kunal642 opened a new pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680791844


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2135/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680807986


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3876/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681626492


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3883/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681629618


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2142/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-682990756


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2166/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-682991866


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3907/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683137391


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3910/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683138098


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2169/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683396671


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2177/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683398875


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3918/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683420549


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2179/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683420751


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3920/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683477956


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2183/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683478447


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3924/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683729131


   @akashrn5 @kumarvishal09 @QiangCai @ravipesala @ajantha-bhat Please review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r481694795



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -373,6 +375,146 @@ object CarbonFilters {
     val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession)
     getPartitions(partitionFilters, sparkSession, carbonTable)
   }
+
+  def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = {
+    val column = filter.references.map(carbonTable.getColumnByName)
+    if (column.isEmpty) {
+      -1
+    } else {
+      if (column.head.isDimension) {
+        column.head.getOrdinal
+      } else {
+        column.head.getOrdinal + carbonTable.getAllDimensions.size()
+      }
+    }
+  }
+
+  def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = {
+    filter match {
+      case sources.And(left, right) =>
+        collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table)
+      case sources.Or(left, right) => collectSimilarExpressions(left, table) ++
+                              collectSimilarExpressions(right, table)
+      case others => Seq((others, getStorageOrdinal(others, table)))
+    }
+  }
+
+  /**
+   * This method will reorder the filter based on the Storage Ordinal of the column references.
+   *
+   * Example1:
+   *             And                                   And
+   *      Or          And             =>        Or            And
+   *  col3  col1  col2  col1                col1  col3    col1   col2
+   *
+   *  **Mixed expression filter reordered locally, but wont be reordered globally.**
+   *
+   * Example2:
+   *             And                                   And
+   *      And          And           =>       And            And
+   *  col3  col1  col2  col1                col1  col1    col2   col3

Review comment:
       right side after optimization when it becomes (col1 And col1), better to modify filter to just col1 ?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -373,6 +375,146 @@ object CarbonFilters {
     val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession)
     getPartitions(partitionFilters, sparkSession, carbonTable)
   }
+
+  def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = {
+    val column = filter.references.map(carbonTable.getColumnByName)
+    if (column.isEmpty) {
+      -1
+    } else {
+      if (column.head.isDimension) {
+        column.head.getOrdinal
+      } else {
+        column.head.getOrdinal + carbonTable.getAllDimensions.size()
+      }
+    }
+  }
+
+  def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = {
+    filter match {
+      case sources.And(left, right) =>
+        collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table)
+      case sources.Or(left, right) => collectSimilarExpressions(left, table) ++
+                              collectSimilarExpressions(right, table)
+      case others => Seq((others, getStorageOrdinal(others, table)))
+    }
+  }
+
+  /**
+   * This method will reorder the filter based on the Storage Ordinal of the column references.
+   *
+   * Example1:
+   *             And                                   And
+   *      Or          And             =>        Or            And
+   *  col3  col1  col2  col1                col1  col3    col1   col2
+   *
+   *  **Mixed expression filter reordered locally, but wont be reordered globally.**
+   *
+   * Example2:
+   *             And                                   And
+   *      And          And           =>       And            And
+   *  col3  col1  col2  col1                col1  col1    col2   col3
+   *
+   *             Or                                    Or
+   *       Or          Or             =>        Or            Or
+   *   col3  col1  col2  col1               col1  col1    col2   col3
+   *
+   *  **Similar expression filters are reordered globally**
+   *
+   * @param filter the filter expression to be reordered
+   * @return The reordered filter with the current ordinal
+   */
+  def reorderFilter(filter: Filter, table: CarbonTable): (Filter, Int) = {
+    val filterMap = mutable.HashMap[String, List[(Filter, Int)]]()
+      def sortFilter(filter: Filter): (Filter, Int) = {

Review comment:
       Agree with david. I too felt it is not easy to read

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
##########
@@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation(
       projects: Seq[NamedExpression],
       filters: Array[Filter],
       partitions: Seq[PartitionSpec]): RDD[InternalRow] = {
-    val filterExpression: Option[Expression] = filters.flatMap { filter =>
+    val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2)

Review comment:
       better to do this at down layer (common code) so that presto and hive queries also can make use of this ?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -373,6 +375,146 @@ object CarbonFilters {
     val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession)
     getPartitions(partitionFilters, sparkSession, carbonTable)
   }
+
+  def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = {
+    val column = filter.references.map(carbonTable.getColumnByName)
+    if (column.isEmpty) {
+      -1
+    } else {
+      if (column.head.isDimension) {
+        column.head.getOrdinal
+      } else {
+        column.head.getOrdinal + carbonTable.getAllDimensions.size()
+      }
+    }
+  }
+
+  def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = {
+    filter match {
+      case sources.And(left, right) =>
+        collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table)
+      case sources.Or(left, right) => collectSimilarExpressions(left, table) ++
+                              collectSimilarExpressions(right, table)
+      case others => Seq((others, getStorageOrdinal(others, table)))
+    }
+  }
+
+  /**
+   * This method will reorder the filter based on the Storage Ordinal of the column references.

Review comment:
       In case of multiple AND filters, we need to first read the column that gives less output, so that this output can be sent to other filter by bitset pipeline. So, always changing to storage order may not be good idea. [because somecase the user can know the cardinality and provide efficient filter order]
   
   @kumarvishal09, @QiangCai, @ravipesala   : what is your opinion on this ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-685310097


   @kunal : Also please add some report in the description on how much was the performance with and without this change.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

ajantha-bhat edited a comment on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-685310097


   @kunal642  : Also please add some report in the description on how much was the performance with and without this change.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r481737995



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -373,6 +375,146 @@ object CarbonFilters {
     val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession)
     getPartitions(partitionFilters, sparkSession, carbonTable)
   }
+
+  def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = {
+    val column = filter.references.map(carbonTable.getColumnByName)
+    if (column.isEmpty) {
+      -1
+    } else {
+      if (column.head.isDimension) {
+        column.head.getOrdinal
+      } else {
+        column.head.getOrdinal + carbonTable.getAllDimensions.size()
+      }
+    }
+  }
+
+  def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = {
+    filter match {
+      case sources.And(left, right) =>
+        collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table)
+      case sources.Or(left, right) => collectSimilarExpressions(left, table) ++
+                              collectSimilarExpressions(right, table)
+      case others => Seq((others, getStorageOrdinal(others, table)))
+    }
+  }
+
+  /**
+   * This method will reorder the filter based on the Storage Ordinal of the column references.

Review comment:
       I feel if user specify that don't change filter order (because he knows the cardinality) by a carbon property. we should not change the order. Else we can change !




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


123