[GitHub] [carbondata] ajantha-bhat opened a new pull request #3771: [WIP] pushdown array_contains filter to carbon


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-649431612


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1490/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r447457038



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {
+              case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
       I want to reuse the existing EqualTo code; I don't see any advantage in creating a new expression.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -152,13 +152,25 @@ object CarbonFilters {
     }
 
     def getCarbonExpression(name: String) = {

Review comment:
       I want to reuse the existing EqualTo code; I don't see any advantage in creating a new expression.





[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658610416


   retest this please



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658682568


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3392/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658688900


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1652/
   



[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456167613



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       Can you add a test case for a query like the one below?
   
   select * from complex1 where arr[0] = 'sd'
   
   Can we push down this filter too?





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456175101



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
-      case others => None
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {

Review comment:
       How about extracting the match block into a method, isPrimitiveDataType, and moving it into a util class?
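
For illustration only, the suggested extraction could look roughly like the sketch below. It is plain Java with no Spark dependency: `ElementKind`, `PushDownTypeUtil`, and the set of kinds treated as primitive are all hypothetical stand-ins, not the helper that was actually added in the PR, and the real list of supported element types may differ.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for the element types an array column may hold. */
enum ElementKind {
  STRING, INT, LONG, DOUBLE, BOOLEAN, DATE, TIMESTAMP, ARRAY, STRUCT, MAP
}

/** Hypothetical utility class holding the type check in one place. */
final class PushDownTypeUtil {

  // Assumed set of "primitive" kinds; the PR's real list may differ.
  private static final Set<ElementKind> PRIMITIVES = EnumSet.of(
      ElementKind.STRING, ElementKind.INT, ElementKind.LONG, ElementKind.DOUBLE,
      ElementKind.BOOLEAN, ElementKind.DATE, ElementKind.TIMESTAMP);

  private PushDownTypeUtil() { }

  /** Returns true when the element kind is one of the assumed primitive kinds. */
  static boolean isPrimitiveDataType(ElementKind kind) {
    return kind != null && PRIMITIVES.contains(kind);
  }
}
```

With a helper of this shape, the strategy code only needs a single guard on the array's element type, and the list of supported types lives in one place.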





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456176045



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
       The code will become hard to read as we add more if conditions.
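
One common way to address this kind of readability concern is to give the compound condition a name. The sketch below is illustrative only: `FilterConditions` and `isSinglePresentDimension` are hypothetical names, the method covers only the part of the condition visible in the quoted hunk, and the null check is an added assumption.

```java
/** Hypothetical helper that names the condition instead of inlining it. */
final class FilterConditions {

  private FilterConditions() { }

  /**
   * True when exactly one dimension is tracked and it is present in the
   * current block. Covers only the visible part of the quoted condition.
   */
  static boolean isSinglePresentDimension(boolean[] isDimensionPresentInCurrentBlock) {
    return isDimensionPresentInCurrentBlock != null      // added assumption
        && isDimensionPresentInCurrentBlock.length == 1
        && isDimensionPresentInCurrentBlock[0];
  }
}
```

The calling code can then branch on `FilterConditions.isSinglePresentDimension(flags)`, so later conditions are added as further named predicates rather than as longer inline expressions.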





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457070775



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
-      case others => None
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {

Review comment:
       OK, moved.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075708



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       Currently Carbon doesn't support pushdown of arr[0] = 'sd', because that filter is based on an array index.
   It needs separate handling; I have yet to analyze the required changes.
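
To make the distinction concrete, here is a minimal sketch of the two predicates in plain Java. It illustrates the semantics only and is not Carbon's filter executor: `ArrayPredicates` and both method names are hypothetical, and SQL's three-valued null logic is simplified to `false`.

```java
import java.util.List;

/** Hypothetical helper contrasting the two filter shapes discussed above. */
final class ArrayPredicates {

  private ArrayPredicates() { }

  /** array_contains(arr, value): true if an element at any position equals value. */
  static boolean arrayContains(List<String> arr, String value) {
    if (arr == null || value == null) {
      return false;
    }
    for (String element : arr) {
      if (value.equals(element)) {
        return true;
      }
    }
    return false;
  }

  /** arr[index] = value: true only if the element at that exact position equals value. */
  static boolean elementAtEquals(List<String> arr, int index, String value) {
    if (arr == null || value == null || index < 0 || index >= arr.size()) {
      return false;
    }
    return value.equals(arr.get(index));
  }
}
```

Using two rows from the test data, array('sd','df','gh') and array('hjsd','fggb','nhj','sd','asd'), `arrayContains(row, "sd")` holds for both, while `elementAtEquals(row, 0, "sd")` holds only for the first. An element-level equality check therefore cannot answer the index-based query without also tracking the element's position.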





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075875



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       This PR covers only UDF pushdown.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-660848087


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1690/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-660848132


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3432/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661117575


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3440/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661122494


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1698/
   



[GitHub] [carbondata] QiangCai commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

QiangCai commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661559605


   LGTM



[GitHub] [carbondata] asfgit closed pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

GitBox
In reply to this post by GitBox

asfgit closed pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771


   

