[GitHub] [carbondata] ajantha-bhat opened a new pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

ajantha-bhat opened a new pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632298941


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3050/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632299427


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1330/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632597680


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1334/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632599089


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3054/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632972441


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3055/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632972849


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1335/
   



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

Indhumathi27 commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519286



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       Please add a test scenario with null data, e.g. array(null).
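A minimal Java sketch of what such a test could check at the row level. It assumes that a null array or a null element never matches, since a WHERE clause drops rows whose predicate is NULL or false. `ArrayContainsFilter` and `applyArrayContains` are hypothetical names, not CarbonData APIs, and rows are modeled as plain `String[]` values rather than decoded column pages.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class ArrayContainsFilter {

  /**
   * Returns a BitSet with bit i set when row i's array contains {@code value}.
   * Assumption: a null array or a null element is treated as "no match",
   * so such rows are excluded, as a WHERE clause does for NULL/false.
   */
  static BitSet applyArrayContains(List<String[]> rows, String value) {
    BitSet matched = new BitSet(rows.size());
    for (int rowId = 0; rowId < rows.size(); rowId++) {
      String[] arr = rows.get(rowId);
      if (arr == null) {
        continue; // null array: treated as no match
      }
      for (String element : arr) {
        if (element != null && element.equals(value)) {
          matched.set(rowId);
          break; // one hit is enough for this row
        }
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] {"as"},
        new String[] {"sd", "df", "gh"},
        new String[] {"rt", "ew", "rtyu", "jk", "sder"},
        new String[] {"ghsf", "dbv", "fg", "ty"},
        new String[] {"hjsd", "fggb", "nhj", "sd", "asd"},
        new String[] {null}); // the array(null) case requested in the review
    System.out.println(applyArrayContains(rows, "sd")); // {1, 4}
  }
}
```

The `array(null)` row simply never sets a bit, so it must not appear in the result for any searched value.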





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519687



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       This is a temporary WIP; this POC code cannot be merged. Why review it?





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429538025



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       How about caching the page, so that it does not need to be decoded again for each row?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =
+                (ArrayQueryType) complexDimensionInfoMap.get(dimensionChunkIndex[i]);
+            int[] numberOfChild = complexType
+                .getNumberOfChild(rawBlockletColumnChunks.getDimensionRawColumnChunks(), null,

Review comment:
       How about getting the number of children for all rows once, instead of once per row?
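A minimal sketch of this suggestion, assuming the child counts of a page can be read once into an `int[]` (the WIP code calls `getNumberOfChild` inside the per-row loop). A prefix sum turns the counts into start offsets into the flattened child column, so the per-row loop only does array reads. `ChildCountSketch`, `startOffsets`, and the flat `String[]` child column are hypothetical.

```java
import java.util.BitSet;

public class ChildCountSketch {

  /** Prefix sum: offsets[r] is where row r's children start; offsets[n] is the total. */
  static int[] startOffsets(int[] childCounts) {
    int[] offsets = new int[childCounts.length + 1];
    for (int r = 0; r < childCounts.length; r++) {
      offsets[r + 1] = offsets[r] + childCounts[r];
    }
    return offsets;
  }

  /** array_contains over a flattened child column, using offsets computed once per page. */
  static BitSet arrayContains(int[] childCounts, String[] flatChildren, String value) {
    int[] offsets = startOffsets(childCounts); // once per page, not once per row
    BitSet matched = new BitSet(childCounts.length);
    for (int r = 0; r < childCounts.length; r++) {
      for (int c = offsets[r]; c < offsets[r + 1]; c++) {
        if (value.equals(flatChildren[c])) { // null children never match
          matched.set(r);
          break;
        }
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    // Same data as the test table: 5 rows with 1, 3, 5, 4 and 5 children.
    int[] childCounts = {1, 3, 5, 4, 5};
    String[] flat = {"as", "sd", "df", "gh", "rt", "ew", "rtyu", "jk", "sder",
        "ghsf", "dbv", "fg", "ty", "hjsd", "fggb", "nhj", "sd", "asd"};
    System.out.println(arrayContains(childCounts, flat, "sd")); // {1, 4}
  }
}
```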

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =

Review comment:
       Move this outside the for loop.
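This is loop-invariant hoisting: the map lookup does not depend on the row index, so it can run once per page instead of once per row. The following is a hypothetical sketch in which a `Map<Integer, IntPredicate>` stands in for `complexDimensionInfoMap`. The counter exists only to show how many lookups happen.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class HoistLookupSketch {

  // Counts map lookups so the effect of hoisting is visible.
  static int lookups = 0;

  static IntPredicate lookup(Map<Integer, IntPredicate> filters, int key) {
    lookups++;
    return filters.get(key);
  }

  static BitSet evaluatePage(Map<Integer, IntPredicate> filters, int key, int numberOfRows) {
    // Hoisted out of the row loop: the result does not depend on 'index'.
    IntPredicate filter = lookup(filters, key);
    BitSet set = new BitSet(numberOfRows);
    for (int index = 0; index < numberOfRows; index++) {
      if (filter.test(index)) {
        set.set(index);
      }
    }
    return set;
  }

  public static void main(String[] args) {
    Map<Integer, IntPredicate> filters = new HashMap<>();
    filters.put(7, row -> row % 2 == 0);
    BitSet result = evaluatePage(filters, 7, 5);
    System.out.println(result + " after " + lookups + " lookup(s)"); // {0, 2, 4} after 1 lookup(s)
  }
}
```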





[GitHub] [carbondata] QiangCai commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

QiangCai commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633044918


   If the query has only one simple filter (without AND/OR), maybe we can try to push down "limit" to the filter,
   so the filter will not need to read all values of all rows.
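Here is a minimal sketch of the idea for a single page. It assumes a single predicate with no AND/OR and no ORDER BY, so any `limit` matching rows are an acceptable answer. Evaluation stops as soon as `limit` rows have matched. In a real scan, the remaining limit would also have to carry across pages and blocklets. `LimitPushdownSketch` and its method are hypothetical.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class LimitPushdownSketch {

  /**
   * Evaluates one simple filter over a page and stops after {@code limit} matches.
   * Unsafe if another predicate (e.g. an AND) can still drop rows afterwards,
   * because the early matches might be discarded, leaving too few rows.
   */
  static BitSet applyFilterWithLimit(int numberOfRows, IntPredicate filter, int limit) {
    BitSet set = new BitSet(numberOfRows);
    int matched = 0;
    for (int index = 0; index < numberOfRows && matched < limit; index++) {
      if (filter.test(index)) {
        set.set(index);
        matched++;
      }
    }
    return set;
  }

  public static void main(String[] args) {
    // Rows 0, 3, 6 and 9 match; with limit 2 the scan stops after row 3.
    BitSet result = applyFilterWithLimit(10, row -> row % 3 == 0, 2);
    System.out.println(result); // {0, 3}
  }
}
```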



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633063889


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1336/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633064918


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3056/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-634561752


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1353/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-634563208


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3074/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r435034670



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       I have debugged it; the cache is already there. The argument of this method, `DimensionColumnPage[][] dimensionColumnPages`, is itself a cache indexed by column index.
   
   Look inside `ComplexQueryType#getDecodedDimensionPage` to see it.
   
   I also observed that decodeColumnPage is called only once for that page; the rest of the time it uses the cache.
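Here is a stripped-down sketch of the caching pattern described above. It assumes a decoded page can be modeled as `byte[][]` rows. A `[column][page]` array starts empty, each slot is decoded on first access, and later rows of the same page reuse it. `PageCacheSketch`, `DecodedPage`, and the decoder function are hypothetical stand-ins for `DimensionColumnPage[][]` and `ComplexQueryType#getDecodedDimensionPage`, not their actual code.

```java
import java.util.function.BiFunction;

public class PageCacheSketch {

  /** Hypothetical stand-in for a decoded DimensionColumnPage. */
  static final class DecodedPage {
    final byte[][] rows;

    DecodedPage(byte[][] rows) {
      this.rows = rows;
    }
  }

  // decoded[column][page]: null until first access, then reused.
  private final DecodedPage[][] decoded;
  private final BiFunction<Integer, Integer, DecodedPage> decoder;
  int decodeCalls = 0;

  PageCacheSketch(int columns, int pages, BiFunction<Integer, Integer, DecodedPage> decoder) {
    this.decoded = new DecodedPage[columns][pages];
    this.decoder = decoder;
  }

  DecodedPage getDecodedPage(int column, int page) {
    if (decoded[column][page] == null) {
      decodeCalls++;
      decoded[column][page] = decoder.apply(column, page); // decode only once
    }
    return decoded[column][page];
  }

  byte[] getRow(int column, int page, int rowId) {
    return getDecodedPage(column, page).rows[rowId];
  }

  public static void main(String[] args) {
    PageCacheSketch cache = new PageCacheSketch(2, 3,
        (column, page) -> new DecodedPage(new byte[][] {{1}, {2}, {3}}));
    for (int rowId = 0; rowId < 3; rowId++) {
      cache.getRow(1, 2, rowId); // three rows, same page
    }
    System.out.println(cache.decodeCalls); // 1
  }
}
```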





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r435036918



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       See `dimensionColumnPages[][]` in BlockletScannedResult.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

GitBox

CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638649864


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1408/
   

