CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638650363 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3132/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r435038194 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ########## @@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, } } BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers); - for (int i = 0; i < pageNumbers; i++) { - BitSet set = new BitSet(numberOfRows[i]); - RowIntf row = new RowImpl(); - BitSet prvBitset = null; - // if bitset pipe line is enabled then use rowid from previous bitset - // otherwise use older flow - if (!useBitsetPipeLine || - null == rawBlockletColumnChunks.getBitSetGroup() || - null == bitSetGroup.getBitSet(i) || - rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) { - for (int index = 0; index < numberOfRows[i]; index++) { - createRow(rawBlockletColumnChunks, row, i, index); - Boolean rslt = false; - try { - rslt = exp.evaluate(row).getBoolean(); - } - // Any invalid member while evaluation shall be ignored, system will log the - // error only once since all rows the evaluation happens so inorder to avoid - // too much log inforation only once the log will be printed. 
- catch (FilterIllegalMemberException e) { - FilterUtil.logError(e, false); - } - if (null != rslt && rslt) { - set.set(index); + + if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) { + // fill default value here + DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0); + // if filter dimension is not present in the current add its default value + if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) { + for (int i = 0; i < pageNumbers; i++) { + BitSet set = new BitSet(numberOfRows[i]); + RowIntf row = new RowImpl(); + for (int index = 0; index < numberOfRows[i]; index++) { + ArrayQueryType complexType = Review comment: done ########## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ########## @@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, } } BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers); - for (int i = 0; i < pageNumbers; i++) { - BitSet set = new BitSet(numberOfRows[i]); - RowIntf row = new RowImpl(); - BitSet prvBitset = null; - // if bitset pipe line is enabled then use rowid from previous bitset - // otherwise use older flow - if (!useBitsetPipeLine || - null == rawBlockletColumnChunks.getBitSetGroup() || - null == bitSetGroup.getBitSet(i) || - rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) { - for (int index = 0; index < numberOfRows[i]; index++) { - createRow(rawBlockletColumnChunks, row, i, index); - Boolean rslt = false; - try { - rslt = exp.evaluate(row).getBoolean(); - } - // Any invalid member while evaluation shall be ignored, system will log the - // error only once since all rows the evaluation happens so inorder to avoid - // too much log inforation only once the log will be printed. 
- catch (FilterIllegalMemberException e) { - FilterUtil.logError(e, false); - } - if (null != rslt && rslt) { - set.set(index); + + if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) { + // fill default value here + DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0); + // if filter dimension is not present in the current add its default value + if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) { + for (int i = 0; i < pageNumbers; i++) { + BitSet set = new BitSet(numberOfRows[i]); + RowIntf row = new RowImpl(); + for (int index = 0; index < numberOfRows[i]; index++) { + ArrayQueryType complexType = + (ArrayQueryType) complexDimensionInfoMap.get(dimensionChunkIndex[i]); + int[] numberOfChild = complexType + .getNumberOfChild(rawBlockletColumnChunks.getDimensionRawColumnChunks(), null, Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638651321 @QiangCai : We need Spark changes to support pushing down limit to carbondata. So, I think it cannot be done here as we use open source Spark. I want to implement array_contains pushdown for all the primitive types, not just string type. I will finish it today. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638746185 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3133/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638746696 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1409/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639416717 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639547711 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3140/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639548729 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1416/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r437812719 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ########## @@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, } } BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers); - for (int i = 0; i < pageNumbers; i++) { - BitSet set = new BitSet(numberOfRows[i]); - RowIntf row = new RowImpl(); - BitSet prvBitset = null; - // if bitset pipe line is enabled then use rowid from previous bitset - // otherwise use older flow - if (!useBitsetPipeLine || - null == rawBlockletColumnChunks.getBitSetGroup() || - null == bitSetGroup.getBitSet(i) || - rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) { + if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0] Review comment: 1. better to add new Expression like ArrayContainsExpression 2. how about to consider filter BitSetPipeLine ? ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ########## @@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being // explicitly added by spark and pushed. That also has to be handled and pushed back to // Spark for handling. 
- val predicatesWithoutComplex = predicates.filter(predicate => + // allow array_contains() push down + val filteredPredicates = predicates.filter(predicate => Review comment: use '{' instead of '(' ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ########## @@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { val supportBatch = supportBatchedDataSource(relation.relation.sqlContext, updateRequestedColumns) && extraRdd.getOrElse((null, true))._2 - if (!vectorPushRowFilters && !supportBatch && !implicitExisted) { + if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty && Review comment: why need to change it? ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ########## @@ -152,13 +152,25 @@ object CarbonFilters { } def getCarbonExpression(name: String) = { Review comment: in 'createFilter' method, convert CarbonArrayContains filter to ArrayContainsExpression ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ########## @@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { Some(CarbonContainsWith(c)) case c@Literal(v, t) if (v == null) => Some(FalseExpr()) + case c@ArrayContains(a: Attribute, Literal(v, t)) => + a.dataType match { + case arrayType: ArrayType => + arrayType.elementType match { + case StringType => Some(sources.EqualTo(a.name, v)) Review comment: how about to use a new filter: CarbonArrayContains ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r437867873 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ########## @@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { val supportBatch = supportBatchedDataSource(relation.relation.sqlContext, updateRequestedColumns) && extraRdd.getOrElse((null, true))._2 - if (!vectorPushRowFilters && !supportBatch && !implicitExisted) { + if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty && Review comment: This is for the count(*) with array_contains() query. Here they were reverting back the array_contains(), so I avoided it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642476599 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548570 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3145/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548932 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1421/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r444824720 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ########## @@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, } } BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers); - for (int i = 0; i < pageNumbers; i++) { - BitSet set = new BitSet(numberOfRows[i]); - RowIntf row = new RowImpl(); - BitSet prvBitset = null; - // if bitset pipe line is enabled then use rowid from previous bitset - // otherwise use older flow - if (!useBitsetPipeLine || - null == rawBlockletColumnChunks.getBitSetGroup() || - null == bitSetGroup.getBitSet(i) || - rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) { + if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0] Review comment: @QiangCai : can you please tell me, why new expression is required ? why equalTo is not enough ? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r444826666 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ########## @@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being // explicitly added by spark and pushed. That also has to be handled and pushed back to // Spark for handling. - val predicatesWithoutComplex = predicates.filter(predicate => + // allow array_contains() push down + val filteredPredicates = predicates.filter(predicate => Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648763485 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648837271 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1484/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648838314 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3211/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#discussion_r445361657 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ########## @@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, } } BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers); - for (int i = 0; i < pageNumbers; i++) { - BitSet set = new BitSet(numberOfRows[i]); - RowIntf row = new RowImpl(); - BitSet prvBitset = null; - // if bitset pipe line is enabled then use rowid from previous bitset - // otherwise use older flow - if (!useBitsetPipeLine || - null == rawBlockletColumnChunks.getBitSetGroup() || - null == bitSetGroup.getBitSet(i) || - rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) { + if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0] Review comment: I think using equalTo expression I can reuse most of the code. what do you think ? @QiangCai ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3771: URL: https://github.com/apache/carbondata/pull/3771#issuecomment-649430423 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3217/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |