Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

Classic

List

29 messages Options

Options

12

GitBox

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

VenuReddy2103 opened a new pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953

### Why is this PR needed?
IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is set to true. `RowLevelFilterExecutorImpl.applyFilter()` calls `createRow()` and applies filter at row level with `expression.evaluate(row)` invocation. `expression` in this case is `InExpression`. But `createRow()` missed to fill the date column value. Thus expression always evaluates to false and returns 0 rows.

### What changes were proposed in this PR?
1. Filled the date column value in `createRow()`.
2. Have removed unused imports introduced in previous PR.(SqlAstBuilderHelper.scala)

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

CarbonDataQA1 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698041869

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2456/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698045079

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4199/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698100642

retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] kunal642 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

kunal642 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494046607

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
this issue is not happening with timestamp column?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] kunal642 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

kunal642 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494047162

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/TestInFilter.scala
##########
@@ -165,8 +168,27 @@ class TestInFilter extends QueryTest with BeforeAndAfterAll{
Seq(Row(4, 1.00, 2.00, 3.00)))
}

- override def afterAll(): Unit = {
+ test("test infilter with date, timestamp columns") {
+ sql("create table test_table(i int, dt date, ts timestamp) stored as carbondata")
+ sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'")

Review comment:
drop table before create to avoid failure if previous test fails

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494050860

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/TestInFilter.scala
##########
@@ -165,8 +168,27 @@ class TestInFilter extends QueryTest with BeforeAndAfterAll{
Seq(Row(4, 1.00, 2.00, 3.00)))
}

- override def afterAll(): Unit = {
+ test("test infilter with date, timestamp columns") {
+ sql("create table test_table(i int, dt date, ts timestamp) stored as carbondata")
+ sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'")

Review comment:
Have already added drop table in afterEach()

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698136460

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2458/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698137893

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4201/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] QiangCai commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

QiangCai commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494090761

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
better to use IncludeFilterExecutorImpl as much as possible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] QiangCai commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

QiangCai commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494090761

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
better to use IncludeFilterExecutorImpl for in expression as much as possible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494091568

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
Yeah not happening for timestamp. But have added testcase for timestamp as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494091568

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
Yeah not happening for timestamp. But have added testcase for infilter with timestamp as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] kunal642 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

kunal642 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494046607

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
this issue is not happening with timestamp column?

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/TestInFilter.scala
##########
@@ -165,8 +168,27 @@ class TestInFilter extends QueryTest with BeforeAndAfterAll{
Seq(Row(4, 1.00, 2.00, 3.00)))
}

- override def afterAll(): Unit = {
+ test("test infilter with date, timestamp columns") {
+ sql("create table test_table(i int, dt date, ts timestamp) stored as carbondata")
+ sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'")

Review comment:
drop table before create to avoid failure if previous test fails

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] QiangCai commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

QiangCai commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494090761

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
better to use IncludeFilterExecutorImpl as much as possible.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
better to use IncludeFilterExecutorImpl for in expression as much as possible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698136460

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#issuecomment-698100642

retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r494050860

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/TestInFilter.scala
##########
@@ -165,8 +168,27 @@ class TestInFilter extends QueryTest with BeforeAndAfterAll{
Seq(Row(4, 1.00, 2.00, 3.00)))
}

- override def afterAll(): Unit = {
+ test("test infilter with date, timestamp columns") {
+ sql("create table test_table(i int, dt date, ts timestamp) stored as carbondata")
+ sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'")

Review comment:
Have already added drop table in afterEach()

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
Yeah not happening for timestamp. But have added testcase for timestamp as well.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -138,6 +145,8 @@ public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvalua
this.exp = exp;
this.tableIdentifier = tableIdentifier;
this.complexDimensionInfoMap = complexDimensionInfoMap;
+ this.dateDictionaryGenerator =
+ DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator(DataTypes.DATE);

Review comment:
Yeah not happening for timestamp. But have added testcase for infilter with timestamp as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r496212659

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
Have checked this. We are using IncludeFilterExecutorImpl as long as their is no cast expression involved. Cast operation is involved only when the data type of column to compare is not same as the literal values it is compared with.
<p>Each literal value in the InFilter can be of different type. Spark chooses one datatype for the entire list of literal values.
For example,
i IN (3, 2, 1.0) -> all values treated as decimal
i IN (3, 2.0, '1.0') -> all values are treated as string. Note: 2.0 is not same as 2 when casted as string.
<p> When the cast operation is present, `SparkUnknownExpression` with spark cast(`SparkExpression`) is used. `SparkUnknownExpression.evaluate()` takes row's column value. Using spark cast expression, validates it and converts it to datatype of literal value it chose. And then compare the converted column value against literal values. So validation and type conversions are handled with spark cast expression.
IMO, IncludeFilterExecutorImpl is already used when possible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3953: [CARBONDATA-4008]Fixed IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #3953:
URL: https://github.com/apache/carbondata/pull/3953#discussion_r496212659

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecutorImpl.java
##########
@@ -106,6 +108,11 @@
*/
boolean isNaturalSorted;

+ /**
+ * date direct dictionary generator
+ */
+ private DirectDictionaryGenerator dateDictionaryGenerator;
+
public RowLevelFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
Have checked this. We are using IncludeFilterExecutorImpl as long as their is no cast expression involved. Cast is involved only when the data type of column to compare is not same as the literal values it is compared with.
<p>Each literal value in the InFilter can be of different type. Spark chooses one datatype for the entire list of literal values.
For example, with i as int column
i IN (3, 2, 1.0) -> all values treated as decimal
i IN (3, 2.0, '1.0') -> all values are treated as string. Note: 2.0 is not same as 2 when casted as string.
<p> When the cast operation is present, `SparkUnknownExpression` with spark cast(`SparkExpression`) is used. `SparkUnknownExpression.evaluate()` takes row's column value. Using spark cast expression, validates it and converts it to datatype of literal value it chose. And then compare the converted column value against literal values. So validation and type conversions are handled with spark cast expression.
IMO, IncludeFilterExecutorImpl is already used when possible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

12