GitHub user geetikagupta16 opened a pull request:
https://github.com/apache/carbondata/pull/1724

[CARBONDATA-1940][PreAgg] Fixed bug for creation of preaggregate table with group by clause

1. Refactored code to create pre-aggregate table with group by clause
2. Added related test cases

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geetikagupta16/incubator-carbondata CARBONDATA-1940

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1724.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1724

----

commit d23a9afe11938a270863b07087f2551f06d6c2de
Author: Geetika Gupta <geetika.gupta@...>
Date:   2017-12-26T11:59:38Z

    1. Refactored code to create preaggregate table with group by clause
    2. Added related test cases

----
--- |
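For context, the kind of pre-aggregate datamap definition this fix targets looks roughly like the sketch below; the table and column names (sales, country, year, salary) are illustrative and not taken from the PR:

    -- Sketch only: assumed table and column names.
    CREATE TABLE sales (country STRING, year INT, salary INT) STORED BY 'carbondata';

    -- A pre-aggregate datamap whose definition carries a GROUP BY clause,
    -- which is the case this PR fixes.
    CREATE DATAMAP sales_agg ON TABLE sales
    USING 'preaggregate'
    AS SELECT country, SUM(salary) FROM sales GROUP BY country;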
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2331/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1118/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1724 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2560/ --- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158718033

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
@@ -144,6 +144,16 @@ object PreAggregateUtil {
         throw new MalformedCarbonCommandException(s"Unsupported Select Statement:${ selectStmt } ")
     }
+    groupByExp map {
+      case attr: AttributeReference =>
+        fieldToDataMapFieldMap += getField(attr.name,
+          attr.dataType,
+          parentColumnId = carbonTable.getColumnByName(parentTableName, attr.name).getColumnId,
+          parentTableName = parentTableName,
+          parentDatabaseName = parentDatabaseName, parentTableId = parentTableId)
+      case _ =>
+        throw new MalformedCarbonCommandException(s"Expression in group by clause is incorrect")
--- End diff --

Please update the exception message:

    throw new MalformedCarbonCommandException(s"Unsupported Function in select Statement:${ selectStmt }")
--- |
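For context, the new match arm above rejects grouping expressions that are not plain column references (AttributeReference). A definition along these lines (illustrative names, reusing the sketch table from the PR description above) would hit that branch:

    -- Sketch only: GROUP BY on a computed expression is not a plain column
    -- reference, so the added code path raises MalformedCarbonCommandException.
    CREATE DATAMAP sales_bad ON TABLE sales
    USING 'preaggregate'
    AS SELECT SUM(salary) FROM sales GROUP BY SUBSTRING(country, 1, 2);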
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158718262

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
@@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = {
     val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
     plan transform {
-      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
+      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
        aExp.foreach {
          case alias: Alias =>
            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
          case _: UnresolvedAlias =>
          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
        }
+        groupingExpressions foreach {
--- End diff --

I don't get the purpose of this change. The data loading rules are only there to update the average column, as an average column's data will be stored in two columns (sum, count). Can you please explain why we need this change?
--- |
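For readers unfamiliar with that rule: an avg column in a pre-aggregate datamap is maintained internally as separate sum and count columns, and the loading plan is rewritten accordingly. A rough illustration, using the same assumed sales table as above:

    -- Sketch only: a datamap defined with AVG ...
    CREATE DATAMAP sales_avg ON TABLE sales
    USING 'preaggregate'
    AS SELECT country, AVG(salary) FROM sales GROUP BY country;

    -- ... is loaded as if it carried sum and count columns, so the average
    -- can later be recomputed as SUM(salary) / COUNT(salary) when the
    -- datamap is used to answer queries.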
Github user geetikagupta16 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158765059

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

This change is required to add the grouping expressions to the logical plan used for data loading into the pre-aggregate table, since expressions in the group by clause are now also treated as columns of the pre-aggregate table.
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2345/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1129/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1724 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2564/ --- |
Github user geetikagupta16 commented on the issue:
https://github.com/apache/carbondata/pull/1724 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2353/ --- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158803200

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

OK, this is to handle the case where a column appears in the grouping expressions but not in the projection. Since we are adding the column to the datamap anyway, the user could also create the aggregation datamap by adding that column to the projection list. But it is OK to support this type of expression in pre-aggregate.
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1136/ --- |
Github user geetikagupta16 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158896776

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

As per CARBONDATA-1940, creation was throwing an exception when grouping columns were not added to the projection of the pre-aggregate table.
--- |
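Concretely, the case described in CARBONDATA-1940 is a definition where a group-by column is not repeated in the projection, for example (illustrative names again):

    -- Sketch only: before this fix, creation failed because 'country' appears
    -- in GROUP BY but not in the select list of the pre-aggregate definition.
    CREATE DATAMAP sales_sum ON TABLE sales
    USING 'preaggregate'
    AS SELECT SUM(salary) FROM sales GROUP BY country;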
Github user BJangir commented on the issue:
https://github.com/apache/carbondata/pull/1724 @geetikagupta16, please handle filter columns also. For example:

    select country, sum(case when year = 2010 then salary else 0 end) from tbl
    where year = 2010 or year = 2011
    group by country

In this case the aggregate datamap should have the country, year and salary columns.
--- |
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/1724 @BJangir Currently we do not support filters while creating the aggregation datamap; we may support them in the future. For the above use case, the user can add the filter column to the projection.
--- |
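A rough sketch of that workaround, reusing BJangir's tbl/country/year/salary names (not an exact DDL from this thread): since filters are not supported in the datamap definition, the filter column is carried as an ordinary projection and group-by column instead.

    -- Sketch only: the filter column (year) is added to the projection and
    -- GROUP BY of the datamap rather than as a WHERE clause.
    CREATE DATAMAP tbl_agg ON TABLE tbl
    USING 'preaggregate'
    AS SELECT country, year, SUM(salary) FROM tbl GROUP BY country, year;

    -- A query like
    --   SELECT country, SUM(salary) FROM tbl WHERE year = 2010 OR year = 2011 GROUP BY country
    -- should then be able to use tbl_agg, since year is available there for filtering.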
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/1724 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1327/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2563/ --- |