GitHub user geetikagupta16 opened a pull request:
https://github.com/apache/carbondata/pull/1724

[CARBONDATA-1940][PreAgg] Fixed bug for creation of preaggregate table with group by clause

1. Refactored code to create pre-aggregate table with group by clause
2. Added related test cases

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geetikagupta16/incubator-carbondata CARBONDATA-1940

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1724.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1724

----

commit d23a9afe11938a270863b07087f2551f06d6c2de
Author: Geetika Gupta <geetika.gupta@...>
Date:   2017-12-26T11:59:38Z

    1. Refactored code to create preaggregate table with group by clause
    2. Added related test cases

----
--- |
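For context, the kind of pre-aggregate datamap definition this fix targets looks roughly like the sketch below; the table and column names (sales, country, year, salary) are illustrative and not taken from the PR:

    -- Sketch only: assumed table and column names.
    CREATE TABLE sales (country STRING, year INT, salary INT) STORED BY 'carbondata';

    -- A pre-aggregate datamap whose definition carries a GROUP BY clause,
    -- which is the case this PR fixes.
    CREATE DATAMAP sales_agg ON TABLE sales
    USING 'preaggregate'
    AS SELECT country, SUM(salary) FROM sales GROUP BY country;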
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2331/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1118/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1724 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2560/ --- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158718033

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
@@ -144,6 +144,16 @@ object PreAggregateUtil {
         throw new MalformedCarbonCommandException(s"Unsupported Select Statement:${ selectStmt } ")
     }
+    groupByExp map {
+      case attr: AttributeReference =>
+        fieldToDataMapFieldMap += getField(attr.name,
+          attr.dataType,
+          parentColumnId = carbonTable.getColumnByName(parentTableName, attr.name).getColumnId,
+          parentTableName = parentTableName,
+          parentDatabaseName = parentDatabaseName, parentTableId = parentTableId)
+      case _ =>
+        throw new MalformedCarbonCommandException(s"Expression in group by clause is incorrect")
--- End diff --

Please update the exception message:

    throw new MalformedCarbonCommandException(s"Unsupported Function in select Statement:${ selectStmt }")
--- |
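For context, the new match arm above rejects grouping expressions that are not plain column references (AttributeReference). A definition along these lines (illustrative names, reusing the sketch table from the PR description above) would hit that branch:

    -- Sketch only: GROUP BY on a computed expression is not a plain column
    -- reference, so the added code path raises MalformedCarbonCommandException.
    CREATE DATAMAP sales_bad ON TABLE sales
    USING 'preaggregate'
    AS SELECT SUM(salary) FROM sales GROUP BY SUBSTRING(country, 1, 2);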
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158718262

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
@@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = {
     val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
     plan transform {
-      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
+      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
        aExp.foreach {
          case alias: Alias =>
            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
          case _: UnresolvedAlias =>
          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
        }
+        groupingExpressions foreach {
--- End diff --

I don't get the purpose of this change. The data loading rules are only there to update the average column, as an average column's data will be stored in two columns (sum, count). Can you please explain why we need this change?
--- |
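For readers unfamiliar with that rule: an avg column in a pre-aggregate datamap is maintained internally as separate sum and count columns, and the loading plan is rewritten accordingly. A rough illustration, using the same assumed sales table as above:

    -- Sketch only: a datamap defined with AVG ...
    CREATE DATAMAP sales_avg ON TABLE sales
    USING 'preaggregate'
    AS SELECT country, AVG(salary) FROM sales GROUP BY country;

    -- ... is loaded as if it carried sum and count columns, so the average
    -- can later be recomputed as SUM(salary) / COUNT(salary) when the
    -- datamap is used to answer queries.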
Github user geetikagupta16 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158765059

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

This change is required to add the grouping expressions to the logical plan used for data loading into the pre-aggregate table, since expressions in the group by clause are now also treated as columns of the pre-aggregate table.
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2345/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1129/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1724 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2564/ --- |
Github user geetikagupta16 commented on the issue:
https://github.com/apache/carbondata/pull/1724 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2353/ --- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158803200

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

OK, this is to handle the case where a column appears in the grouping expressions but not in the projection. Since we are adding the column to the datamap anyway, the user could also create the aggregation datamap by adding that column to the projection list. But it is OK to support this type of expression in pre-aggregate.
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1136/ --- |
Github user geetikagupta16 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1724#discussion_r158896776

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
(same hunk as above, ending at the added line "+ groupingExpressions foreach {")
--- End diff --

As per CARBONDATA-1940, creation was throwing an exception when grouping columns were not added to the projection of the pre-aggregate table.
--- |
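Concretely, the case described in CARBONDATA-1940 is a definition where a group-by column is not repeated in the projection, for example (illustrative names again):

    -- Sketch only: before this fix, creation failed because 'country' appears
    -- in GROUP BY but not in the select list of the pre-aggregate definition.
    CREATE DATAMAP sales_sum ON TABLE sales
    USING 'preaggregate'
    AS SELECT SUM(salary) FROM sales GROUP BY country;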
Github user BJangir commented on the issue:
https://github.com/apache/carbondata/pull/1724 @geetikagupta16, please handle filter columns also. For example:

    select country, sum(case when year = 2010 then salary else 0 end) from tbl
    where year = 2010 or year = 2011
    group by country

In this case the aggregate datamap should have the country, year and salary columns.
--- |
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/1724 @BJangir Currently we do not support filters while creating the aggregation datamap; we may support them in the future. For the above use case, the user can add the filter column to the projection.
--- |
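A rough sketch of that workaround, reusing BJangir's tbl/country/year/salary names (not an exact DDL from this thread): since filters are not supported in the datamap definition, the filter column is carried as an ordinary projection and group-by column instead.

    -- Sketch only: the filter column (year) is added to the projection and
    -- GROUP BY of the datamap rather than as a WHERE clause.
    CREATE DATAMAP tbl_agg ON TABLE tbl
    USING 'preaggregate'
    AS SELECT country, year, SUM(salary) FROM tbl GROUP BY country, year;

    -- A query like
    --   SELECT country, SUM(salary) FROM tbl WHERE year = 2010 OR year = 2011 GROUP BY country
    -- should then be able to use tbl_agg, since year is available there for filtering.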
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/1724 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1327/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1724 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2563/ --- |