[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

classic Classic list List threaded Threaded
50 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
GitHub user geetikagupta16 opened a pull request:

    https://github.com/apache/carbondata/pull/1724

    [CARBONDATA-1940][PreAgg] Fixed bug for creation of preaggregate table with group by clause

    1. Refactored code to create pre-aggregate table with group by clause
    2. Added related test cases
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geetikagupta16/incubator-carbondata CARBONDATA-1940

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1724
   
----
commit d23a9afe11938a270863b07087f2551f06d6c2de
Author: Geetika Gupta <geetika.gupta@...>
Date:   2017-12-26T11:59:38Z

    1. Refactored code to create preaggregate table with group by clause
    2. Added related test cases

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2331/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1118/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2560/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1724#discussion_r158718033
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
    @@ -144,6 +144,16 @@ object PreAggregateUtil {
             throw new MalformedCarbonCommandException(s"Unsupported Select Statement:${
               selectStmt } ")
         }
    +    groupByExp map {
    +      case attr: AttributeReference =>
    +        fieldToDataMapFieldMap += getField(attr.name,
    +          attr.dataType,
    +          parentColumnId = carbonTable.getColumnByName(parentTableName, attr.name).getColumnId,
    +          parentTableName = parentTableName,
    +          parentDatabaseName = parentDatabaseName, parentTableId = parentTableId)
    +      case _ =>
    +        throw new MalformedCarbonCommandException(s"Expression in group by clause is incorrect")
    --- End diff --
   
    Please update the exception message
    throw new MalformedCarbonCommandException(s"Unsupported Function in select Statement:${
              selectStmt }


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1724#discussion_r158718262
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = {
         val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         plan transform {
    -      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
    +      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
             aExp.foreach {
               case alias: Alias =>
                 validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
               case _: UnresolvedAlias =>
               case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    +        groupingExpressions foreach {
    --- End diff --
   
    I don't get the purpose of this change data loading rules is only to update the average column as average columns data will be stored in two column(sum, count), can u please explain why we need this change??


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user geetikagupta16 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1724#discussion_r158765059
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = {
         val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         plan transform {
    -      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
    +      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
             aExp.foreach {
               case alias: Alias =>
                 validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
               case _: UnresolvedAlias =>
               case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    +        groupingExpressions foreach {
    --- End diff --
   
    This change is required for adding grouping expressions in logical plan for data loading in pre aggregate table as now expressions in group clause will also be treated as columns of pre aggregate table


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2345/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1129/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2564/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user geetikagupta16 commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2353/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1724#discussion_r158803200
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = {
         val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         plan transform {
    -      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
    +      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
             aExp.foreach {
               case alias: Alias =>
                 validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
               case _: UnresolvedAlias =>
               case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    +        groupingExpressions foreach {
    --- End diff --
   
    Ok, This is to handle when column is on grouping expression but not on projection, as we are adding column in data map user can create aggregation data map by adding that column in projection list.  But it is OK to support this type of expression in pre aggregate .


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1136/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user geetikagupta16 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1724#discussion_r158896776
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -1264,13 +1264,16 @@ object CarbonPreAggregateDataLoadingRules extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = {
         val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         plan transform {
    -      case aggregate@Aggregate(_, aExp, _) if validateAggregateExpressions(aExp) =>
    +      case aggregate@Aggregate(groupingExpressions, aExp, _) if validateAggregateExpressions(aExp) =>
             aExp.foreach {
               case alias: Alias =>
                 validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
               case _: UnresolvedAlias =>
               case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    +        groupingExpressions foreach {
    --- End diff --
   
    As per CARBONDATA-1940, it was throwing an exception when grouping columns were not added in projection of preaggregate table


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user BJangir commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    @geetikagupta16 , please handle filter columns also . for example . select country ,sum( case when year=2010 salary else 0 end)  from tbl where year=2010 or year=2011 group by   country.   in this case aggregate datamap should have country,year ,salary columns .


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    @BJangir Currently we are not supporting filter while creating the aggregation data map. In future we may support. User can add filter column on projection for above use case.



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1327/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1724: [CARBONDATA-1940][PreAgg] Fixed bug for creation of ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1724
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2563/



---
123