[GitHub] [carbondata] QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue


GitBox
QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
 
 
    ### Why is this PR needed?
   Spark's ReuseExchange rule cannot recognize identical Exchange plans on a carbon table.
   As a result, a query on the carbon table does not reuse the Exchange, which leads to poor performance.
   
   For example:
   
   ```
   create table t1(c1 int, c2 string) using carbondata
   
   explain
   select c2, sum(c1) from t1 group by c2
   union all
   select c2, sum(c1) from t1 group by c2
   ```
   Before the change, the physical plan is as follows:
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- Exchange hashpartitioning(c2#37, 200)
         +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
            +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   ```
   
   After the change, the physical plan is as follows:
   
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
   ```
   
   
    ### What changes were proposed in this PR?
   Change the CarbonFileIndex class to a case class.
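
   A likely reason this helps is that a Scala case class gets field-based `equals` and `hashCode`, while a plain class compares by reference, so two index objects built for the same table would otherwise never look equal when plans are compared. The sketch below only illustrates that language behavior. `PlainIndex`, `CaseIndex`, their fields, and the path are hypothetical stand-ins, not the real CarbonFileIndex signature.

   ```scala
   // Hypothetical stand-ins for illustration; not the real CarbonData classes.
   class PlainIndex(val tablePath: String, val options: Map[String, String])

   case class CaseIndex(tablePath: String, options: Map[String, String])

   object EqualityDemo {
     def main(args: Array[String]): Unit = {
       val opts = Map("format" -> "carbondata")

       // A plain class inherits reference equality from AnyRef,
       // so two instances with identical fields are not equal.
       val p1 = new PlainIndex("/warehouse/t1", opts)
       val p2 = new PlainIndex("/warehouse/t1", opts)
       println(p1 == p2) // prints "false"

       // A case class compares its constructor fields,
       // and its hashCode is derived from the same fields.
       val c1 = CaseIndex("/warehouse/t1", opts)
       val c2 = CaseIndex("/warehouse/t1", opts)
       println(c1 == c2) // prints "true"
       println(c1.hashCode == c2.hashCode) // prints "true"
     }
   }
   ```

   With field-based equality, two scans of the same table can carry index objects that compare equal, which is a precondition for treating their Exchange subtrees as the same plan.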
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

GitBox
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604413810
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2566/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

GitBox
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604422173
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/858/
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

GitBox
ajantha-bhat commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604483281
 
 
   LGTM
   
   Good finding. Is this applicable only for fileFormat?


[GitHub] [carbondata] asfgit closed pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

GitBox
asfgit closed pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
 
 
   


[GitHub] [carbondata] QiangCai commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

GitBox
QiangCai commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604764895
 
 
   @ajantha-bhat It also impacts carbondata tables.
