QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681

### Why is this PR needed?
Spark's ReusedExchange rule cannot recognize identical Exchange plans on a carbon table. As a result, queries on carbon tables do not reuse the Exchange, which leads to poor performance.

For example:
```
create table t1(c1 int, c2 string) using carbondata

explain select c2, sum(c1) from t1 group by c2
union all
select c2, sum(c1) from t1 group by c2
```
The physical plan is as follows:
```
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- Exchange hashpartitioning(c2#37, 200)
      +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
         +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
```
After the change, the physical plan is as follows:
```
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
```

### What changes were proposed in this PR?
Change the CarbonFileIndex class to a case class.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
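The PR description above says the fix is to turn CarbonFileIndex into a case class. A minimal sketch of why that can matter, assuming the reuse check ultimately depends on two plans comparing equal: a plain Scala class compares by reference, while a case class compares by field values. All names below (`PlainFileIndex`, `CaseFileIndex`, `ToyScan`, `distinctPlans`, the `/warehouse/t1` path) are hypothetical stand-ins for illustration, not the real CarbonData or Spark classes.

```scala
// Hypothetical stand-ins; these are NOT the real CarbonData or Spark classes.
class PlainFileIndex(val tablePath: String)   // equals/hashCode by reference
case class CaseFileIndex(tablePath: String)   // equals/hashCode by field values

// A toy plan node that embeds a file index, loosely mirroring a scan node.
case class ToyScan[A](index: A, columns: Seq[String])

object ReuseSketch {
  // Toy version of reuse: plans that compare equal collapse into one.
  def distinctPlans[A](plans: Seq[A]): Int = plans.distinct.size

  def main(args: Array[String]): Unit = {
    // Two scans of the same table, each built with its own index instance.
    val plain = Seq(
      ToyScan(new PlainFileIndex("/warehouse/t1"), Seq("c1", "c2")),
      ToyScan(new PlainFileIndex("/warehouse/t1"), Seq("c1", "c2")))
    val cased = Seq(
      ToyScan(CaseFileIndex("/warehouse/t1"), Seq("c1", "c2")),
      ToyScan(CaseFileIndex("/warehouse/t1"), Seq("c1", "c2")))

    println(distinctPlans(plain)) // separate instances never compare equal
    println(distinctPlans(cased)) // structurally equal, so they collapse
  }
}
```

In this toy model the two plain-class scans stay distinct while the two case-class scans collapse into one, which is the same shape as the before and after physical plans in the description. How Spark actually compares plans is more involved than `distinct`, so treat this only as an illustration of the equality difference.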
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604413810
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2566/
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604422173
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/858/
ajantha-bhat commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604483281
LGTM, good finding. Is this applicable only to fileFormat?
asfgit closed pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
QiangCai commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604764895
@ajantha-bhat it also impacts carbondata tables.