GitHub user BJangir opened a pull request:
https://github.com/apache/carbondata/pull/2658

[Carbondata 2885] Broadcast Issue and Small file distribution Issue

Issue:
1. For an external table, the sizeInBytes of the Carbon relation is wrong (always 0). Because of this, join queries are selected for broadcast even when the actual table size is greater than 10 MB (the default broadcast threshold). This makes some joins fail: a table that should use sort-merge join goes for broadcast join because of the wrong size calculation.
2. If Merge_small_file task distribution is enabled, join queries (TPCH) fail. Carbon opens many carbon files but does not close them.

Root Cause:
1. The current relation size calculation is based on the tablestatus file. An external table does not have a tablestatus file, so zero was always returned.
2. If Merge_small_file task distribution is enabled, carbon opens many carbon files but does not close them.

Solution:
1. If the table is an external table, calculate the size from the table path.
2. Close the carbon files once the scan is finished.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? NA
- [ ] Testing done
      Manually tested on a 3-node cluster
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BJangir/incubator-carbondata CARBONDATA-2885

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2658.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2658

----
commit 69fe7241e0cef5d7b9a6ac9e87018b3d44dd60a0
Author: BJangir <babulaljangir111@...>
Date: 2018-08-24T09:17:49Z

    [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue

----

--- |
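To illustrate the first issue and its fix, here is a minimal Scala sketch, not the code from this pull request. It sums file sizes under a table path and applies a broadcast-eligibility check against the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`. The object name `ExternalTableSizeSketch` and both method names are hypothetical, and `java.io.File` is only a local-filesystem stand-in for whatever file abstraction CarbonData actually uses.

```scala
import java.io.File

// Hypothetical helper, for illustration only; not CarbonData API.
object ExternalTableSizeSketch {

  // Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
  val DefaultBroadcastThresholdBytes: Long = 10L * 1024 * 1024

  /** Recursively sums the length of every regular file under `path`. */
  def sizeInBytes(path: File): Long = {
    if (path.isFile) {
      path.length()
    } else {
      // listFiles returns null if the path does not exist or cannot be read,
      // so wrap it in Option and treat that case as size 0.
      Option(path.listFiles()).map(_.map(sizeInBytes).sum).getOrElse(0L)
    }
  }

  /**
   * Assumed shape of the eligibility check: a relation is a broadcast
   * candidate when its reported size is non-negative and within the threshold.
   */
  def eligibleForBroadcast(
      tableSize: Long,
      threshold: Long = DefaultBroadcastThresholdBytes): Boolean = {
    tableSize >= 0 && tableSize <= threshold
  }
}
```

Under this check a reported size of 0 is always a broadcast candidate, which is why an external table whose size was never computed would be broadcast regardless of how large it really is.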
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2658#discussion_r212576753

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ---
@@ -191,6 +191,14 @@ case class CarbonRelation(
 }
 }
 }
+ else if (carbonTable.isExternalTable) {
--- End diff --

Add the check in the above code for the normal table. There is no need to check for the tablestatus file, as for an external table the tablestatus file will not be present.
--- |
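One possible reading of this review comment is that the external-table case should be decided first, so that the tablestatus lookup only runs for normal tables. The Scala sketch below shows that branch order under that assumption. `TableMeta`, `relationSize`, and the two size callbacks are hypothetical names introduced for illustration and do not appear in CarbonRelation.scala.

```scala
// Hypothetical model of the branch order; not the code in CarbonRelation.scala.
object RelationSizeSketch {

  // Stand-in for the two facts the size calculation needs about a table.
  final case class TableMeta(isExternalTable: Boolean, tableStatusExists: Boolean)

  /**
   * Picks the size source for a relation.
   * - External table: no tablestatus file exists, so measure the table path.
   * - Normal table with a tablestatus file: use the size derived from it.
   * - Otherwise: report 0.
   */
  def relationSize(
      table: TableMeta,
      sizeFromTablePath: () => Long,
      sizeFromTableStatus: () => Long): Long = {
    if (table.isExternalTable) {
      sizeFromTablePath()
    } else if (table.tableStatusExists) {
      sizeFromTableStatus()
    } else {
      0L
    }
  }
}
```

Passing the two size sources as functions keeps the sketch independent of any file system: only the branch that is taken evaluates its callback.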
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2658 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6383/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2658 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6385/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2658 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6761/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2658 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8038/ --- |
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2658 LGTM --- |