[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...


qiuchenjian-2
GitHub user BJangir opened a pull request:

    https://github.com/apache/carbondata/pull/2658

    [Carbondata 2885]Broadcast Issue and Small file distribution Issue

    Issue:
    1. For an external table, the Carbon relation sizeInByte is wrong (always 0). Because of this, join queries are chosen for broadcast even when the actual table size is greater than 10 MB (the default broadcast threshold). This causes some join queries to fail: a table that should use sort-merge join goes for broadcast join because of the wrong size calculation.

    2. If Merge_small_file task distribution is enabled, join queries fail (TPCH). Carbon opens many carbon files but does not close them.

    Root cause:
    1. The current relation size calculation is based on the tablestatus file. An external table does not have a tablestatus file, so zero was always returned.
    2. If Merge_small_file task distribution is enabled, Carbon opens many carbon files but does not close them.

    Solution:
    1. If the table is an external table, calculate the size from the table path.
    2. Close the carbon files once the scan is finished.
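
The second point of the solution (release each file handle when its scan ends) might look roughly like the sketch below. This is only an illustration, not the code in this pull request: `ScanCloseSketch`, `FileReaderStub`, `OPEN_READERS` and `scanAll` are hypothetical names, and the "scan" is a placeholder. It shows how try-with-resources keeps the number of open readers at zero after a task that reads many small files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ScanCloseSketch {
  // Counts readers that are currently open, standing in for open file handles.
  static final AtomicInteger OPEN_READERS = new AtomicInteger();

  // Hypothetical reader for one carbon file; this is not a CarbonData class.
  static final class FileReaderStub implements AutoCloseable {
    private final String path;
    private boolean closed;

    FileReaderStub(String path) {
      this.path = path;
      OPEN_READERS.incrementAndGet();
    }

    // Placeholder work: pretend each file holds as many rows as its path has characters.
    int readRowCount() {
      return path.length();
    }

    @Override
    public void close() {
      // Idempotent close, so a double close does not corrupt the counter.
      if (!closed) {
        closed = true;
        OPEN_READERS.decrementAndGet();
      }
    }
  }

  // Scans every file assigned to one task and closes each reader as soon as its scan ends.
  static long scanAll(List<String> paths) {
    long rows = 0;
    for (String path : paths) {
      try (FileReaderStub reader = new FileReaderStub(path)) {
        rows += reader.readRowCount();
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    long rows = scanAll(Arrays.asList("part-0.carbondata", "part-1.carbondata"));
    System.out.println("rows=" + rows + ", open readers=" + OPEN_READERS.get());
  }
}
```

With many small files merged into one task, the reader count would otherwise grow with the number of files, which matches the symptom described above.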
   
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     NA
     - [ ] Any backward compatibility impacted?
     NA
     - [ ] Document update required?
     NA
     - [ ] Testing done
         Manual testing on a 3-node cluster
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BJangir/incubator-carbondata CARBONDATA-2885

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2658.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2658
   
----
commit 69fe7241e0cef5d7b9a6ac9e87018b3d44dd60a0
Author: BJangir <babulaljangir111@...>
Date:   2018-08-24T09:17:49Z

    [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue

----


---

[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...

qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2658#discussion_r212576753
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ---
    @@ -191,6 +191,14 @@ case class CarbonRelation(
             }
           }
         }
    +    else if (carbonTable.isExternalTable) {
    --- End diff --
   
    Add this check in the code above that handles normal tables. There is no need to check for the tablestatus file, since an external table will not have one.
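
For readers following the thread, the branch under discussion can be sketched as below. This is a rough illustration under stated assumptions, not the change in CarbonRelation.scala: the real code is Scala and would go through CarbonData's own file handling, while this sketch uses `java.nio.file` on a local directory. `RelationSizeSketch`, `sizeFromTablePath`, `sizeInBytes` and `eligibleForBroadcast` are hypothetical names, and the 10 MB limit is the default mentioned in the pull request description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class RelationSizeSketch {
  // 10 MB, the default broadcast limit mentioned in the pull request description.
  static final long BROADCAST_THRESHOLD_BYTES = 10L * 1024 * 1024;

  // Sums the sizes of all regular files under tablePath (local filesystem only).
  static long sizeFromTablePath(Path tablePath) {
    try (Stream<Path> files = Files.walk(tablePath)) {
      return files.filter(Files::isRegularFile).mapToLong(p -> {
        try {
          return Files.size(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }).sum();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // External tables have no tablestatus file, so their size comes from the table path;
  // other tables keep the size already derived from tablestatus (passed in here).
  static long sizeInBytes(boolean isExternalTable, Path tablePath, long sizeFromTableStatus) {
    return isExternalTable ? sizeFromTablePath(tablePath) : sizeFromTableStatus;
  }

  // Simplified stand-in for the planner's decision: small relations qualify for broadcast.
  static boolean eligibleForBroadcast(long sizeInBytes) {
    return sizeInBytes >= 0 && sizeInBytes <= BROADCAST_THRESHOLD_BYTES;
  }

  public static void main(String[] args) {
    // A size of 0 always qualifies, which is why the wrong value led to broadcast joins.
    System.out.println("0 bytes eligible: " + eligibleForBroadcast(0L));
    System.out.println("11 MB eligible: " + eligibleForBroadcast(11L * 1024 * 1024));
  }
}
```

Taking the external-table branch first means the tablestatus lookup is never attempted for a table that cannot have one.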


---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2658
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6383/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2658
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6385/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2658
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6761/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2658
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8038/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2658
 
    LGTM


---

[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2658


---