[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2949

    [WIP] support parallel block pruning for non-default datamaps

    This PR depends on #2936.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata working_backup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2949
   
----
commit 6237d69fcc0ddc1a08c74579762b721108a251fe
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-11-20T16:45:06Z

    parllelize block pruning

commit e8e912daf3ada357352e006ec9ce435d4c4b1625
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-11-22T11:01:53Z

    reveiw comment fix

commit d0bf82f276618f6fa09cbce65f714394b5fa4e0c
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-11-23T13:22:07Z

    support parallel pruning for non-default datamaps

----


---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1526/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1527/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9785/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1737/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1530/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9788/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1740/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1536/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1747/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9795/



---

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236571984
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  // can return , number of records information that are stored in datamap.
    --- End diff --
   
    "can return"?
    What does this mean?


---

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236746764
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  // can return , number of records information that are stored in datamap.
    --- End diff --
   
    OK, changed to just "returns".


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1560/



---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9818/



---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1771/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236907320
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
   
    Also, what does 'record' mean below?


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236907065
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
   
    What do you mean by '10 datamaps in each segment'?
    Do you mean 10 index files, merged index files, blocklets, or something else?


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r240900313
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
   
    BlockDataMap and BlockletDataMap can store information for multiple files; each file is one row in that datamap. Non-default datamaps are not like that. So for default datamaps, distribution across threads is based on the number of entries in the datamaps; for non-default datamaps, it is based on the number of datamaps (one datamap is counted as one record).
   
    Also, '10 datamaps in a segment' means that one merge index file holds the information of 10 index files.
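
The distribution rule described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `PruneDistributionSketch`, the interface `EntryCounted`, and the method `distribute` are hypothetical names, and the greedy in-order grouping is an assumed strategy. The only idea taken from the thread is that work is balanced by `getNumberOfEntries()`, which would return the row count for a default datamap and 1 for a non-default datamap.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneDistributionSketch {

  /** Hypothetical stand-in for a datamap; only the entry count matters here. */
  public interface EntryCounted {
    int getNumberOfEntries();
  }

  /**
   * Splits datamaps, in order, into at most numThreads groups that carry a
   * roughly equal number of entries. A datamap is never split across groups
   * (an assumption of this sketch, not necessarily of the PR).
   */
  public static <T extends EntryCounted> List<List<T>> distribute(
      List<T> dataMaps, int numThreads) {
    if (numThreads <= 0) {
      throw new IllegalArgumentException("numThreads must be positive");
    }
    long total = 0;
    for (T dataMap : dataMaps) {
      total += dataMap.getNumberOfEntries();
    }
    // Target load per group, rounded up so every entry is covered.
    long perThread = Math.max(1, (total + numThreads - 1) / numThreads);

    List<List<T>> groups = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long inCurrent = 0;
    for (T dataMap : dataMaps) {
      current.add(dataMap);
      inCurrent += dataMap.getNumberOfEntries();
      // Close the group once it reaches the target; the last group takes the rest.
      if (inCurrent >= perThread && groups.size() < numThreads - 1) {
        groups.add(current);
        current = new ArrayList<>();
        inCurrent = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // 5 segments x 10 datamaps as in the quoted comment; 50 entries per
    // datamap is an assumed figure for illustration only.
    List<EntryCounted> dataMaps = new ArrayList<>();
    for (int i = 0; i < 50; i++) {
      dataMaps.add(() -> 50);
    }
    for (List<EntryCounted> group : distribute(dataMaps, 4)) {
      System.out.println(group.size() + " datamaps in this group");
    }
  }
}
```

With equal-sized datamaps the groups come out nearly even. With one large default datamap and several small ones, the large one would occupy a group on its own, which is the point of weighting by entries rather than by datamap count.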


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r241279625
 
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -436,4 +436,9 @@ public String toString() {
       public void finish() {
     
       }
    +
    +  @Override public int getNumberOfEntries() {
    --- End diff --
   
    Move this method to the available abstract class.
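
One way to read this suggestion is sketched below. All type names here (`SketchDataMap`, `SketchCoarseGrainDataMap`, `SketchBloomDataMap`, `SketchBlockDataMap`) are hypothetical stand-ins, not the real CarbonData classes, and the default return value of 1 is an assumption based on the earlier remark in this thread that one non-default datamap counts as one record. The idea is that the abstract base supplies `getNumberOfEntries()` once, so concrete non-default datamaps such as the bloom datamap need no override.

```java
public class AbstractDefaultSketch {

  /** Hypothetical, trimmed-down stand-in for the DataMap interface. */
  public interface SketchDataMap {
    int getNumberOfEntries();

    void finish();
  }

  /**
   * Hypothetical abstract base for non-default datamaps. The shared default
   * lives here instead of being repeated in every implementation.
   */
  public abstract static class SketchCoarseGrainDataMap implements SketchDataMap {
    @Override
    public int getNumberOfEntries() {
      // Assumption: a non-default datamap counts as a single record.
      return 1;
    }
  }

  /** Concrete non-default datamap: inherits the default, no override needed. */
  public static class SketchBloomDataMap extends SketchCoarseGrainDataMap {
    @Override
    public void finish() {
      // Nothing to release in this sketch.
    }
  }

  /** A default-style datamap that reports one entry per stored file row. */
  public static class SketchBlockDataMap implements SketchDataMap {
    private final int rows;

    public SketchBlockDataMap(int rows) {
      this.rows = rows;
    }

    @Override
    public int getNumberOfEntries() {
      return rows;
    }

    @Override
    public void finish() {
      // Nothing to release in this sketch.
    }
  }
}
```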


---