[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Rever...

classic Classic list List threaded Threaded
63 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Rever...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2565

    [HotFix][CARBONDATA-2788][BloomDataMap] Revert optimization for blockletId in rebuilding datamap

    We found querying huge data with rebuilding bloom datamap will give
    incorrect result. The root cause is that the blockletId in
    ResultCollector is wrong. (This was introduced in PR2539)
    We will revert the previous modification for this. Now it is checked and
    works fine.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0726_revert_rebuild_rdd_blockletno

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2565
   
----
commit 8889078ea9d1328366dc27d633b3f5ebf1906322
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-07-26T15:22:58Z

    Revert optimize blockletId in rebuilding datamap
   
    We found querying huge data with rebuilding bloom datamap will give
    incorrect result. The root cause is that the blockletId in
    ResultCollector is wrong. (This was introduced in PR2539)
    We will revert the previous modification for this. Now it is checked and
    works fine.

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Revert optim...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7538/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Revert optim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6292/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205655699
 
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
       /**
        * init field converters for index columns
        */
    -  public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
    +  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
    +      List<CarbonColumn> indexedColumn) {
    +    String cacheLevel = MapUtils.getString(
    +        carbonTable.getTableInfo().getFactTable().getTableProperties(),
    +        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
    +    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
    +    if (!this.isBlockletCacheLevel) {
    +      LOGGER.warn(
    +          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
    +              + " which may decrease its pruning performance",
    --- End diff --
   
    change to `which may decrease its pruning benefit, which lead to read more data`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205656230
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
    @@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
             // skip clear datamap and we will do this adter rebuild
             reader.setSkipClearDataMapAtClose(true)
     
    +        // currently blockletId in rowWithPosition is wrong, we cannot use it
    --- End diff --
   
    This is a bit confusing, can you rephrase it


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205657316
 
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
       /**
        * init field converters for index columns
        */
    -  public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
    +  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
    +      List<CarbonColumn> indexedColumn) {
    +    String cacheLevel = MapUtils.getString(
    +        carbonTable.getTableInfo().getFactTable().getTableProperties(),
    +        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
    +    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
    +    if (!this.isBlockletCacheLevel) {
    +      LOGGER.warn(
    +          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
    +              + " which may decrease its pruning performance",
    --- End diff --
   
    OK


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205657632
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
    @@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
             // skip clear datamap and we will do this adter rebuild
             reader.setSkipClearDataMapAtClose(true)
     
    +        // currently blockletId in rowWithPosition is wrong, we cannot use it
    --- End diff --
   
    OK


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7543/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6297/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7545/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6299/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205676555
 
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn
           for (CarbonBloomFilter bloomFilter : bloomIndexList) {
             boolean scanRequired = bloomFilter.membershipTest(new Key(bloomQueryModel.filterValue));
             if (scanRequired) {
    +          String blockletNo =
    +              isBlockletCacheLevel ? String.valueOf(bloomFilter.getBlockletNo()) : "-1";
    --- End diff --
   
    I think Bloom dataMap should return the actual blocklet Id. The old behavior should not be modified based on cache level


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6020/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6021/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7567/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6322/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7573/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7575/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix bugs in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2565
 
    Till now, there is still a problem: for the test case added in Bloom*FunctionSuite, the explain query output will give negative pruned result like:
    ```
    |== CarbonData Profiler ==
    Table Scan on test_rcd
     - total blocklets: 1
     - filter: (city <> null and city = city40)
     - pruned by Main DataMap
        - skipped blocklets: 0
     - pruned by CG DataMap
        - name: dm_rcd
        - provider: bloomfilter
        - skipped blocklets: -1
                                        |
    |== Physical Plan ==
    *FileScan carbondata default.test_rcd[id#172,country#173,city#174,population#175,random1#176,random2#177,random3#178,random4#179,random5#180,random6#181,random7#182,random8#183,random9#184,random10#185,random11#186,random12#187] PushedFilters: [IsNotNull(city), EqualTo(city,city40)]|
    ```


---
1234