[GitHub] carbondata pull request #2751: WIP:[CARBONDATA-2946] Add bloomindex version ...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2751

    WIP:[CARBONDATA-2946] Add bloomindex version info file for compatibility

    We add an empty version file to indicate the version of the bloom index.
    The motivation is that in 1.5.0 CarbonData changed the encoding of non-dictionary
    primitive fields, storing them as primitives instead of literal bytes.
    To stay compatible with bloom indexes built by previous versions,
    we add a version info file that indicates the version of the index file.
   
    When writing the bloom index, we always write the bloom value exactly as it is stored in the carbon file;
    when querying, we convert the filter value based on the index version.
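
The query-side conversion described above can be illustrated with a minimal sketch. This is not CarbonData's Java code: the two encodings (decimal text bytes for the old index, a 4-byte big-endian int for the new one) are assumptions made only for illustration, all function names are hypothetical, and a plain set stands in for the bloom filter.

```python
import struct


def encode_legacy(value: int) -> bytes:
    """Assumed pre-1.5.0 form: the literal bytes of the value's text."""
    return str(value).encode("utf-8")


def encode_primitive(value: int) -> bytes:
    """Assumed 1.5.0 form: a fixed-width 4-byte big-endian int."""
    return struct.pack(">i", value)


def encode_filter_value(value: int, has_version_file: bool) -> bytes:
    """Pick the encoding that matches how the index was written."""
    if has_version_file:
        return encode_primitive(value)
    return encode_legacy(value)


def might_contain(index_keys: set, value: int, has_version_file: bool) -> bool:
    """Probe the index with the filter value converted for its version.

    A set is used here as a stand-in for the bloom filter, so unlike a
    real bloom filter it never reports false positives.
    """
    return encode_filter_value(value, has_version_file) in index_keys


# An index written by the old version and one written by the new version.
old_index = {encode_legacy(v) for v in (1, 3, 5)}
new_index = {encode_primitive(v) for v in (1, 3, 5)}
```

Probing `old_index` with the new encoding misses values that are actually present, which is the incompatibility the version file is meant to avoid.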
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0924_bloom_compatibility

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2751.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2751
   
----
commit ed8bf431873cbf41928255246fb311c435b32aee
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-09-24T03:18:36Z

    Add bloomindex version info file for compatibility
   
    We add an empty version file to indicate the version of the bloom index.
    The motivation is that in 1.5.0 CarbonData changed the encoding of non-dictionary
    primitive fields, storing them as primitives instead of literal bytes.
    To stay compatible with bloom indexes built by previous versions,
    we add a version info file that indicates the version of the index file.
   
    When writing the bloom index, we always write the bloom value exactly as it is stored in the carbon file;
    when querying, we convert the filter value based on the index version.

----


---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/422/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/601/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8671/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/423/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8672/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/602/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    retest this please


---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/425/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8674/



---

[GitHub] carbondata issue #2751: WIP:[CARBONDATA-2946] Add bloomindex version info fi...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/604/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    retest this please


---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/456/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/636/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8706/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/460/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8710/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/640/



---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    retest this please


---

[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2751
 
    Hi, I've tested this PR on my local machine and it works fine.
   
    Steps used to verify this:
    ```
    1. Use CarbonData 1.4.1-RC2 jar and start spark & JDBCServer & beeline
   
    2. CREATE TABLE
     create table test_adpt_int (id int, name string, age int) stored by 'carbondata' TBLPROPERTIES('sort_columns'='id');
   
    3. CREATE DATAMAP
     create datamap dm_id on table test_adpt_int using 'bloomfilter' DMPROPERTIES('index_columns'='id');
   
    4. LOAD
     insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
   
    5. QUERY
    select * from test_adpt_int where id = 6;
    select * from test_adpt_int where id = 5;
   
    6. Use master code and apply current PR to generate jar and restart spark & JDBCServer & beeline
   
    7. QUERY should work fine
    select * from test_adpt_int where id = 6;
    select * from test_adpt_int where id = 5;
   
    8. LOAD again
     insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
   
    9. QUERY again should work fine
    select * from test_adpt_int where id = 6;
    select * from test_adpt_int where id = 5;
    ```
    Besides, the bloom index folder looks like this:
   
    ![image](https://user-images.githubusercontent.com/10445758/45991871-53ab0600-c0b9-11e8-8320-38337b6eb23f.png)
    The segment generated in 1.4.1 does not have a version info file, while the segment generated in 1.5.0 does.
    *Note:* 1.5.0 also introduces 'mergeShard' to merge the bloom index files.
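
The per-segment difference noted above could be checked with a small script along these lines. It is only a sketch: the marker file name `bloomindex.version` and the layout of one subdirectory per segment are assumptions, since the thread shows the folder only as a screenshot, and `demo` builds a throwaway directory tree instead of reading a real table.

```python
import tempfile
from pathlib import Path

# Hypothetical marker name; the real file name is not given in this thread.
VERSION_FILE = "bloomindex.version"


def segments_with_version(datamap_dir: Path) -> dict:
    """Map each segment folder name to whether it holds the version file."""
    return {
        seg.name: (seg / VERSION_FILE).is_file()
        for seg in sorted(datamap_dir.iterdir())
        if seg.is_dir()
    }


def demo() -> dict:
    """Build a fake datamap folder with one old and one new segment."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "0").mkdir()                  # segment loaded with 1.4.1
        (root / "1").mkdir()                  # segment loaded with 1.5.0
        (root / "1" / VERSION_FILE).touch()   # empty marker file
        return segments_with_version(root)
```

Because the marker is an empty file, only its presence carries information, so the check is a plain existence test per segment.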


---