GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2734

[CARBONDATA-2946] Bloom filter backward compatibility with adaptive encoding and refactor

**(1)** Refactored the existing code in ColumnPageWrapper that fills data into the vector.

**(2)** **Problem:** The bloom filter index is written as if the column were a measure column, so backward compatibility is not ensured; if we write it like a measure column, filter queries will fail.

**Solution:** Write the bloom filter index the same way as a no-dictionary column. Take care of null bitsets (i.e. null values), since a null would have been written as 0 when the page was written like a measure page.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2946

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2734.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2734

----

commit 984eef1879d92514000a8f5e05bf82e935403066
Author: dhatchayani <dhatcha.official@...>
Date: 2018-09-19T12:37:11Z

    [CARBONDATA-2946] Bloom filter backward compatibility with adaptive encoding and Refactor

----

---
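The null-bitset concern described above can be sketched as follows. This is only an illustration of the idea, not CarbonData's actual converter: the class name, method, and big-endian byte layout are all hypothetical. The point is that a null row is stored as 0 in a measure-style page, so the reader must consult the null bitset to avoid treating that 0 as a real value.

```java
import java.util.BitSet;

public class BloomNullAwareSketch {

    // Hypothetical helper: convert a page value to the bytes the bloom index
    // would see, honoring the null bitset so that a 0 stored for a null row
    // is not mistaken for a genuine value of 0.
    static byte[] toIndexBytes(long value, int rowId, BitSet nullBitSet) {
        if (nullBitSet.get(rowId)) {
            return new byte[0]; // null marker: matches no real value
        }
        byte[] bytes = new byte[8];
        for (int i = 7; i >= 0; i--) { // big-endian, for illustration only
            bytes[i] = (byte) (value & 0xFF);
            value >>= 8;
        }
        return bytes;
    }

    public static void main(String[] args) {
        BitSet nulls = new BitSet();
        nulls.set(1); // row 1 is null; the measure page stores 0 for it
        byte[] realZero = toIndexBytes(0L, 0, nulls); // genuine 0 -> 8 bytes
        byte[] nullRow  = toIndexBytes(0L, 1, nulls); // null row -> empty
        System.out.println(realZero.length + " " + nullRow.length);
    }
}
```

Without a check like this, a filter query for `col = 0` and a filter query for `col IS NULL` would hash to the same bloom entry, which is the backward-compatibility hazard the PR description points at.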
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/355/

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8602/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/532/

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@kevinjmh please review this PR

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin please review this PR

---
Github user kevinjmh commented on the issue:
https://github.com/apache/carbondata/pull/2734

The test case that found this bug now passes. Please refer to the previous PR #2654 in the description for tracking.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

I think it's better to keep the value in the bloom filter index the same as the value produced by CarbonData's field converter, so that we do not have to handle it specially when building the index or when querying, which may waste time. The main problem may be that we need to handle compatibility for old index files written before 1.5.0. To be compatible with the previous version, we need to indicate the version in some way.

---
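The "indicate the version in some way" idea above could be sketched like this. Everything here is an assumption for illustration: the file name `bloomindex.version` is hypothetical (the thread only says 1.5.0 segments carry a version info file that 1.4.1 segments lack), and the detection rule is simply "no version file means legacy layout".

```java
import java.io.File;
import java.nio.file.Files;

public class BloomVersionCheck {

    // Hypothetical check: a shard written by 1.5.0+ is assumed to carry a
    // version info file alongside the bloom index; its absence implies a
    // pre-1.5.0 shard whose keys used the old (measure-style) byte layout.
    static boolean isLegacyShard(File shardDir) {
        return !new File(shardDir, "bloomindex.version").exists();
    }

    public static void main(String[] args) throws Exception {
        File shard = Files.createTempDirectory("shard").toFile();
        boolean before = isLegacyShard(shard);          // no version file yet
        new File(shard, "bloomindex.version").createNewFile();
        boolean after = isLegacyShard(shard);           // version file present
        System.out.println(before + " " + after);
    }
}
```

The query path could then pick the matching byte-conversion strategy per shard, rather than forcing one layout on both old and new index files.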
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

@dhatchayani can you check PR #2751? It's a prototype of my idea for handling this.

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin OK, I'll check PR #2751.

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin I feel we don't need versioning here. We will use the same byte conversion as the old version, so there would not be any backward compatibility issue. @dhatchayani Please remove the null handling here, as it would be the same as for a measure now. Let it be the same as the old null-measure handling code.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/431/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/610/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8680/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

@ravipesala I don't agree with this. I think we should index the real value that is stored in the carbon file. (Earlier, @QiangCai agreed with this point on another problem similar to this one.) Otherwise, the code may become difficult to maintain.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

Hi, I've tested this PR on my local machine and it works fine. Steps used to verify this:

```
1. Use the CarbonData 1.4.1-RC2 jar and start spark & JDBCServer & beeline
2. CREATE TABLE
   create table test_adpt_int (id int, name string, age int) stored by 'carbondata' TBLPROPERTIES('sort_columns'='id');
3. CREATE DATAMAP
   create datamap dm_id on table test_adpt_int using 'bloomfilter' DMPROPERTIES('index_columns'='id');
4. LOAD
   insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
5. QUERY
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
6. Build a jar from master with the current PR applied, then restart spark & JDBCServer & beeline
7. QUERY; should work fine
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
8. LOAD again
   insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
9. QUERY again; should work fine
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
```

Besides, the bloom index folder looks like below: [screenshot of the bloom index folder; image not preserved in the archive]

The segment generated in 1.4.1 does not have a version info file, while the segment generated in 1.5.0 has one. *Note:* in 1.5.0, we introduced 'mergeShard' to merge the bloom index files.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/462/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8712/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/642/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

retest this please

---