GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2734

[CARBONDATA-2946] Bloom filter backward compatibility with adaptive encoding and refactor

**(1)** Refactored the existing code in ColumnPageWrapper that fills data into the vector.

**(2)** **Problem:** The bloom filter index is written as if the column were a measure column, so backward compatibility is not ensured; if we write it like a measure column, filter queries will fail.

**Solution:** Write the bloom filter index the same way as a no-dictionary column. Take care of null bitsets (i.e. null values), since a null would have been written as 0 when the page was written like a measure page.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2946

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2734.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2734

----

commit 984eef1879d92514000a8f5e05bf82e935403066
Author: dhatchayani <dhatcha.official@...>
Date: 2018-09-19T12:37:11Z

    [CARBONDATA-2946] Bloom filter backward compatibility with adaptive encoding and Refactor

----

---
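The null-bitset concern described above can be sketched as follows. This is only an illustration of the idea, not CarbonData's actual converter: the class name, method, and big-endian byte layout are all hypothetical. The point is that a null row is stored as 0 in a measure-style page, so the reader must consult the null bitset to avoid treating that 0 as a real value.

```java
import java.util.BitSet;

public class BloomNullAwareSketch {

    // Hypothetical helper: convert a page value to the bytes the bloom index
    // would see, honoring the null bitset so that a 0 stored for a null row
    // is not mistaken for a genuine value of 0.
    static byte[] toIndexBytes(long value, int rowId, BitSet nullBitSet) {
        if (nullBitSet.get(rowId)) {
            return new byte[0]; // null marker: matches no real value
        }
        byte[] bytes = new byte[8];
        for (int i = 7; i >= 0; i--) { // big-endian, for illustration only
            bytes[i] = (byte) (value & 0xFF);
            value >>= 8;
        }
        return bytes;
    }

    public static void main(String[] args) {
        BitSet nulls = new BitSet();
        nulls.set(1); // row 1 is null; the measure page stores 0 for it
        byte[] realZero = toIndexBytes(0L, 0, nulls); // genuine 0 -> 8 bytes
        byte[] nullRow  = toIndexBytes(0L, 1, nulls); // null row -> empty
        System.out.println(realZero.length + " " + nullRow.length);
    }
}
```

Without a check like this, a filter query for `col = 0` and a filter query for `col IS NULL` would hash to the same bloom entry, which is the backward-compatibility hazard the PR description points at.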
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/355/

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8602/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/532/

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@kevinjmh please review this PR

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin please review this PR

---
Github user kevinjmh commented on the issue:
https://github.com/apache/carbondata/pull/2734

The test case that found this bug now passes. Please refer to the previous PR #2654 in the description for tracking.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

I think it's better to keep the value in the bloom filter index the same as the value produced by CarbonData's field converter, so that we do not have to handle it specially when building the index or when querying, which may waste time. The main problem may be that we need to handle compatibility for old index files written before 1.5.0. To be compatible with the previous version, we need to indicate the version in some way.

---
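The "indicate the version in some way" idea above could be sketched like this. Everything here is an assumption for illustration: the file name `bloomindex.version` is hypothetical (the thread only says 1.5.0 segments carry a version info file that 1.4.1 segments lack), and the detection rule is simply "no version file means legacy layout".

```java
import java.io.File;
import java.nio.file.Files;

public class BloomVersionCheck {

    // Hypothetical check: a shard written by 1.5.0+ is assumed to carry a
    // version info file alongside the bloom index; its absence implies a
    // pre-1.5.0 shard whose keys used the old (measure-style) byte layout.
    static boolean isLegacyShard(File shardDir) {
        return !new File(shardDir, "bloomindex.version").exists();
    }

    public static void main(String[] args) throws Exception {
        File shard = Files.createTempDirectory("shard").toFile();
        boolean before = isLegacyShard(shard);          // no version file yet
        new File(shard, "bloomindex.version").createNewFile();
        boolean after = isLegacyShard(shard);           // version file present
        System.out.println(before + " " + after);
    }
}
```

The query path could then pick the matching byte-conversion strategy per shard, rather than forcing one layout on both old and new index files.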
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

@dhatchayani can you check PR #2751? It's a prototype of my idea for handling this.

---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin OK, I'll check PR #2751.

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2734

@xuchuanyin I feel we don't need versioning here. We will use the same byte conversion as the old version, so there would not be any backward compatibility issue. @dhatchayani Please remove the null handling here, as it would be the same as for a measure now. Let it be the same as the old null-measure handling code.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/431/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/610/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8680/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

@ravipesala I don't agree with this. I think we should index the real value that is stored in the carbon file. (Earlier, @QiangCai agreed with this point on another problem similar to this one.) Otherwise, the code may become difficult to maintain.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

Hi, I've tested this PR on my local machine and it works fine. Steps used to verify this:

```
1. Use the CarbonData 1.4.1-RC2 jar and start spark & JDBCServer & beeline
2. CREATE TABLE
   create table test_adpt_int (id int, name string, age int) stored by 'carbondata' TBLPROPERTIES('sort_columns'='id');
3. CREATE DATAMAP
   create datamap dm_id on table test_adpt_int using 'bloomfilter' DMPROPERTIES('index_columns'='id');
4. LOAD
   insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
5. QUERY
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
6. Build a jar from master with the current PR applied, then restart spark & JDBCServer & beeline
7. QUERY; should work fine
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
8. LOAD again
   insert into table test_adpt_int values (1, 'name1', 10),(3, 'name3', 30),(5, 'name5', 50),(7, 'name7', 70),(9, 'name9', 90),(10, 'name10', 100);
9. QUERY again; should work fine
   select * from test_adpt_int where id = 6;
   select * from test_adpt_int where id = 5;
```

Besides, the bloom index folder looks like below: [screenshot of the bloom index folder; image not preserved in the archive]

The segment generated in 1.4.1 does not have a version info file, while the segment generated in 1.5.0 has one. *Note:* in 1.5.0, we introduced 'mergeShard' to merge the bloom index files.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/462/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8712/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2734

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/642/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2734

retest this please

---