GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2565

[HotFix][CARBONDATA-2788][BloomDataMap] Revert optimization for blockletId in rebuilding datamap

We found that querying huge data with a rebuilt bloom datamap gives incorrect results. The root cause is that the blockletId in the ResultCollector is wrong (this was introduced in PR2539). This PR reverts the previous modification; it has been checked and now works fine.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required?
  - How is it tested? Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0726_revert_rebuild_rdd_blockletno

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2565.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2565

----

commit 8889078ea9d1328366dc27d633b3f5ebf1906322
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-07-26T15:22:58Z

    Revert optimize blockletId in rebuilding datamap

    We found that querying huge data with a rebuilt bloom datamap gives
    incorrect results. The root cause is that the blockletId in the
    ResultCollector is wrong (this was introduced in PR2539). This reverts
    the previous modification; it has been checked and now works fine.

----

---
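For context on why a wrong blockletId corrupts query results, here is a minimal, self-contained sketch of the pruning shape the diffs later in this thread discuss. These are placeholder classes, not CarbonData's actual `CarbonBloomFilter`/`Blocklet` types: each bloom shard that reports a possible hit contributes the blocklet id it recorded at build time, so if the rebuild wrote wrong ids, the scan reads the wrong blocklets.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder for a bloom index shard; it remembers the id of the
// blocklet it was built from (the value the rebuild must get right).
class BloomShard {
  private final short blockletNo;
  BloomShard(short blockletNo) { this.blockletNo = blockletNo; }

  // stand-in for CarbonBloomFilter.membershipTest(key)
  boolean membershipTest(byte[] key) { return true; }
  short getBlockletNo() { return blockletNo; }
}

class PruneSketch {
  // If rebuild wrote a wrong blockletNo into each shard, every hit below
  // points the scan at the wrong blocklet -- hence the incorrect results.
  static List<String> prune(List<BloomShard> shards, byte[] filterValue) {
    List<String> hitBlocklets = new ArrayList<>();
    for (BloomShard shard : shards) {
      if (shard.membershipTest(filterValue)) {
        hitBlocklets.add(String.valueOf(shard.getBlockletNo()));
      }
    }
    return hitBlocklets;
  }
}
```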
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7538/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6292/ ---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2565#discussion_r205655699

```
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
   /**
    * init field converters for index columns
    */
-  public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
+  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
+      List<CarbonColumn> indexedColumn) {
+    String cacheLevel = MapUtils.getString(
+        carbonTable.getTableInfo().getFactTable().getTableProperties(),
+        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
+    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
+    if (!this.isBlockletCacheLevel) {
+      LOGGER.warn(
+          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
+              + " which may decrease its pruning performance",
--- End diff --
```

change to `which may decrease its pruning benefit and lead to reading more data`

---
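For readers unfamiliar with the property lookup in the hunk above, a minimal self-contained sketch of the same pattern: commons-collections `MapUtils.getString` with a default, followed by a case-insensitive comparison. The constant values here are placeholders, not necessarily the ones CarbonData keeps in `CarbonCommonConstants`.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.collections.MapUtils;

public class CacheLevelCheck {
  // Placeholder constants; the real ones live in CarbonCommonConstants.
  static final String CACHE_LEVEL = "cache_level";
  static final String CACHE_LEVEL_DEFAULT_VALUE = "BLOCK";

  public static void main(String[] args) {
    Map<String, String> tableProperties = new HashMap<>();
    // falls back to the default when the property is absent
    String cacheLevel = MapUtils.getString(
        tableProperties, CACHE_LEVEL, CACHE_LEVEL_DEFAULT_VALUE);
    boolean isBlockletCacheLevel = "blocklet".equalsIgnoreCase(cacheLevel);
    System.out.println("blocklet cache level: " + isBlockletCacheLevel); // false
  }
}
```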
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2565#discussion_r205656230

```
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
@@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
       // skip clear datamap and we will do this after rebuild
       reader.setSkipClearDataMapAtClose(true)

+      // currently blockletId in rowWithPosition is wrong, we cannot use it
--- End diff --
```

This is a bit confusing, can you rephrase it?

---
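To make the comment's rationale concrete, here is a small hypothetical sketch, not the actual RDD code; the row layout and writer interface are assumed. The idea: when the blocklet id carried inside each row cannot be trusted, the rebuild must derive the id from its own iteration state instead of reading it out of the row.

```java
import java.util.List;

class RebuildSketch {
  /** Hypothetical row: carries a (possibly wrong) blocklet id inside it. */
  static class Row {
    final Object[] values;
    final int claimedBlockletId;  // the value the comment says cannot be used
    Row(Object[] values, int claimedBlockletId) {
      this.values = values;
      this.claimedBlockletId = claimedBlockletId;
    }
  }

  /** Hypothetical index writer fed blocklet by blocklet. */
  interface IndexWriter {
    void onBlockletStart(int blockletId);
    void addRow(Object[] values);
  }

  // Number blocklets from our own iteration order instead of trusting the
  // id stored inside each row.
  static void rebuild(List<List<Row>> blocklets, IndexWriter writer) {
    int blockletId = 0;
    for (List<Row> blocklet : blocklets) {
      writer.onBlockletStart(blockletId);
      for (Row row : blocklet) {
        writer.addRow(row.values);  // row.claimedBlockletId is ignored
      }
      blockletId++;
    }
  }
}
```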
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2565#discussion_r205657316

```
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
   /**
    * init field converters for index columns
    */
-  public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
+  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
+      List<CarbonColumn> indexedColumn) {
+    String cacheLevel = MapUtils.getString(
+        carbonTable.getTableInfo().getFactTable().getTableProperties(),
+        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
+    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
+    if (!this.isBlockletCacheLevel) {
+      LOGGER.warn(
+          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
+              + " which may decrease its pruning performance",
--- End diff --
```

OK

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2565#discussion_r205657632

```
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
@@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
       // skip clear datamap and we will do this after rebuild
       reader.setSkipClearDataMapAtClose(true)

+      // currently blockletId in rowWithPosition is wrong, we cannot use it
--- End diff --
```

OK

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7543/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6297/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2565 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7545/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6299/ ---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2565#discussion_r205676555

```
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn
     for (CarbonBloomFilter bloomFilter : bloomIndexList) {
       boolean scanRequired = bloomFilter.membershipTest(new Key(bloomQueryModel.filterValue));
       if (scanRequired) {
+        String blockletNo =
+            isBlockletCacheLevel ? String.valueOf(bloomFilter.getBlockletNo()) : "-1";
--- End diff --
```

I think the Bloom datamap should return the actual blocklet id. The old behavior should not be modified based on cache level.

---
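A tiny sketch of the two behaviors under discussion, with assumed semantics (a blocklet id of "-1" standing for "treat the whole block as matched"). The reviewer's preference is the second variant: always report the real id and let the caller coarsen it to block granularity if its cache level requires that.

```java
class BlockletIdSketch {
  // Behavior in the hunk above: hide the real id unless cache_level=blocklet.
  static String prunedId(boolean isBlockletCacheLevel, short blockletNo) {
    return isBlockletCacheLevel ? String.valueOf(blockletNo) : "-1";
  }

  // Reviewer's suggestion: always return the actual id; the caller can
  // widen it to block level itself when needed.
  static String prunedIdAlwaysActual(short blockletNo) {
    return String.valueOf(blockletNo);
  }

  public static void main(String[] args) {
    System.out.println(prunedId(false, (short) 3));      // "-1"
    System.out.println(prunedIdAlwaysActual((short) 3)); // "3"
  }
}
```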
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2565 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6020/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2565 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6021/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7567/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6322/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7573/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2565 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7575/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2565

As of now there is still a problem: for the test case added in Bloom*FunctionSuite, the explain output of the query reports a negative pruned-blocklet count, like:

```
== CarbonData Profiler ==
Table Scan on test_rcd
 - total blocklets: 1
 - filter: (city <> null and city = city40)
 - pruned by Main DataMap
    - skipped blocklets: 0
 - pruned by CG DataMap
    - name: dm_rcd
    - provider: bloomfilter
    - skipped blocklets: -1

== Physical Plan ==
*FileScan carbondata default.test_rcd[id#172,country#173,city#174,population#175,random1#176,random2#177,random3#178,random4#179,random5#180,random6#181,random7#182,random8#183,random9#184,random10#185,random11#186,random12#187] PushedFilters: [IsNotNull(city), EqualTo(city,city40)]
```

---
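One possible reading of why the count goes negative, under the assumption that the profiler computes skipped = blocklets before CG pruning minus blocklets after: if the bloom datamap returns one "-1" placeholder hit per matching index shard (see the ternary in the diff above), the "after" count can exceed the "before" count. A minimal arithmetic sketch of that assumption:

```java
class ExplainSkipSketch {
  public static void main(String[] args) {
    // Assumed profiler arithmetic: skipped = before - after.
    int blockletsBeforeCgPrune = 1;   // "total blocklets: 1" in the output above
    // With cache_level=block, each matching bloom shard can contribute a
    // "-1" placeholder hit, so "after" may exceed "before".
    int blockletsAfterCgPrune = 2;
    int skipped = blockletsBeforeCgPrune - blockletsAfterCgPrune;
    System.out.println("skipped blocklets: " + skipped); // -1, as reported
  }
}
```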