GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2778

[CARBONDATA-2980][BloomDataMap] Fix bug in clearing bloomindex cache when recreating table and datamap

Clear the bloomindex cache when we drop a bloomfilter datamap; otherwise, after dropping and recreating the table and datamap, queries will return wrong results due to the stale cache.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0928_bug_drop_bloom_cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2778.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2778

----

commit 9d7ae49394659abb1377b5524f8fe98e16529281
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-09-27T16:39:00Z

    clear bloomindex cache when dropping datamap

    Clear the bloomindex cache when we drop a bloomfilter datamap; otherwise recreating the datamap and querying will give wrong results due to the stale cache.

----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/614/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/8875/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/806/

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

@xuchuanyin I think this PR's changes are not correct. You are calling `lruCache.clear()`, which will clear the complete LRU cache map; that means the entries for all tables will be cleared, because there is only one LRUCache instance per JVM. I believe through this PR you are only trying to clear the stale entries for the table being recreated.

You can check the flow of drop table to understand how the dataMap cache gets cleared on dropping the table, and try to incorporate the Bloom dataMap cache clearing changes accordingly. You can use the existing API `void invalidate(K key)` and try not to introduce a new API `void invalidateAll()`.

---
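The distinction the reviewer draws between `clear()` and `invalidate(key)` can be sketched as follows. This is illustrative Java, not CarbonData code: the class and method names (`SharedLruCache`, `put`, `invalidate`) are hypothetical stand-ins for the single JVM-wide cache described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a JVM-wide cache shared by all tables and
// all cache purposes. Clearing it wholesale evicts other tables' entries.
public class SharedLruCache {
    private static final Map<String, Object> entries = new LinkedHashMap<>();

    public static void put(String key, Object value) { entries.put(key, value); }
    public static Object get(String key) { return entries.get(key); }

    // Removes a single stale entry: safe for every other table's entries.
    public static void invalidate(String key) { entries.remove(key); }

    // Wipes every entry for every table: too broad for dropping one datamap.
    public static void clear() { entries.clear(); }
}
```

With one instance per JVM, calling `clear()` while dropping one bloom datamap would also evict unrelated block-datamap and dictionary entries, which is the objection raised in the comment above.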
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2778

@manishgupta88 Oh, if that is so, then I think the cache interfaces are confusing. Currently we use only one cache instance for all purposes, such as mainDataMap, bloomDataMap, and dictionary, but we don't provide a way to separate them well.

Here my initial intention was to drop all the cache entries for one datamap. If I call `invalidate(K key)`, I need to provide all the keys, which is inconvenient.

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

@xuchuanyin The `CarbonLRUCache` instance is one per JVM, but the cache implementation differs per purpose: we have separate cache provider implementations for blockDataMap and dictionary. You can check the `CacheProvider` class for more details. `Cache` is a standard interface with very generic methods. Because we store all the entries in the LRU cache to control memory through LRU eviction, all the keys end up in the LRU cache map. BloomDataMap should likewise keep track of its own keys and then use the invalidate API to clean all the required keys. Check the `clear` method implementation of `BlockletDataMapFactory`.

---
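The pattern suggested here — each datamap factory tracking the keys it loaded so its `clear()` can invalidate only those — can be sketched as follows. This is a minimal illustration, not CarbonData's actual `BlockletDataMapFactory`; the names `BloomCacheSketch`, `load`, and `lruCache` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: the shared per-JVM cache holds entries for every
// table, while each factory instance remembers only the keys it loaded.
public class BloomCacheSketch {
    // Shared cache map, one per JVM (stands in for CarbonLRUCache).
    static final Map<String, byte[]> lruCache = new HashMap<>();

    // Keys this particular datamap factory loaded into the shared cache.
    private final Set<String> myKeys = new HashSet<>();

    void load(String key, byte[] bloomIndex) {
        lruCache.put(key, bloomIndex);
        myKeys.add(key);
    }

    // Mirrors the suggested pattern: on drop, invalidate only our own
    // keys instead of clearing the whole shared cache.
    void clear() {
        for (String key : myKeys) {
            lruCache.remove(key); // stands in for cache.invalidate(key)
        }
        myKeys.clear();
    }
}
```

The design point is that the shared cache stays generic while key ownership lives in each factory, so dropping one datamap cannot evict another table's entries.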
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2778

@manishgupta88 Yeah, I went through the code again and finally found the root cause: the key used to load the cache and the key used to clear it are different, due to the difference in path separators between Linux and Windows. Please check the modification again.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2778#discussion_r221417633

--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---

    @@ -235,13 +235,13 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
           } else if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
             mergeShardInprogress = true;
           } else if (carbonFile.isDirectory()) {
    -        shardPaths.add(carbonFile.getAbsolutePath());
    +        shardPaths.add(FileFactory.getPath(carbonFile.getAbsolutePath()).toString());

--- End diff --

FYI: in a Windows env, `carbonFile.getAbsolutePath` returns a path with the path separator '\', whereas when we load the cache, the path separator is '/'.

---
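The separator mismatch described above can be sketched as follows. This is not CarbonData code: `PathKeySketch.normalize` is a hypothetical stand-in for what `FileFactory.getPath(...).toString()` achieves in the diff, namely producing a '/'-separated key that matches the key the cache was loaded with.

```java
// Illustrative sketch of the bug: on Windows, File.getAbsolutePath()
// yields a '\'-separated path, while the cache was loaded under a
// '/'-separated key, so the lookup used for invalidation misses and
// the stale entry survives. Normalizing separators makes the keys match.
public class PathKeySketch {
    // Hypothetical helper playing the role of FileFactory.getPath(...).toString().
    static String normalize(String path) {
        return path.replace('\\', '/');
    }
}
```

With normalization, the same shard directory produces the same cache key on both platforms, so the invalidate call finds the entry it needs to evict.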
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/649/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/844/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/8912/

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

LGTM

---