GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2778

[CARBONDATA-2980][BloomDataMap] Fix bug in clearing bloomindex cache when recreating table and datamap

Clear the bloomindex cache when we drop a bloomfilter datamap; otherwise, after dropping and recreating the table and datamap, queries will return wrong results due to the stale cache.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0928_bug_drop_bloom_cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2778.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2778

----

commit 9d7ae49394659abb1377b5524f8fe98e16529281
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-09-27T16:39:00Z

    clear bloomindex cache when dropping datamap

    Clear the bloomindex cache when we drop a bloomfilter datamap; otherwise recreating the datamap and querying will give wrong results due to the stale cache.

----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/614/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/8875/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/806/

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

@xuchuanyin I think this PR's changes are not correct. You are calling `lruCache.clear()`, which will clear the complete LRU cache map; that means the entries for all tables will be cleared, because there is only one LRUCache instance per JVM. I believe through this PR you are only trying to clear the stale entries for the table being recreated.

You can check the flow of drop table to understand how the dataMap cache gets cleared on dropping the table, and try to incorporate the Bloom dataMap cache clearing changes accordingly. You can use the existing API `void invalidate(K key)` and try not to introduce a new API `void invalidateAll()`.

---
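The distinction the reviewer draws between `clear()` and `invalidate(key)` can be sketched as follows. This is illustrative Java, not CarbonData code: the class and method names (`SharedLruCache`, `put`, `invalidate`) are hypothetical stand-ins for the single JVM-wide cache described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a JVM-wide cache shared by all tables and
// all cache purposes. Clearing it wholesale evicts other tables' entries.
public class SharedLruCache {
    private static final Map<String, Object> entries = new LinkedHashMap<>();

    public static void put(String key, Object value) { entries.put(key, value); }
    public static Object get(String key) { return entries.get(key); }

    // Removes a single stale entry: safe for every other table's entries.
    public static void invalidate(String key) { entries.remove(key); }

    // Wipes every entry for every table: too broad for dropping one datamap.
    public static void clear() { entries.clear(); }
}
```

With one instance per JVM, calling `clear()` while dropping one bloom datamap would also evict unrelated block-datamap and dictionary entries, which is the objection raised in the comment above.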
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2778

@manishgupta88 Oh, if that is so, then I think the cache interfaces are confusing. Currently we use only one cache instance for all purposes, such as mainDataMap, bloomDataMap, and dictionary, but we don't provide a way to separate them well.

Here my initial intention was to drop all the cache entries for one datamap. If I call `invalidate(K key)`, I need to provide all the keys, which is inconvenient.

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

@xuchuanyin The `CarbonLRUCache` instance is one per JVM, but the cache implementation differs per purpose: we have separate cache provider implementations for blockDataMap and dictionary. You can check the `CacheProvider` class for more details. `Cache` is a standard interface with very generic methods. Because we store all the entries in the LRU cache to control memory through LRU eviction, all the keys end up in the LRU cache map. BloomDataMap should likewise keep track of its own keys and then use the invalidate API to clean all the required keys. Check the `clear` method implementation of `BlockletDataMapFactory`.

---
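The pattern suggested here — each datamap factory tracking the keys it loaded so its `clear()` can invalidate only those — can be sketched as follows. This is a minimal illustration, not CarbonData's actual `BlockletDataMapFactory`; the names `BloomCacheSketch`, `load`, and `lruCache` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: the shared per-JVM cache holds entries for every
// table, while each factory instance remembers only the keys it loaded.
public class BloomCacheSketch {
    // Shared cache map, one per JVM (stands in for CarbonLRUCache).
    static final Map<String, byte[]> lruCache = new HashMap<>();

    // Keys this particular datamap factory loaded into the shared cache.
    private final Set<String> myKeys = new HashSet<>();

    void load(String key, byte[] bloomIndex) {
        lruCache.put(key, bloomIndex);
        myKeys.add(key);
    }

    // Mirrors the suggested pattern: on drop, invalidate only our own
    // keys instead of clearing the whole shared cache.
    void clear() {
        for (String key : myKeys) {
            lruCache.remove(key); // stands in for cache.invalidate(key)
        }
        myKeys.clear();
    }
}
```

The design point is that the shared cache stays generic while key ownership lives in each factory, so dropping one datamap cannot evict another table's entries.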
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2778

@manishgupta88 Yeah, I went through the code again and finally found the root cause: the key used to load the cache and the key used to clear it are different, due to the difference in path separators between Linux and Windows. Please check the modification again.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2778#discussion_r221417633

--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---

    @@ -235,13 +235,13 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
           } else if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
             mergeShardInprogress = true;
           } else if (carbonFile.isDirectory()) {
    -        shardPaths.add(carbonFile.getAbsolutePath());
    +        shardPaths.add(FileFactory.getPath(carbonFile.getAbsolutePath()).toString());

--- End diff --

FYI: in a Windows env, `carbonFile.getAbsolutePath` returns a path with the path separator '\', whereas when we load the cache, the path separator is '/'.

---
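The separator mismatch described above can be sketched as follows. This is not CarbonData code: `PathKeySketch.normalize` is a hypothetical stand-in for what `FileFactory.getPath(...).toString()` achieves in the diff, namely producing a '/'-separated key that matches the key the cache was loaded with.

```java
// Illustrative sketch of the bug: on Windows, File.getAbsolutePath()
// yields a '\'-separated path, while the cache was loaded under a
// '/'-separated key, so the lookup used for invalidation misses and
// the stale entry survives. Normalizing separators makes the keys match.
public class PathKeySketch {
    // Hypothetical helper playing the role of FileFactory.getPath(...).toString().
    static String normalize(String path) {
        return path.replace('\\', '/');
    }
}
```

With normalization, the same shard directory produces the same cache key on both platforms, so the invalidate call finds the entry it needs to evict.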
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/649/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/844/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2778

Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/8912/

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2778

LGTM

---