GitHub user kevinjmh opened a pull request:
    https://github.com/apache/carbondata/pull/2851

    [CARBONDATA-3040][BloomDataMap] Add checking before merging bloom index

    *Scene*
    There is a bug which causes query failure when we create two bloom datamaps on the same table that already contains data.

    *Analysis*
    Since the table already has data, each CREATE DATAMAP statement triggers a rebuild datamap task, which in turn triggers bloom index file merging. By debugging, we found that the first datamap's bloom index files were merged twice, and the second merge left the bloom index files empty.

    *Solution*
    Send the datamap name in the rebuild event so the listener can filter for the target datamap, and add a file check before merging.

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [ ] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required?
          - How it is tested? Please attach test report.
          - Is it a performance related change? Please attach the performance test report.
          - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata fix_multi_bloom

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2851.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2851

----

commit bcab5ac630e39a7dadee09d5b9157642d061b5e1
Author: Manhua <kevinjmh@...>
Date:   2018-10-24T08:20:13Z

    only rebuild target datamap and add file check

----

---
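The failure mode described above can be illustrated with a minimal simulation (plain Java collections standing in for index files; `merge` is a hypothetical stand-in, not CarbonData's actual merge code): the old merge step deleted any previously merged output and rebuilt it from the remaining shards, so a second merge over an already-merged datamap yields an empty index.

```java
// Hypothetical simulation of the double-merge bug: shard contents are merged
// into a single output, and the shards are consumed by the merge. Running the
// merge a second time rebuilds the output from zero remaining shards.
import java.util.ArrayList;
import java.util.List;

public class DoubleMergeBug {
    public static List<String> merge(List<String> shards, List<String> mergedOut) {
        mergedOut.clear();        // old behaviour: delete mergeShard if it exists
        mergedOut.addAll(shards); // rebuild from whatever shards remain
        shards.clear();           // shard files are gone after a successful merge
        return mergedOut;
    }

    public static void main(String[] args) {
        List<String> shards = new ArrayList<>(List.of("bloom_0", "bloom_1"));
        List<String> merged = new ArrayList<>();
        merge(shards, merged);
        System.out.println(merged); // [bloom_0, bloom_1] -- first rebuild is fine
        merge(shards, merged);      // second rebuild event hits the same datamap
        System.out.println(merged); // [] -- empty bloom index, queries fail
    }
}
```

This is why the PR both filters rebuild events by datamap name and adds a file check before merging.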
Github user xuchuanyin commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Please add, in the analysis part of your PR description, an explanation of why adding the datamap name solves the problem.

---
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Another related question: what will happen if we create multiple datamaps concurrently on the same table?

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/992/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1205/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/9258/

---
Github user kevinjmh commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Description updated.

---
Github user xuchuanyin commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2851#discussion_r228010573

    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/events/DataMapEvents.scala ---
    @@ -60,7 +60,8 @@ case class BuildDataMapPreExecutionEvent(sparkSession: SparkSession,
      * example: bloom datamap, Lucene datamap
      */
     case class BuildDataMapPostExecutionEvent(sparkSession: SparkSession,
    -    identifier: AbsoluteTableIdentifier, segmentIdList: Seq[String], isFromRebuild: Boolean)
    +    identifier: AbsoluteTableIdentifier, segmentIdList: Seq[String],
    +    isFromRebuild: Boolean, dmName: String)
    --- End diff --

    You can adjust the sequence of the parameters by moving `dmName` after `identifier`. This will make the signature easier to understand: for some table's datamap, rebuild (or not) the corresponding segments.

---
Github user xuchuanyin commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2851#discussion_r228010750

    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java ---
    @@ -70,27 +73,37 @@ public boolean accept(CarbonFile file) {
           }
         });
    +    // check whether need to merge
         String mergeShardPath = dmSegmentPathString + File.separator + MERGE_BLOOM_INDEX_SHARD_NAME;
         String mergeInprogressFile = dmSegmentPathString + File.separator + MERGE_INPROGRESS_FILE;
         try {
    -      // delete mergeShard folder if exists
    -      if (FileFactory.isFileExist(mergeShardPath)) {
    -        FileFactory.deleteFile(mergeShardPath, FileFactory.getFileType(mergeShardPath));
    +      if (shardPaths.length == 0 || FileFactory.isFileExist(mergeShardPath)) {
    +        LOGGER.info("No shard data to merge or already merged for path " + mergeShardPath);
    --- End diff --

    When will this line be reached?

---
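The guard in the diff above can be reduced to a small decision function. This is an illustrative sketch, not CarbonData's code; `shouldMerge` is a hypothetical name standing in for the `shardPaths.length == 0 || FileFactory.isFileExist(mergeShardPath)` check:

```java
// Sketch of the merge guard added by the PR: skip merging when there are no
// shard files left to merge, or when a merged shard already exists because a
// previous merge pass completed. This makes the merge idempotent across
// repeated rebuild events for the same segment.
public class MergeGuard {
    public static boolean shouldMerge(int shardCount, boolean mergeShardExists) {
        return shardCount > 0 && !mergeShardExists;
    }

    public static void main(String[] args) {
        System.out.println(shouldMerge(2, false)); // true: fresh shards, no merge yet
        System.out.println(shouldMerge(0, false)); // false: nothing to merge
        System.out.println(shouldMerge(2, true));  // false: already merged
    }
}
```

Under this sketch, the logged branch in the diff is reached exactly when `shouldMerge` would return false, e.g. when a second rebuild event arrives for a datamap whose shards were already merged.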
Github user xuchuanyin commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2851#discussion_r228009613

    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala ---
    @@ -48,8 +48,14 @@ class MergeBloomIndexEventListener extends OperationEventListener with Logging {
             _.getDataMapSchema.getProviderName.equalsIgnoreCase(
               DataMapClassProvider.BLOOMFILTER.getShortName))
    -    // for load process, filter lazy datamap
    -    if (!datamapPostEvent.isFromRebuild) {
    +    if (datamapPostEvent.isFromRebuild) {
    +      if (null != datamapPostEvent.dmName) {
    +        // for rebuild process, event will be called for each datamap
    --- End diff --

    'for each datamap' or 'only for the specific datamap'?

---
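The selection logic being discussed can be sketched as a pure function over datamap names (hypothetical helper, not the listener's actual code): a rebuild event carries the name of the single datamap being rebuilt, while an event without a name, such as a data load, applies to all bloom datamaps on the table.

```java
// Sketch of the event filtering the PR introduces: dmName == null models a
// load event (no specific target); a non-null name models a rebuild event
// that targets exactly one datamap.
import java.util.List;
import java.util.stream.Collectors;

public class BloomEventFilter {
    public static List<String> select(List<String> bloomDataMapNames, String dmName) {
        if (dmName != null) {
            // rebuild: merge only the datamap named in the event
            return bloomDataMapNames.stream()
                    .filter(name -> name.equalsIgnoreCase(dmName))
                    .collect(Collectors.toList());
        }
        // load: merge every bloom datamap on the table
        return bloomDataMapNames;
    }

    public static void main(String[] args) {
        List<String> all = List.of("dm1", "dm2");
        System.out.println(select(all, "dm1")); // [dm1]  only the rebuilt datamap merges
        System.out.println(select(all, null));  // [dm1, dm2]  a load merges all of them
    }
}
```

This is why passing `dmName` in the rebuild event prevents the first datamap's index from being merged again when the second datamap is created.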
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/999/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1212/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    Build Success with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/9265/

---
Github user xuchuanyin commented on the issue:
    https://github.com/apache/carbondata/pull/2851

    LGTM

---