GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2895 [HOTFIX] Fix NPE in spark, when same vector reads files with local dictionary and without local dictionary problem: NPE in spark, when same vector reads files with local dictionary and without local dictionary cause: when two carbondata files are present, one with local dictionary and one without local dictionary. If same vector is used to read this files [can happen if task is launched to group of files]. If local dictionary files are found first, dictionary is set for that vector. But it was never reset for another file reading. solution: reset dictionary once batch is processed,set only for local dictionary batch processing. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done yes, cluster testing done. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2895.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2895 ---- commit 99c7621336e3cf180bfa0c3a326a2f1fafe51631 Author: ajantha-bhat <ajanthabhat@...> Date: 2018-11-05T10:00:27Z Fix vectcor reading with local dictionary and without local dictionary ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1270/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9534/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1485/ --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2895#discussion_r230814030 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; - if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); - } + vector.setDictionary(dictionary); --- End diff -- Is the same handling required in `fillRow` method also in the same class?..If required then `isDictionaryUsed` and `setDictionaryUsed` API's will not be required and those can also be removed from the interface --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2895 LGTM apart from a minor comment --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1503/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1291/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9552/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2895#discussion_r230997947 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; - if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); - } + vector.setDictionary(dictionary); --- End diff -- I have checked this while coding, fill row is our method. So, no issues. Only this vector is spark vector and dictionary needs to clear for it. --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2895#discussion_r230999455 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; - if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); - } + vector.setDictionary(dictionary); --- End diff -- Both the method are called in carbon flow for vector filling. One is direct fill case and the other one is old vector fill flow. Please cross check once --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2895#discussion_r231001948 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; - if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); - } + vector.setDictionary(dictionary); --- End diff -- done --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1514/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1301/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9562/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1302/ --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2895 LGTM..can be merged once build is passed --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1515/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2895 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9563/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2895 retest this please --- |
Free forum by Nabble | Edit this page |