GitHub user akashrn5 opened a pull request:

    https://github.com/apache/carbondata/pull/2662

    [CARBONDATA-2889] Add decoder based fallback mechanism in local dictionary to reduce memory footprint

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata fallback

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2662

----
commit 8729c673ed06ee16ed256270e31c495bd1568bfd
Author: akashrn5 <akashnilugal@...>
Date:   2018-08-20T04:59:26Z

    Add decoder based fallback mechanism in local dictionary to reduce memory footprint

----

---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6420/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8083/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/23/

---
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2662#discussion_r213160857

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java ---
    @@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) {
        * @param encodedTablePage
        * encoded table page
        */
    -  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) {
    +  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage,
    +      Map<String, LocalDictionaryGenerator> localDictionaryGeneratorMap) {
         // for first page create new list
         if (null == encodedMeasureColumnPages) {
           encodedMeasureColumnPages = new ArrayList<>();
           // adding measure pages
           for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) {
    -        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null);
    -        blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i));
    +        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null,
    +            Boolean.parseBoolean(CarbonProperties.getInstance()
    --- End diff --

    What if the configuration is changed during data loading? Each column page could then end up with a different configuration. Will this be OK?

---
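To make the concern in the comment above concrete, here is a minimal sketch of the difference between re-reading a property for every page holder and reading it once per load. It is only an illustration under stated assumptions: `ConfigSnapshotSketch`, `PageHolder`, the key `sketch.decoder.based.fallback`, and the `flipAt` parameter are all hypothetical, and `java.util.Properties` stands in for whatever configuration source the real code uses. It is not the code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: none of these names come from CarbonData.
final class ConfigSnapshotSketch {

  // Hypothetical property key, used only for this illustration.
  static final String FALLBACK_KEY = "sketch.decoder.based.fallback";

  // Stand-in for a per-column page holder that remembers the flag it was created with.
  static final class PageHolder {
    final boolean decoderBasedFallback;

    PageHolder(boolean decoderBasedFallback) {
      this.decoderBasedFallback = decoderBasedFallback;
    }
  }

  // Re-reads the property for every holder. If the property changes midway
  // (simulated at index flipAt), holders end up with mixed settings.
  static List<PageHolder> readPerPage(Properties props, int numPages, int flipAt) {
    List<PageHolder> holders = new ArrayList<>();
    for (int i = 0; i < numPages; i++) {
      if (i == flipAt) {
        props.setProperty(FALLBACK_KEY, "false"); // simulated change during loading
      }
      boolean flag = Boolean.parseBoolean(props.getProperty(FALLBACK_KEY, "true"));
      holders.add(new PageHolder(flag));
    }
    return holders;
  }

  // Reads the property once before the loop, so a later change cannot
  // produce holders with different settings within the same load.
  static List<PageHolder> readOnce(Properties props, int numPages, int flipAt) {
    boolean snapshot = Boolean.parseBoolean(props.getProperty(FALLBACK_KEY, "true"));
    List<PageHolder> holders = new ArrayList<>();
    for (int i = 0; i < numPages; i++) {
      if (i == flipAt) {
        props.setProperty(FALLBACK_KEY, "false"); // same simulated change, no effect here
      }
      holders.add(new PageHolder(snapshot));
    }
    return holders;
  }

  // True when every holder carries the same flag value.
  static boolean allSame(List<PageHolder> holders) {
    for (PageHolder holder : holders) {
      if (holder.decoderBasedFallback != holders.get(0).decoderBasedFallback) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("per-page read consistent: " + allSame(readPerPage(new Properties(), 4, 2)));
    System.out.println("single read consistent:   " + allSame(readOnce(new Properties(), 4, 2)));
  }
}
```

Whether mixed settings within one load are actually harmful depends on how the fallback is implemented, which is the question the comment raises. The sketch only shows that a single read removes the possibility.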
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2662#discussion_r213161218

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/FallbackDecoderBasedColumnPageEncoder.java ---
    @@ -0,0 +1,98 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datastore.page;
    +
    +import java.util.concurrent.Callable;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.TableSpec;
    +import org.apache.carbondata.core.datastore.compression.CompressorFactory;
    +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
    +import org.apache.carbondata.core.keygenerator.KeyGenerator;
    +import org.apache.carbondata.core.keygenerator.factory.KeyGeneratorFactory;
    +import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +
    +public class FallbackDecoderBasedColumnPageEncoder implements Callable<FallbackEncodedColumnPage> {
    +  /**
    +   * actual local dictionary generated column page
    +   */
    +  private EncodedColumnPage encodedColumnPage;
    +
    +  /**
    +   * actual index in the page
    +   * this is required as in a blocklet few pages will be local dictionary
    +   * encoded and few pages will be plain text encoding
    +   * in this case local dictionary encoded page
    +   */
    +  private int pageIndex;
    +
    +  private LocalDictionaryGenerator localDictionaryGenerator;
    +
    +  public FallbackDecoderBasedColumnPageEncoder(EncodedColumnPage encodedColumnPage, int pageIndex,
    +      LocalDictionaryGenerator localDictionaryGenerator) {
    +    this.encodedColumnPage = encodedColumnPage;
    +    this.pageIndex = pageIndex;
    +    this.localDictionaryGenerator = localDictionaryGenerator;
    +  }
    +
    +  @Override public FallbackEncodedColumnPage call() throws Exception {
    +
    +    // uncompress the encoded column page
    +    byte[] bytes = CompressorFactory.getInstance().getCompressor()
    --- End diff --

    emm, PR #2628 changed this. We should get the compressor from input configuration or from the metadata.

---
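The suggestion above, taking the compressor from the page's own metadata instead of a process-wide default, can be sketched as follows. This is an illustration under stated assumptions, not CarbonData code: `CompressorLookupSketch`, `Codec`, `StoredPage`, and the codec names `"none"` and `"deflate"` are hypothetical, and `java.util.zip` is used only so the example runs with the JDK alone. The real `CompressorFactory` API is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: names are illustrative and do not come from CarbonData.
final class CompressorLookupSketch {

  interface Codec {
    byte[] compress(byte[] input);
    byte[] uncompress(byte[] input);
  }

  // Pass-through codec, standing in for "no compression".
  static final class NoOpCodec implements Codec {
    public byte[] compress(byte[] input) { return input.clone(); }
    public byte[] uncompress(byte[] input) { return input.clone(); }
  }

  // DEFLATE codec built on java.util.zip, used here only as a runnable stand-in.
  static final class DeflateCodec implements Codec {
    public byte[] compress(byte[] input) {
      Deflater deflater = new Deflater();
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[256];
      while (!deflater.finished()) {
        out.write(buffer, 0, deflater.deflate(buffer));
      }
      deflater.end();
      return out.toByteArray();
    }

    public byte[] uncompress(byte[] input) {
      Inflater inflater = new Inflater();
      inflater.setInput(input);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[256];
      try {
        while (!inflater.finished()) {
          int n = inflater.inflate(buffer);
          if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
            throw new IllegalStateException("truncated or unsupported input");
          }
          out.write(buffer, 0, n);
        }
      } catch (DataFormatException e) {
        throw new IllegalStateException(e);
      } finally {
        inflater.end();
      }
      return out.toByteArray();
    }
  }

  static final Map<String, Codec> CODECS = new HashMap<>();
  static {
    CODECS.put("none", new NoOpCodec());
    CODECS.put("deflate", new DeflateCodec());
  }

  // A compressed page that records which codec produced it.
  static final class StoredPage {
    final String compressorName;
    final byte[] compressedData;

    StoredPage(String compressorName, byte[] compressedData) {
      this.compressorName = compressorName;
      this.compressedData = compressedData;
    }
  }

  static Codec lookup(String name) {
    Codec codec = CODECS.get(name);
    if (codec == null) {
      throw new IllegalArgumentException("unknown compressor: " + name);
    }
    return codec;
  }

  static StoredPage store(byte[] raw, String compressorName) {
    return new StoredPage(compressorName, lookup(compressorName).compress(raw));
  }

  // Uncompresses with the codec named in the page, not with any global default.
  static byte[] load(StoredPage page) {
    return lookup(page.compressorName).uncompress(page.compressedData);
  }

  public static void main(String[] args) {
    byte[] raw = "local dictionary page".getBytes(StandardCharsets.UTF_8);
    StoredPage page = store(raw, "deflate");
    System.out.println("round trip ok: " + Arrays.equals(raw, load(page)));
  }
}
```

Because `load` resolves the codec from the name stored with the page, a page written with one compressor stays readable even if the configured default changes later.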
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6464/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/81/

---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6465/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8152/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8169/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/98/

---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6480/

---
Github user akashrn5 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2662#discussion_r213922450

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java ---
    @@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) {
        * @param encodedTablePage
        * encoded table page
        */
    -  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) {
    +  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage,
    +      Map<String, LocalDictionaryGenerator> localDictionaryGeneratorMap) {
         // for first page create new list
         if (null == encodedMeasureColumnPages) {
           encodedMeasureColumnPages = new ArrayList<>();
           // adding measure pages
           for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) {
    -        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null);
    -        blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i));
    +        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null,
    +            Boolean.parseBoolean(CarbonProperties.getInstance()
    --- End diff --

    I will take the performance report with this change, then we can decide.

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8170/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/99/

---
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Please fill in the PR description. Also, do you have a memory consumption comparison for this PR? @akashrn5

---
Github user akashrn5 commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    @jackylk I will test and publish the comparison here. I am still testing this PR.

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8187/

---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2662

    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/116/

---