GitHub user kumarvishal09 opened a pull request:
https://github.com/apache/carbondata/pull/2402

[CARBONDATA-2587][CARBONDATA-2588] Local Dictionary Data Loading support

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata branch_localdic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2402.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2402

----
commit 067527cce26e60eb2bbadf8536dd45bc90a2e680
Author: kumarvishal09 <kumarvishal1802@...>
Date: 2018-06-04T10:11:50Z

    local dictionary code

----
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5314/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6483/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6484/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5315/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6485/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5316/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2402 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5405/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2402 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5406/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2402 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5407/ --- |
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197603140 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.datastore.blocklet; + +import java.io.IOException; +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder; +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage; +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; +import org.apache.carbondata.core.localdictionary.PageLevelDictionary; +import org.apache.carbondata.core.memory.MemoryException; +import org.apache.carbondata.format.LocalDictionaryChunk; + +/** + * Maintains the list of encoded page of a column in a blocklet + * and encoded dictionary values only if column is encoded using local + * dictionary + * Handle the fallback if all the pages in blocklet are not + * encoded with local dictionary + */ +public class BlockletEncodedColumnPage { + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName()); + + /** + * list of encoded page of a column in a blocklet + */ + private List<EncodedColumnPage> encodedColumnPageList; + + /** + * fallback executor service + */ + private ExecutorService fallbackExecutorService; + + /** + * to check whether pages are local dictionary encoded or not + */ + private boolean isLocalDictEncoded; + + /** + * page level dictionary only when column is encoded with local dictionary + */ + private PageLevelDictionary pageLevelDictionary; + + /** + * fallback future task queue; + */ + private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue; + + BlockletEncodedColumnPage(ExecutorService fallbackExecutorService, + EncodedColumnPage encodedColumnPage) 
{ + this.encodedColumnPageList = new ArrayList<>(); + this.fallbackExecutorService = fallbackExecutorService; + this.encodedColumnPageList.add(encodedColumnPage); + // if dimension page is local dictionary enabled and encoded with local dictionary + if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage + .isLocalDictGeneratedPage()) { + this.isLocalDictEncoded = true; + // get first page dictionary + this.pageLevelDictionary = encodedColumnPage.getPageDictionary(); + } + } + + /** + * Below method will be used to add column page of a column + * + * @param encodedColumnPage + * encoded column page + * @throws ExecutionException + * failure in fallback + * @throws InterruptedException + * failure during fallback + */ + void addEncodedColumnTable(EncodedColumnPage encodedColumnPage) + throws ExecutionException, InterruptedException { + // if local dictionary is false or column is encoded with local dictionary then + // add a page + if (!isLocalDictEncoded || encodedColumnPage.isLocalDictGeneratedPage()) { + this.encodedColumnPageList.add(encodedColumnPage); + // merge page level dictionary values + if (null != this.pageLevelDictionary) { + pageLevelDictionary.mergerDictionaryValues(encodedColumnPage.getPageDictionary()); + } + } else { + // if older pages where encoded with dictionary and new pages are with dictionary + isLocalDictEncoded = false; + pageLevelDictionary = null; + this.fallbackFutureQueue = new ArrayDeque<>(); + LOGGER.info( + "Local dictionary Fallback is initiated for column: " + encodedColumnPageList.get(0) + .getActualPage().getColumnSpec().getFieldName()); + // submit all the older pages encoded with dictionary for fallback + for (int pageIndex = 0; pageIndex < encodedColumnPageList.size(); pageIndex++) { + fallbackFutureQueue.add(fallbackExecutorService.submit( + new FallbackColumnPageEncoder(encodedColumnPageList.get(pageIndex), pageIndex))); + } + //add to page list + this.encodedColumnPageList.add(encodedColumnPage); + } + } + + 
/** + * Return the list of encoded page list for a column in a blocklet + * + * @return list of encoded page list + */ + public List<EncodedColumnPage> getEncodedColumnPageList() { + // if fall back queue is empty then for some pages fallback was initiated --- End diff -- correct the comment --- |
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197602985 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.datastore.blocklet; + +import java.io.IOException; +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder; +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage; +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; +import org.apache.carbondata.core.localdictionary.PageLevelDictionary; +import org.apache.carbondata.core.memory.MemoryException; +import org.apache.carbondata.format.LocalDictionaryChunk; + +/** + * Maintains the list of encoded page of a column in a blocklet + * and encoded dictionary values only if column is encoded using local + * dictionary + * Handle the fallback if all the pages in blocklet are not + * encoded with local dictionary + */ +public class BlockletEncodedColumnPage { + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName()); + + /** + * list of encoded page of a column in a blocklet + */ + private List<EncodedColumnPage> encodedColumnPageList; + + /** + * fallback executor service + */ + private ExecutorService fallbackExecutorService; + + /** + * to check whether pages are local dictionary encoded or not + */ + private boolean isLocalDictEncoded; + + /** + * page level dictionary only when column is encoded with local dictionary + */ + private PageLevelDictionary pageLevelDictionary; + + /** + * fallback future task queue; + */ + private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue; + + BlockletEncodedColumnPage(ExecutorService fallbackExecutorService, + EncodedColumnPage encodedColumnPage) 
{ + this.encodedColumnPageList = new ArrayList<>(); + this.fallbackExecutorService = fallbackExecutorService; + this.encodedColumnPageList.add(encodedColumnPage); + // if dimension page is local dictionary enabled and encoded with local dictionary + if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage + .isLocalDictGeneratedPage()) { + this.isLocalDictEncoded = true; + // get first page dictionary + this.pageLevelDictionary = encodedColumnPage.getPageDictionary(); + } + } + + /** + * Below method will be used to add column page of a column + * + * @param encodedColumnPage + * encoded column page + * @throws ExecutionException + * failure in fallback + * @throws InterruptedException + * failure during fallback + */ + void addEncodedColumnTable(EncodedColumnPage encodedColumnPage) + throws ExecutionException, InterruptedException { + // if local dictionary is false or column is encoded with local dictionary then + // add a page + if (!isLocalDictEncoded || encodedColumnPage.isLocalDictGeneratedPage()) { + this.encodedColumnPageList.add(encodedColumnPage); + // merge page level dictionary values + if (null != this.pageLevelDictionary) { + pageLevelDictionary.mergerDictionaryValues(encodedColumnPage.getPageDictionary()); + } + } else { + // if older pages where encoded with dictionary and new pages are with dictionary --- End diff -- change the comment, new pages are without dictionary --- |
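For readers following the thread, the branch under review in `addEncodedColumnTable` can be sketched roughly as below. This is a simplified illustration and not the PR's code: `BlockletPagesSketch` and all of its members are hypothetical names, each page is reduced to a boolean (true meaning local-dictionary encoded), and the executor-based re-encoding of older pages is replaced by a plain counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the page bookkeeping discussed above. */
public class BlockletPagesSketch {
  // One entry per page; true means the page was local-dictionary encoded.
  private final List<Boolean> pages = new ArrayList<>();
  private boolean localDictEncoded;
  // Number of older pages that would be handed to the fallback encoder.
  private int fallbackSubmitted;

  BlockletPagesSketch(boolean firstPageDictEncoded) {
    pages.add(firstPageDictEncoded);
    localDictEncoded = firstPageDictEncoded;
  }

  void addPage(boolean dictEncoded) {
    if (!localDictEncoded || dictEncoded) {
      // Either the blocklet is not dictionary encoded, or the new page is
      // dictionary encoded like the older ones: just append it.
      pages.add(dictEncoded);
    } else {
      // Older pages were encoded with the dictionary and the new page is
      // without it: every older page must be re-encoded without the dictionary.
      localDictEncoded = false;
      fallbackSubmitted = pages.size();
      pages.add(dictEncoded);
    }
  }

  boolean isLocalDictEncoded() {
    return localDictEncoded;
  }

  int fallbackSubmitted() {
    return fallbackSubmitted;
  }

  int pageCount() {
    return pages.size();
  }

  public static void main(String[] args) {
    BlockletPagesSketch blocklet = new BlockletPagesSketch(true);
    blocklet.addPage(true);   // still dictionary encoded
    blocklet.addPage(false);  // triggers fallback for the two older pages
    System.out.println(blocklet.fallbackSubmitted() + " of " + blocklet.pageCount()
        + " pages need fallback");
  }
}
```

The wording suggested by the reviewer ("new pages are without dictionary") matches the `else` branch in this sketch: fallback starts only when a non-dictionary page arrives after dictionary-encoded ones.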
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197603924 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java --- @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page; + +import java.io.IOException; +import java.math.BigDecimal; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.localdictionary.PageLevelDictionary; +import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException; +import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator; +import org.apache.carbondata.core.util.ByteUtil; + +/** + * Column page implementation for Local dictionary generated columns + * Its a decorator over two column page + * 1. Which will hold the actual data + * 2. 
Which will hold the dictionary encoded data + */ +public class LocalDictColumnPage extends ColumnPage { + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(LocalDictColumnPage.class.getName()); + + /** + * to maintain page level dictionary for column page + */ + private PageLevelDictionary pageLevelDictionary; + + /** + * to hold the actual data of the column + */ + private ColumnPage actualDataColumnPage; + + /** + * to hold the dictionary encoded column page + */ + private ColumnPage encodedDataColumnPage; + + /** + * to check if actual column page memory is already clear + */ + private boolean isActualPageMemoryFreed; + + /** + * Create a new column page with input data type and page size. + */ + protected LocalDictColumnPage(ColumnPage actualDataColumnPage, ColumnPage encodedColumnpage, + LocalDictionaryGenerator localDictionaryGenerator) { + super(actualDataColumnPage.getColumnSpec(), actualDataColumnPage.getDataType(), + actualDataColumnPage.getPageSize()); + if (!localDictionaryGenerator.isThresholdReached()) { + pageLevelDictionary = new PageLevelDictionary(localDictionaryGenerator, + actualDataColumnPage.getColumnSpec().getFieldName()); + this.encodedDataColumnPage = encodedColumnpage; + } + this.actualDataColumnPage = actualDataColumnPage; + } + + @Override public byte[][] getByteArrayPage() { + if (null != pageLevelDictionary) { + return encodedDataColumnPage.getByteArrayPage(); + } else { + return actualDataColumnPage.getByteArrayPage(); + } + } + + /** + * Below method will be used to check whether page is local dictionary + * generated or not. 
This will be used for while enoding the the page + * + * @return + */ + public boolean isLocalDictGeneratedPage() { + return null != pageLevelDictionary; + } + + /** + * Below method will be used to add column data to page + * + * @param rowId row number + * @param bytes actual data + */ + @Override public void putBytes(int rowId, byte[] bytes) { + if (null != pageLevelDictionary) { + try { + actualDataColumnPage.putBytes(rowId, bytes); + int dictionaryValue = pageLevelDictionary.getDictionaryValue(bytes); + encodedDataColumnPage.putBytes(rowId, ByteUtil.toBytes(dictionaryValue)); + } catch (DictionaryThresholdReachedException e) { + LOGGER.error(e, "Local Dictionary threshold reached for the column: " + actualDataColumnPage + .getColumnSpec().getFieldName()); + pageLevelDictionary = null; + encodedDataColumnPage.freeMemory(); + encodedDataColumnPage = null; + } + } else { + actualDataColumnPage.putBytes(rowId, bytes); + } + } + + public PageLevelDictionary getPageDictionary() { + return pageLevelDictionary; + } + + @Override public void disableLocalDictEncoding() { + pageLevelDictionary = null; + freeEncodedColumnPage(); + } + + @Override public PageLevelDictionary getColumnPageDictionary() { --- End diff -- Duplicate methods are present; remove one of `getColumnPageDictionary` and `getPageDictionary`. --- |
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197602851 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.datastore.blocklet; + +import java.io.IOException; +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder; +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage; +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; +import org.apache.carbondata.core.localdictionary.PageLevelDictionary; +import org.apache.carbondata.core.memory.MemoryException; +import org.apache.carbondata.format.LocalDictionaryChunk; + +/** + * Maintains the list of encoded page of a column in a blocklet + * and encoded dictionary values only if column is encoded using local + * dictionary + * Handle the fallback if all the pages in blocklet are not + * encoded with local dictionary + */ +public class BlockletEncodedColumnPage { + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName()); + + /** + * list of encoded page of a column in a blocklet + */ + private List<EncodedColumnPage> encodedColumnPageList; + + /** + * fallback executor service + */ + private ExecutorService fallbackExecutorService; + + /** + * to check whether pages are local dictionary encoded or not + */ + private boolean isLocalDictEncoded; + + /** + * page level dictionary only when column is encoded with local dictionary + */ + private PageLevelDictionary pageLevelDictionary; + + /** + * fallback future task queue; + */ + private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue; + + BlockletEncodedColumnPage(ExecutorService fallbackExecutorService, + EncodedColumnPage encodedColumnPage) 
{ + this.encodedColumnPageList = new ArrayList<>(); + this.fallbackExecutorService = fallbackExecutorService; + this.encodedColumnPageList.add(encodedColumnPage); + // if dimension page is local dictionary enabled and encoded with local dictionary + if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage + .isLocalDictGeneratedPage()) { + this.isLocalDictEncoded = true; + // get first page dictionary + this.pageLevelDictionary = encodedColumnPage.getPageDictionary(); + } + } + + /** + * Below method will be used to add column page of a column + * + * @param encodedColumnPage + * encoded column page + * @throws ExecutionException + * failure in fallback + * @throws InterruptedException + * failure during fallback + */ + void addEncodedColumnTable(EncodedColumnPage encodedColumnPage) --- End diff -- change method name to `addEncodedColumnPage` --- |
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197604250
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala ---
@@ -96,8 +96,8 @@ object ExampleUtils {
     import spark.implicits._
     val sc = spark.sparkContext
     val df = sc.parallelize(1 to numRows, 2)
-      .map(x => ("a", "b", x))
--- End diff --

Remove the unnecessary changes.
--- |
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197604461
--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -104,19 +105,27 @@
         page.setStatsCollector(KeyPageStatsCollector.newInstance(DataTypes.BYTE_ARRAY));
         dictDimensionPages[tmpNumDictDimIdx++] = page;
       } else {
+        // will be encoded using string page
+        LocalDictionaryGenerator localDictionaryGenerator =
+            model.getColumnLocalDictGenMap().get(spec.getFieldName());
         if (DataTypes.VARCHAR == spec.getSchemaDataType()) {
-          page = ColumnPage.newPage(spec, DataTypes.VARCHAR, pageSize);
+          page = ColumnPage.newLocalDictPage(spec,
--- End diff --

If `localDictionaryGenerator` is null, a local dictionary page should not be generated for VARCHAR either. I think the null check is missing.
--- |
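The null check asked for here could look something like the sketch below. It is only an illustration under stated assumptions: `LocalDictPageChoice`, `Generator`, `PageKind` and `choosePageKind` are hypothetical names standing in for the real `LocalDictionaryGenerator` lookup and the `ColumnPage` factory calls, whose full signatures are not visible in the quoted diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of choosing a page type based on generator presence. */
public class LocalDictPageChoice {
  /** Hypothetical stand-in for the real LocalDictionaryGenerator type. */
  interface Generator {}

  /** Which kind of column page would be created (assumed names). */
  enum PageKind { PLAIN, LOCAL_DICT }

  static PageKind choosePageKind(Map<String, Generator> generators, String fieldName) {
    Generator generator = generators.get(fieldName);
    // No generator registered for the column: local dictionary is not
    // configured, so fall back to a plain page regardless of data type.
    return generator == null ? PageKind.PLAIN : PageKind.LOCAL_DICT;
  }

  public static void main(String[] args) {
    Map<String, Generator> generators = new HashMap<>();
    generators.put("city", new Generator() {});
    System.out.println(choosePageKind(generators, "city"));
    System.out.println(choosePageKind(generators, "comment"));
  }
}
```

Doing the lookup once and branching on its result keeps the VARCHAR and non-VARCHAR paths consistent, which is the concern raised in the comment.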
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197604074
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -482,7 +482,7 @@ public String getTableUniqueName() {
    * @return
    */
   public boolean isLocalDictionaryEnabled() {
-    return isLocalDictionaryEnabled;
--- End diff --

Why is it always false?
--- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197630595
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -482,7 +482,7 @@ public String getTableUniqueName() {
    * @return
    */
   public boolean isLocalDictionaryEnabled() {
-    return isLocalDictionaryEnabled;
--- End diff --

Currently the query part is not handled, so queries will fail if local dictionary is enabled. I am working on it.
--- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2402#discussion_r197630599 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java --- @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page; + +import java.io.IOException; +import java.math.BigDecimal; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.localdictionary.PageLevelDictionary; +import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException; +import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator; +import org.apache.carbondata.core.util.ByteUtil; + +/** + * Column page implementation for Local dictionary generated columns + * Its a decorator over two column page + * 1. Which will hold the actual data + * 2. 
Which will hold the dictionary encoded data + */ +public class LocalDictColumnPage extends ColumnPage { + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(LocalDictColumnPage.class.getName()); + + /** + * to maintain page level dictionary for column page + */ + private PageLevelDictionary pageLevelDictionary; + + /** + * to hold the actual data of the column + */ + private ColumnPage actualDataColumnPage; + + /** + * to hold the dictionary encoded column page + */ + private ColumnPage encodedDataColumnPage; + + /** + * to check if actual column page memory is already clear + */ + private boolean isActualPageMemoryFreed; + + /** + * Create a new column page with input data type and page size. + */ + protected LocalDictColumnPage(ColumnPage actualDataColumnPage, ColumnPage encodedColumnpage, + LocalDictionaryGenerator localDictionaryGenerator) { + super(actualDataColumnPage.getColumnSpec(), actualDataColumnPage.getDataType(), + actualDataColumnPage.getPageSize()); + if (!localDictionaryGenerator.isThresholdReached()) { + pageLevelDictionary = new PageLevelDictionary(localDictionaryGenerator, + actualDataColumnPage.getColumnSpec().getFieldName()); + this.encodedDataColumnPage = encodedColumnpage; + } + this.actualDataColumnPage = actualDataColumnPage; + } + + @Override public byte[][] getByteArrayPage() { + if (null != pageLevelDictionary) { + return encodedDataColumnPage.getByteArrayPage(); + } else { + return actualDataColumnPage.getByteArrayPage(); + } + } + + /** + * Below method will be used to check whether page is local dictionary + * generated or not. 
This will be used for while enoding the the page + * + * @return + */ + public boolean isLocalDictGeneratedPage() { + return null != pageLevelDictionary; + } + + /** + * Below method will be used to add column data to page + * + * @param rowId row number + * @param bytes actual data + */ + @Override public void putBytes(int rowId, byte[] bytes) { + if (null != pageLevelDictionary) { + try { + actualDataColumnPage.putBytes(rowId, bytes); + int dictionaryValue = pageLevelDictionary.getDictionaryValue(bytes); + encodedDataColumnPage.putBytes(rowId, ByteUtil.toBytes(dictionaryValue)); + } catch (DictionaryThresholdReachedException e) { + LOGGER.error(e, "Local Dictionary threshold reached for the column: " + actualDataColumnPage + .getColumnSpec().getFieldName()); + pageLevelDictionary = null; + encodedDataColumnPage.freeMemory(); + encodedDataColumnPage = null; + } + } else { + actualDataColumnPage.putBytes(rowId, bytes); + } + } + + public PageLevelDictionary getPageDictionary() { + return pageLevelDictionary; + } + + @Override public void disableLocalDictEncoding() { + pageLevelDictionary = null; + freeEncodedColumnPage(); + } + + @Override public PageLevelDictionary getColumnPageDictionary() { --- End diff -- remove above method --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2402 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5414/ --- |