GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1136

[WIP] Support encoding strategy for dimension columns

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata dimstrategy

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1136

----
commit 382a53b93858b3443d15f69a5ba3715a9a0a91b4
Author: jackylk <[hidden email]>
Date:   2017-07-04T00:12:13Z

    support column page in writer

commit febd0b87f5484010f609e6415d7f508b41480ef2
Author: jackylk <[hidden email]>
Date:   2017-07-04T15:26:54Z

    add encoding strategy for dimension

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2910/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/324/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2916/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/330/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2922/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/336/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3058/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/469/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3059/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/470/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/564/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3158/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/565/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3159/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/571/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1136

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3166/
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1136#discussion_r128908712

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java ---
    @@ -25,197 +25,142 @@
     public class TableSpec {
     
    -  // contains name and type for each dimension
    -  private DimensionSpec dimensionSpec;
    -  // contains name and type for each measure
    -  private MeasureSpec measureSpec;
    +  // column spec for each dimension and measure
    +  private DimensionSpec[] dimensionSpec;
    +  private MeasureSpec[] measureSpec;
     
    -  public TableSpec(List<CarbonDimension> dimensions, List<CarbonMeasure> measures) {
    -    dimensionSpec = new DimensionSpec(dimensions);
    -    measureSpec = new MeasureSpec(measures);
    -  }
    +  // number of simple dimensions
    +  private int numSimpleDimensions;
     
    -  public DimensionSpec getDimensionSpec() {
    -    return dimensionSpec;
    -  }
    -
    -  public MeasureSpec getMeasureSpec() {
    -    return measureSpec;
    -  }
    -
    -  public class DimensionSpec {
    -
    -    // field name of each dimension, in schema order
    -    private String[] fieldName;
    -
    -    // encoding type of each dimension, in schema order
    -    private DimensionType[] types;
    -
    -    // number of simple dimensions
    -    private int numSimpleDimensions;
    -
    -    // number of complex dimensions
    -    private int numComplexDimensions;
    -
    -    // number of dimensions after complex column expansion
    -    private int numDimensionExpanded;
    -
    -    DimensionSpec(List<CarbonDimension> dimensions) {
    -      // first calculate total number of columnar field considering column group and complex column
    -      numDimensionExpanded = 0;
    -      numSimpleDimensions = 0;
    -      numComplexDimensions = 0;
    -      boolean inColumnGroup = false;
    -      for (CarbonDimension dimension : dimensions) {
    -        if (dimension.isColumnar()) {
    -          if (inColumnGroup) {
    -            inColumnGroup = false;
    -          }
    -          if (dimension.isComplex()) {
    -            numDimensionExpanded += dimension.getNumDimensionsExpanded();
    -            numComplexDimensions++;
    -          } else {
    -            numDimensionExpanded++;
    -            numSimpleDimensions++;
    -          }
    -        } else {
    -          // column group
    -          if (!inColumnGroup) {
    -            inColumnGroup = true;
    -            numDimensionExpanded++;
    -            numSimpleDimensions++;
    -          }
    +  public TableSpec(List<CarbonDimension> dimensions, List<CarbonMeasure> measures) {
    +    // first calculate total number of columnar field considering column group and complex column
    +    numSimpleDimensions = 0;
    +    for (CarbonDimension dimension : dimensions) {
    +      if (dimension.isColumnar()) {
    +        if (!dimension.isComplex()) {
    +          numSimpleDimensions++;
             }
    +      } else {
    +        throw new UnsupportedOperationException("column group is not supported");
           }
    +    }
    +    dimensionSpec = new DimensionSpec[dimensions.size()];
    +    measureSpec = new MeasureSpec[measures.size()];
    +    addDimensions(dimensions);
    +    addMeasures(measures);
    +  }
     
    -      // then extract dimension name and type for each column
    -      fieldName = new String[numDimensionExpanded];
    -      types = new DimensionType[numDimensionExpanded];
    -      inColumnGroup = false;
    -      int index = 0;
    -      for (CarbonDimension dimension : dimensions) {
    -        if (dimension.isColumnar()) {
    -          if (inColumnGroup) {
    -            inColumnGroup = false;
    -          }
    -          if (dimension.isComplex()) {
    -            int count = addDimension(index, dimension);
    -            index += count;
    -          } else if (dimension.getDataType() == DataType.TIMESTAMP ||
    -              dimension.getDataType() == DataType.DATE) {
    -            addSimpleDimension(index++, dimension.getColName(), DimensionType.DIRECT_DICTIONARY);
    -          } else if (dimension.isGlobalDictionaryEncoding()) {
    -            addSimpleDimension(index++, dimension.getColName(), DimensionType.GLOBAL_DICTIONARY);
    -          } else {
    -            addSimpleDimension(index++, dimension.getColName(), DimensionType.PLAIN_VALUE);
    -          }
    +  private void addDimensions(List<CarbonDimension> dimensions) {
    +    int dimIndex = 0;
    +    for (int i = 0; i < dimensions.size(); i++) {
    +      CarbonDimension dimension = dimensions.get(i);
    +      if (dimension.isColumnar()) {
    +        if (dimension.isComplex()) {
    +          DimensionSpec spec = new DimensionSpec(DimensionType.COMPLEX, dimension);
    +          dimensionSpec[dimIndex++] = spec;
    +        } else if (dimension.getDataType() == DataType.TIMESTAMP ||
    --- End diff --

It is better to check the encoding types rather than the data type.
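The suggestion above could look roughly like the following. This is only a minimal sketch: `SketchEncoding`, `SketchDimensionType`, `SketchDimension` and `EncodingCheckSketch` are hypothetical stand-ins written for this illustration, not the CarbonData classes, and the assumption that a column's declared encoding list is enough to pick the dimension type is mine, not something the PR states.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not the CarbonData classes.
enum SketchEncoding { DICTIONARY, DIRECT_DICTIONARY }

enum SketchDimensionType { COMPLEX, DIRECT_DICTIONARY, GLOBAL_DICTIONARY, PLAIN_VALUE }

final class SketchDimension {
  private final boolean complex;
  private final Set<SketchEncoding> encodings;

  SketchDimension(boolean complex, Set<SketchEncoding> encodings) {
    this.complex = complex;
    this.encodings = encodings;
  }

  boolean isComplex() {
    return complex;
  }

  boolean hasEncoding(SketchEncoding encoding) {
    return encodings.contains(encoding);
  }
}

final class EncodingCheckSketch {

  // Pick the dimension type from the declared encodings instead of the data type.
  // DIRECT_DICTIONARY is tested first so that a column carrying both encodings
  // is still treated as direct dictionary.
  static SketchDimensionType classify(SketchDimension dimension) {
    if (dimension.isComplex()) {
      return SketchDimensionType.COMPLEX;
    } else if (dimension.hasEncoding(SketchEncoding.DIRECT_DICTIONARY)) {
      return SketchDimensionType.DIRECT_DICTIONARY;
    } else if (dimension.hasEncoding(SketchEncoding.DICTIONARY)) {
      return SketchDimensionType.GLOBAL_DICTIONARY;
    } else {
      return SketchDimensionType.PLAIN_VALUE;
    }
  }

  public static void main(String[] args) {
    SketchDimension direct = new SketchDimension(
        false, EnumSet.of(SketchEncoding.DICTIONARY, SketchEncoding.DIRECT_DICTIONARY));
    SketchDimension plain = new SketchDimension(
        false, EnumSet.noneOf(SketchEncoding.class));
    System.out.println(classify(direct));
    System.out.println(classify(plain));
  }
}
```

With this shape, a column whose data type is TIMESTAMP or DATE but which was not written with a direct dictionary would no longer be misclassified, which seems to be the concern behind the comment.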
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1136#discussion_r128908871

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DictDimensionIndexCodec.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.page.encoding;
    +
    +import org.apache.carbondata.core.datastore.DimensionType;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForInt;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForNoInvertedIndexForInt;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForNoInvertedIndexForShort;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForShort;
    +import org.apache.carbondata.core.datastore.columnar.IndexStorage;
    +import org.apache.carbondata.core.datastore.compression.Compressor;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +public class DictDimensionIndexCodec extends IndexStorageCodec {
    +
    +  DictDimensionIndexCodec(boolean isSort, boolean isInvertedIndex, Compressor compressor) {
    +    super(isSort, isInvertedIndex, compressor);
    +  }
    +
    +  @Override
    +  public String getName() {
    +    return "DictDimensionIndexCodec";
    +  }
    +
    +  @Override
    +  public EncodedColumnPage encode(ColumnPage input) {
    +    IndexStorage indexStorage;
    +    byte[][] data = input.getByteArrayPage();
    +    if (isInvertedIndex) {
    +      if (version == ColumnarFormatVersion.V3) {
    +        indexStorage = new BlockIndexerStorageForShort(data, true, false, isSort);
    +      } else {
    +        indexStorage = new BlockIndexerStorageForInt(data, true, false, isSort);
    +      }
    +    } else {
    +      if (version == ColumnarFormatVersion.V3) {
    +        indexStorage = new BlockIndexerStorageForNoInvertedIndexForShort(data, false);
    +      } else {
    +        indexStorage = new BlockIndexerStorageForNoInvertedIndexForInt(data);
    +      }
    +    }
    +    byte[] flattened = ByteUtil.flatten(indexStorage.getDataPage());
    +    byte[] compressed = compressor.compressByte(flattened);
    +    return new EncodedDimensionPage(input.getPageSize(), compressed, indexStorage,
    +        DimensionType.GLOBAL_DICTIONARY);
    +  }
    +
    +  @Override
    +  public ColumnPage decode(byte[] input, int offset, int length) throws MemoryException {
    --- End diff --

I think you are going to implement this in the future. For now, maybe move this to `IndexStorageCodec` and throw an UnsupportedException.
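One way to read this suggestion is to let the shared base class own a default `decode` that fails loudly, so each concrete codec only implements `encode` until the read path is migrated. The sketch below is an assumption-laden illustration: `IndexCodecSketch` and `DictCodecSketch` are hypothetical names, the byte-array signatures are simplified stand-ins for `ColumnPage`, and `UnsupportedOperationException` is assumed to be what "UnsupportedException" refers to.

```java
// Hypothetical, simplified stand-in for a codec base class; not the CarbonData API.
abstract class IndexCodecSketch {

  abstract String getName();

  abstract byte[] encode(byte[][] page);

  // Default for every subclass: decoding is not available through this codec yet.
  byte[] decode(byte[] input, int offset, int length) {
    throw new UnsupportedOperationException(getName() + " does not support decode yet");
  }
}

// A concrete codec now only needs to provide the write side.
final class DictCodecSketch extends IndexCodecSketch {

  @Override
  String getName() {
    return "DictCodecSketch";
  }

  // Flatten the rows of the page into one contiguous byte array.
  @Override
  byte[] encode(byte[][] page) {
    int total = 0;
    for (byte[] row : page) {
      total += row.length;
    }
    byte[] flattened = new byte[total];
    int position = 0;
    for (byte[] row : page) {
      System.arraycopy(row, 0, flattened, position, row.length);
      position += row.length;
    }
    return flattened;
  }

  public static void main(String[] args) {
    DictCodecSketch codec = new DictCodecSketch();
    byte[] encoded = codec.encode(new byte[][] {{1, 2}, {3}});
    System.out.println(encoded.length);
    try {
      codec.decode(encoded, 0, encoded.length);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Keeping the throwing default in one place avoids repeating an empty `decode` stub in every subclass, and a later PR can override it per codec once the reader is switched over.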
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1136#discussion_r128918118

    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DictDimensionIndexCodec.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.page.encoding;
    +
    +import org.apache.carbondata.core.datastore.DimensionType;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForInt;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForNoInvertedIndexForInt;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForNoInvertedIndexForShort;
    +import org.apache.carbondata.core.datastore.columnar.BlockIndexerStorageForShort;
    +import org.apache.carbondata.core.datastore.columnar.IndexStorage;
    +import org.apache.carbondata.core.datastore.compression.Compressor;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +public class DictDimensionIndexCodec extends IndexStorageCodec {
    +
    +  DictDimensionIndexCodec(boolean isSort, boolean isInvertedIndex, Compressor compressor) {
    +    super(isSort, isInvertedIndex, compressor);
    +  }
    +
    +  @Override
    +  public String getName() {
    +    return "DictDimensionIndexCodec";
    +  }
    +
    +  @Override
    +  public EncodedColumnPage encode(ColumnPage input) {
    +    IndexStorage indexStorage;
    +    byte[][] data = input.getByteArrayPage();
    +    if (isInvertedIndex) {
    +      if (version == ColumnarFormatVersion.V3) {
    +        indexStorage = new BlockIndexerStorageForShort(data, true, false, isSort);
    +      } else {
    +        indexStorage = new BlockIndexerStorageForInt(data, true, false, isSort);
    +      }
    +    } else {
    +      if (version == ColumnarFormatVersion.V3) {
    +        indexStorage = new BlockIndexerStorageForNoInvertedIndexForShort(data, false);
    +      } else {
    +        indexStorage = new BlockIndexerStorageForNoInvertedIndexForInt(data);
    +      }
    +    }
    +    byte[] flattened = ByteUtil.flatten(indexStorage.getDataPage());
    +    byte[] compressed = compressor.compressByte(flattened);
    +    return new EncodedDimensionPage(input.getPageSize(), compressed, indexStorage,
    +        DimensionType.GLOBAL_DICTIONARY);
    +  }
    +
    +  @Override
    +  public ColumnPage decode(byte[] input, int offset, int length) throws MemoryException {
    --- End diff --

I have not modified any logic in the read path, so backward compatibility is preserved. This needs to be done in a future PR.