GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/971 [WIP] Refactor writer to use ColumnPage/TableStatistics/TableSpec This PR extracts interface that used for: - make ColumnPage unsafe - Encoding override You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata write Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/971.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #971 ---- commit 9523b1796abd560fe8fcec1d869d4044cf7e7449 Author: jackylk <[hidden email]> Date: 2017-05-27T18:14:48Z use ColumnPage in writer commit 4df3861cf7e3eb124ad45106185bf70f68c0c319 Author: jackylk <[hidden email]> Date: 2017-05-27T18:22:31Z remove WriterCompressModel commit 7ffd5098008847bf7fb7c7651ea8693efed65bfd Author: jackylk <[hidden email]> Date: 2017-05-28T14:55:29Z add PrimitiveColumnPage commit ce81299766e449f02f00420c26ce16bcbd915ac9 Author: jackylk <[hidden email]> Date: 2017-05-30T00:36:20Z add TableSpec ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119781493 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java --- @@ -65,20 +68,20 @@ this.pageSize = pageSize; keyColumnPage = new KeyColumnPage(pageSize, model.getSegmentProperties().getDimensionPartitions().length); - noDictDimensionPage = new VarLengthColumnPage[model.getNoDictionaryCount()]; + noDictDimensionPage = new PrimitiveColumnPage[model.getNoDictionaryCount()]; for (int i = 0; i < noDictDimensionPage.length; i++) { - noDictDimensionPage[i] = new VarLengthColumnPage(pageSize); + noDictDimensionPage[i] = new PrimitiveColumnPage(DataType.STRING, pageSize); --- End diff -- Primitive column page should only contains primitive data types, it should not contain string or decimal. So better create another page for string and decimal. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119781585 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java --- @@ -74,4 +75,70 @@ public void putComplexData(int rowId, int depth, List<byte[]> value) { public int getDepth() { return depth; } + + @Override + public void putInt(int rowId, int value) { --- End diff -- Create one abstract class to move this default implementations. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119781704 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/ColumnPageStatistics.java --- @@ -33,30 +34,30 @@ * the unique value is the non-exist value in the row, * and will be used as storage key for null values of measures */ - private Object uniqueValue; + private Object nonExistValue; --- End diff -- I don't think we are using this now, better can be removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119781980 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/ColumnPageStatistics.java --- @@ -114,6 +115,46 @@ private int getDecimalCount(double value) { return decimalPlaces; } + /** + * return min value as byte array + */ + public byte[] minBytes() { + return getValueAsBytes(getMin()); + } + + /** + * return max value as byte array + */ + public byte[] maxBytes() { + return getValueAsBytes(getMax()); + } + + /** + * convert value to byte array + */ + private byte[] getValueAsBytes(Object value) { + ByteBuffer b = null; + Object max = getMax(); --- End diff -- We should use `value` not `getMax()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119782441 --- Diff: LICENSE --- @@ -157,7 +157,7 @@ negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the + encodedData of this License or out of the use or inability to use the --- End diff -- I think changed by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119782797 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/measure/v3/CompressedMeasureChunkFileBasedReaderV3.java --- @@ -220,13 +223,25 @@ public CompressedMeasureChunkFileBasedReaderV3(BlockletInfo blockletInfo, String valueEncodeMeta.add(CarbonUtil .deserializeEncoderMetaNew(measureColumnChunk.getEncoder_meta().get(i).array())); } - WriterCompressModel compressionModel = CarbonUtil.getValueCompressionModel(valueEncodeMeta); - ValueCompressionHolder values = compressionModel.getValueCompressionHolder()[0]; + + MeasurePageStatistics stats = CarbonUtil.getMeasurePageStats(valueEncodeMeta); --- End diff -- it seems this code is repeated in all readers, so better move abstract class or utility --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783237 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/MeasurePageStatistics.java --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page.statistics; + +import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.metadata.datatype.DataType; + +public class MeasurePageStatistics { --- End diff -- I think better give other name, it confuses. it is just holds the data like holder. so better name as VO or holder --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783901 --- Diff: dev/scalastyle-config.xml --- @@ -193,12 +193,12 @@ This file is divided into 3 sections: </check> <check customId="awaitresult" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> - <parameters><parameter name="regex">Await\.result</parameter></parameters> + <parameters><parameter name="regex">Await\.encodedData</parameter></parameters> --- End diff -- I think changed this file by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783933 --- Diff: docs/dml-operation-on-carbondata.md --- @@ -211,7 +211,7 @@ By default the above configuration will be false. ### Examples ``` -INSERT INTO table1 SELECT item1 ,sum(item2 + 1000) as result FROM +INSERT INTO table1 SELECT item1 ,sum(item2 + 1000) as encodedData FROM --- End diff -- I think changed this file by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783944 --- Diff: docs/faq.md --- @@ -123,7 +123,7 @@ id city name 3 davi shenzhen ``` -As result shows, the second column is city in carbon table, but what inside is name, such as jack. This phenomenon is same with insert data into hive table. +As encodedData shows, the second column is city in carbon table, but what inside is name, such as jack. This phenomenon is same with insert data into hive table. --- End diff -- I think changed this file by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783972 --- Diff: docs/release-guide.md --- @@ -109,7 +109,7 @@ staging repository and promote the artifacts to Maven Central. 4. Choose `User Token` from the dropdown, then click `Access User Token`. Copy a snippet of the Maven XML configuration block. 5. Insert this snippet twice into your global Maven `settings.xml` file, typically `${HOME]/ -.m2/settings.xml`. The end result should look like this, where `TOKEN_NAME` and `TOKEN_PASSWORD` +.m2/settings.xml`. The end encodedData should look like this, where `TOKEN_NAME` and `TOKEN_PASSWORD` --- End diff -- I think changed this file by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119783989 --- Diff: docs/useful-tips-on-carbondata.md --- @@ -127,7 +127,7 @@ query performance. The create table command can be modified as below : TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); ``` - The result of performance analysis of test-case shows reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times. + The encodedData of performance analysis of test-case shows reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times. --- End diff -- I think changed this file by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119801431 --- Diff: core/src/main/java/org/apache/carbondata/core/util/ByteUtil.java --- @@ -670,4 +671,23 @@ public static int putBytes(byte[] tgtBytes, int tgtOffset, byte[] srcBytes, int System.arraycopy(srcBytes, srcOffset, tgtBytes, tgtOffset, srcLength); return tgtOffset + srcLength; } + + /** + * flatten the byte[][] to byte[] and return data after applying compression by compressor + * @param compressor compressor to use + * @return compressed data + */ + public static byte[] flattenAndCompress(Compressor compressor, byte[][] byteArrayData) { --- End diff -- Since it is ByteUtil, better use only flatten not compress, compress can be done in caller method --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119802447 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java --- @@ -549,24 +562,23 @@ private static ValueEncoderMeta deserializeValueEncoderMeta(ByteBuffer byteBuffe } - private static WriterCompressModel getValueCompressionModel(ValueEncoderMeta[] encoderMetas) { - Object[] maxValue = new Object[encoderMetas.length]; - Object[] minValue = new Object[encoderMetas.length]; - int[] decimalLength = new int[encoderMetas.length]; - Object[] uniqueValue = new Object[encoderMetas.length]; - DataType[] aggType = new DataType[encoderMetas.length]; + private static MeasurePageStatistics getMeasurePageStats(ValueEncoderMeta[] encoderMetas) { + Object[] max = new Object[encoderMetas.length]; + Object[] min = new Object[encoderMetas.length]; + int[] decimal = new int[encoderMetas.length]; + Object[] nonExistValue = new Object[encoderMetas.length]; + DataType[] types = new DataType[encoderMetas.length]; byte[] dataTypeSelected = new byte[encoderMetas.length]; for (int i = 0; i < encoderMetas.length; i++) { - maxValue[i] = encoderMetas[i].getMaxValue(); - minValue[i] = encoderMetas[i].getMinValue(); - decimalLength[i] = encoderMetas[i].getDecimal(); - uniqueValue[i] = encoderMetas[i].getUniqueValue(); - aggType[i] = encoderMetas[i].getType(); + max[i] = encoderMetas[i].getMaxValue(); + min[i] = encoderMetas[i].getMinValue(); + decimal[i] = encoderMetas[i].getDecimal(); + nonExistValue[i] = encoderMetas[i].getUniqueValue(); + types[i] = encoderMetas[i].getType(); dataTypeSelected[i] = encoderMetas[i].getDataTypeSelected(); } - return ValueCompressionUtil - .getWriterCompressModel(maxValue, minValue, decimalLength, uniqueValue, aggType, - dataTypeSelected); + + return MeasurePageStatistics.build(min, max, nonExistValue, decimal, types, dataTypeSelected); --- End diff -- Better pass `ValueEncoderMeta` list to `build` method to avoid duplicate code in other places --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119802768 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -841,17 +837,14 @@ public static WriterCompressModel getValueCompressionModel( */ for (int i = 0; i < dataTypeSelected.length; i++) { // always 1 ValueEncoderMeta valueEncoderMeta = encodeMetaList.get(i); - maxValue[i] = valueEncoderMeta.getMaxValue(); - minValue[i] = valueEncoderMeta.getMinValue(); - uniqueValue[i] = valueEncoderMeta.getUniqueValue(); + max[i] = valueEncoderMeta.getMaxValue(); + min[i] = valueEncoderMeta.getMinValue(); + nonExistValue[i] = valueEncoderMeta.getUniqueValue(); decimal[i] = valueEncoderMeta.getDecimal(); type[i] = valueEncoderMeta.getType(); dataTypeSelected[i] = valueEncoderMeta.getDataTypeSelected(); } - MeasureMetaDataModel measureMetadataModel = - new MeasureMetaDataModel(minValue, maxValue, decimal, dataTypeSelected.length, uniqueValue, - type, dataTypeSelected); - return ValueCompressionUtil.getWriterCompressModel(measureMetadataModel); + return MeasurePageStatistics.build(min, max, nonExistValue, decimal, type, dataTypeSelected); --- End diff -- Better pass `ValueEncoderMeta` list to `build` method to avoid duplicate code in other places --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119803990 --- Diff: core/src/test/java/org/apache/carbondata/core/datastore/chunk/reader/measure/CompressedMeasureChunkFileBasedReaderTest.java --- @@ -1,90 +0,0 @@ -/* --- End diff -- Better remove the file --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119804061 --- Diff: dev/findbugs-exclude.xml --- @@ -143,7 +143,7 @@ This method returns a value that is not checked. The return value should be checked since it can indicate an unusual or unexpected function execution. For example, the File.delete() method returns false if the file could not be successfully deleted - (rather than throwing an Exception). If you don't check the result, you won't notice + (rather than throwing an Exception). If you don't check the encodedData, you won't notice --- End diff -- I think added by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119885006 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java --- @@ -65,20 +68,20 @@ this.pageSize = pageSize; keyColumnPage = new KeyColumnPage(pageSize, model.getSegmentProperties().getDimensionPartitions().length); - noDictDimensionPage = new VarLengthColumnPage[model.getNoDictionaryCount()]; + noDictDimensionPage = new PrimitiveColumnPage[model.getNoDictionaryCount()]; for (int i = 0; i < noDictDimensionPage.length; i++) { - noDictDimensionPage[i] = new VarLengthColumnPage(pageSize); + noDictDimensionPage[i] = new PrimitiveColumnPage(DataType.STRING, pageSize); --- End diff -- If create another class for string and decimal, then it will be more complex for measure handling. And actually in future I think all simple column handling should be similar, so better to keep them in one class. I think a better class name can solve your comment? How about SimpleColumnPage and ComplexColumnPage? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119885662 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java --- @@ -74,4 +75,70 @@ public void putComplexData(int rowId, int depth, List<byte[]> value) { public int getDepth() { return depth; } + + @Override + public void putInt(int rowId, int value) { --- End diff -- How about add default implementation in ColumnPage and override the one that is needed in ComplexColumnPage --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |