Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119886349

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/ColumnPageStatistics.java ---
@@ -33,30 +34,30 @@
    * the unique value is the non-exist value in the row,
    * and will be used as storage key for null values of measures
    */
-  private Object uniqueValue;
+  private Object nonExistValue;
--- End diff --

But it will be written into the file by `CarbonMetadataUtil.serializeEncodeMetaUsingByteBuffer`; do you mean the read path is not using it?
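For readers following this thread: the `nonExistValue` is a placeholder stored in the measure page metadata so that null cells can be represented. Below is a minimal sketch of how min, max, and such a placeholder might be packed into a metadata ByteBuffer; the field order, types, and class name are illustrative assumptions, not the actual `CarbonMetadataUtil.serializeEncodeMetaUsingByteBuffer` layout.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: one possible layout for serializing measure page metadata.
// The real CarbonMetadataUtil may order, type, or encode these fields differently.
public final class EncodeMetaSketch {

  public static ByteBuffer serialize(double min, double max, double nonExistValue, int decimal) {
    ByteBuffer buffer = ByteBuffer.allocate(3 * Double.BYTES + Integer.BYTES);
    buffer.putDouble(max);           // page max
    buffer.putDouble(min);           // page min
    buffer.putDouble(nonExistValue); // placeholder used as the storage key for nulls
    buffer.putInt(decimal);          // decimal count
    buffer.flip();
    return buffer;
  }
}
```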
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119886685

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/ColumnPageStatistics.java ---
@@ -114,6 +115,46 @@ private int getDecimalCount(double value) {
     return decimalPlaces;
   }
+
+  /**
+   * return min value as byte array
+   */
+  public byte[] minBytes() {
+    return getValueAsBytes(getMin());
+  }
+
+  /**
+   * return max value as byte array
+   */
+  public byte[] maxBytes() {
+    return getValueAsBytes(getMax());
+  }
+
+  /**
+   * convert value to byte array
+   */
+  private byte[] getValueAsBytes(Object value) {
+    ByteBuffer b = null;
+    Object max = getMax();
--- End diff --

ok
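Since the diff above cuts off inside `getValueAsBytes`, here is a rough sketch of how such a conversion could branch on the statistic's runtime type. The branching strategy and class name are assumptions for illustration; the PR's actual implementation likely switches on the page's data type instead.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

// Illustrative sketch: convert a min/max statistic to a byte array based on its runtime type.
final class ValueAsBytesSketch {

  static byte[] getValueAsBytes(Object value) {
    if (value instanceof Long) {
      return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
    } else if (value instanceof Double) {
      return ByteBuffer.allocate(Double.BYTES).putDouble((Double) value).array();
    } else if (value instanceof BigDecimal) {
      // decimals serialized via their unscaled representation here, purely for simplicity
      return ((BigDecimal) value).unscaledValue().toByteArray();
    }
    throw new IllegalArgumentException("unsupported statistics type: " + value.getClass());
  }
}
```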
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887373

--- Diff: dev/scalastyle-config.xml ---
@@ -193,12 +193,12 @@ This file is divided into 3 sections:
   </check>
   <check customId="awaitresult" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
-    <parameters><parameter name="regex">Await\.result</parameter></parameters>
+    <parameters><parameter name="regex">Await\.encodedData</parameter></parameters>
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887399

--- Diff: LICENSE ---
@@ -157,7 +157,7 @@
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
-     result of this License or out of the use or inability to use the
+     encodedData of this License or out of the use or inability to use the
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887721

--- Diff: docs/faq.md ---
@@ -123,7 +123,7 @@ id  city    name
 3   davi    shenzhen
 ```
-As result shows, the second column is city in carbon table, but what inside is name, such as jack. This phenomenon is same with insert data into hive table.
+As encodedData shows, the second column is city in carbon table, but what inside is name, such as jack. This phenomenon is same with insert data into hive table.
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887697

--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -211,7 +211,7 @@ By default the above configuration will be false.
 ### Examples
 ```
-INSERT INTO table1 SELECT item1 ,sum(item2 + 1000) as result FROM
+INSERT INTO table1 SELECT item1 ,sum(item2 + 1000) as encodedData FROM
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887844

--- Diff: docs/release-guide.md ---
@@ -109,7 +109,7 @@ staging repository and promote the artifacts to Maven Central.
 4. Choose `User Token` from the dropdown, then click `Access User Token`. Copy a snippet of the Maven
 XML configuration block.
 5. Insert this snippet twice into your global Maven `settings.xml` file, typically `${HOME]/
-.m2/settings.xml`. The end result should look like this, where `TOKEN_NAME` and `TOKEN_PASSWORD`
+.m2/settings.xml`. The end encodedData should look like this, where `TOKEN_NAME` and `TOKEN_PASSWORD`
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119887960

--- Diff: docs/useful-tips-on-carbondata.md ---
@@ -127,7 +127,7 @@ query performance. The create table command can be modified as below :
       TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
       'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
 ```
-  The result of performance analysis of test-case shows reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times.
+  The encodedData of performance analysis of test-case shows reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times.
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119888798

--- Diff: core/src/main/java/org/apache/carbondata/core/util/ByteUtil.java ---
@@ -670,4 +671,23 @@ public static int putBytes(byte[] tgtBytes, int tgtOffset, byte[] srcBytes, int
     System.arraycopy(srcBytes, srcOffset, tgtBytes, tgtOffset, srcLength);
     return tgtOffset + srcLength;
   }
+
+  /**
+   * flatten the byte[][] to byte[] and return data after applying compression by compressor
+   * @param compressor compressor to use
+   * @return compressed data
+   */
+  public static byte[] flattenAndCompress(Compressor compressor, byte[][] byteArrayData) {
--- End diff --

ok
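For context on `flattenAndCompress`, one plausible shape is to size a single buffer from the row lengths, copy each row in, and hand the flattened array to the compressor. The `Compressor` interface below is declared locally as a stand-in for illustration; the project's own compression API may differ.

```java
// Illustrative sketch of flatten-then-compress; not the PR's actual ByteUtil code.
final class FlattenAndCompressSketch {

  // Local stand-in for the project's compressor abstraction (assumed single-method here).
  interface Compressor {
    byte[] compressByte(byte[] uncompressedInput);
  }

  static byte[] flattenAndCompress(Compressor compressor, byte[][] byteArrayData) {
    int totalLength = 0;
    for (byte[] row : byteArrayData) {
      totalLength += row.length;
    }
    byte[] flattened = new byte[totalLength];
    int offset = 0;
    for (byte[] row : byteArrayData) {
      System.arraycopy(row, 0, flattened, offset, row.length);
      offset += row.length;
    }
    return compressor.compressByte(flattened);
  }
}
```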
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119889985

--- Diff: core/src/test/java/org/apache/carbondata/core/datastore/chunk/reader/measure/CompressedMeasureChunkFileBasedReaderTest.java ---
@@ -1,90 +0,0 @@
-/*
--- End diff --

ok. And I will remove `CompressedDimensionChunkFileBasedReaderTest` also
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119890152

--- Diff: dev/findbugs-exclude.xml ---
@@ -143,7 +143,7 @@
       This method returns a value that is not checked. The return value should be checked
       since it can indicate an unusual or unexpected function execution. For example, the
       File.delete() method returns false if the file could not be successfully deleted
-      (rather than throwing an Exception). If you don't check the result, you won't notice
+      (rather than throwing an Exception). If you don't check the encodedData, you won't notice
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119895320

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/measure/v3/CompressedMeasureChunkFileBasedReaderV3.java ---
@@ -220,13 +223,25 @@ public CompressedMeasureChunkFileBasedReaderV3(BlockletInfo blockletInfo, String
       valueEncodeMeta.add(CarbonUtil
           .deserializeEncoderMetaNew(measureColumnChunk.getEncoder_meta().get(i).array()));
     }
-    WriterCompressModel compressionModel = CarbonUtil.getValueCompressionModel(valueEncodeMeta);
-    ValueCompressionHolder values = compressionModel.getValueCompressionHolder()[0];
+
+    MeasurePageStatistics stats = CarbonUtil.getMeasurePageStats(valueEncodeMeta);
--- End diff --

This code is repeated in CompressedMeasureChunkFileBasedReaderV2 and CompressedMeasureChunkFileBasedReaderV3. If we move it to a shared function, a result object needs to be created, since multiple results are needed (stats, convertedType, values). Actually, in #987 the usage of `CompressionFinder` is removed from the write path; I think another PR is needed to remove it from the read path as well. After that is done, the `CompressionFinder` related classes and the whole `org.apache.carbondata.core.datastore.compression` package can be removed.
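If the duplicated block were extracted, the multiple outputs mentioned above (stats, convertedType, values) could travel in a small holder like the sketch below; the generic parameters are placeholders for illustration, not the project's actual classes.

```java
// Hypothetical holder so a shared decode method can return several results at once.
final class MeasurePageDecodeResult<S, T, V> {
  final S stats;          // page statistics
  final T convertedType;  // data type after conversion
  final V values;         // decoded values

  MeasurePageDecodeResult(S stats, T convertedType, V values) {
    this.stats = stats;
    this.convertedType = convertedType;
    this.values = values;
  }
}
```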
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119905806

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/MeasurePageStatistics.java ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.statistics;
+
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+
+public class MeasurePageStatistics {
--- End diff --

ok, change to `MeasurePageStatsVO`
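As a reference point for the rename, here is a minimal sketch of what such a stats value object might hold, mirroring the fields that appear elsewhere in this review (min, max, non-exist value, decimal); the constructor shape and accessors are assumptions, not the PR's final `MeasurePageStatsVO`.

```java
// Illustrative value object; the real class may also carry data types and selection flags.
final class MeasurePageStatsSketch {
  private final Object[] min;
  private final Object[] max;
  private final Object[] nonExistValue;
  private final int[] decimal;

  private MeasurePageStatsSketch(Object[] min, Object[] max, Object[] nonExistValue, int[] decimal) {
    this.min = min;
    this.max = max;
    this.nonExistValue = nonExistValue;
    this.decimal = decimal;
  }

  static MeasurePageStatsSketch build(Object[] min, Object[] max, Object[] nonExistValue, int[] decimal) {
    return new MeasurePageStatsSketch(min, max, nonExistValue, decimal);
  }

  Object[] getMin() { return min; }
  Object[] getMax() { return max; }
  Object[] getNonExistValue() { return nonExistValue; }
  int[] getDecimal() { return decimal; }
}
```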
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119909710

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -549,24 +562,23 @@ private static ValueEncoderMeta deserializeValueEncoderMeta(ByteBuffer byteBuffe
   }

-  private static WriterCompressModel getValueCompressionModel(ValueEncoderMeta[] encoderMetas) {
-    Object[] maxValue = new Object[encoderMetas.length];
-    Object[] minValue = new Object[encoderMetas.length];
-    int[] decimalLength = new int[encoderMetas.length];
-    Object[] uniqueValue = new Object[encoderMetas.length];
-    DataType[] aggType = new DataType[encoderMetas.length];
+  private static MeasurePageStatistics getMeasurePageStats(ValueEncoderMeta[] encoderMetas) {
+    Object[] max = new Object[encoderMetas.length];
+    Object[] min = new Object[encoderMetas.length];
+    int[] decimal = new int[encoderMetas.length];
+    Object[] nonExistValue = new Object[encoderMetas.length];
+    DataType[] types = new DataType[encoderMetas.length];
     byte[] dataTypeSelected = new byte[encoderMetas.length];
     for (int i = 0; i < encoderMetas.length; i++) {
-      maxValue[i] = encoderMetas[i].getMaxValue();
-      minValue[i] = encoderMetas[i].getMinValue();
-      decimalLength[i] = encoderMetas[i].getDecimal();
-      uniqueValue[i] = encoderMetas[i].getUniqueValue();
-      aggType[i] = encoderMetas[i].getType();
+      max[i] = encoderMetas[i].getMaxValue();
+      min[i] = encoderMetas[i].getMinValue();
+      decimal[i] = encoderMetas[i].getDecimal();
+      nonExistValue[i] = encoderMetas[i].getUniqueValue();
+      types[i] = encoderMetas[i].getType();
       dataTypeSelected[i] = encoderMetas[i].getDataTypeSelected();
     }
-    return ValueCompressionUtil
-        .getWriterCompressModel(maxValue, minValue, decimalLength, uniqueValue, aggType,
-            dataTypeSelected);
+
+    return MeasurePageStatistics.build(min, max, nonExistValue, decimal, types, dataTypeSelected);
--- End diff --

ok
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/971#discussion_r119909916

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -841,17 +837,14 @@ public static WriterCompressModel getValueCompressionModel(
      */
     for (int i = 0; i < dataTypeSelected.length; i++) {  // always 1
       ValueEncoderMeta valueEncoderMeta = encodeMetaList.get(i);
-      maxValue[i] = valueEncoderMeta.getMaxValue();
-      minValue[i] = valueEncoderMeta.getMinValue();
-      uniqueValue[i] = valueEncoderMeta.getUniqueValue();
+      max[i] = valueEncoderMeta.getMaxValue();
+      min[i] = valueEncoderMeta.getMinValue();
+      nonExistValue[i] = valueEncoderMeta.getUniqueValue();
       decimal[i] = valueEncoderMeta.getDecimal();
       type[i] = valueEncoderMeta.getType();
       dataTypeSelected[i] = valueEncoderMeta.getDataTypeSelected();
     }
-    MeasureMetaDataModel measureMetadataModel =
-        new MeasureMetaDataModel(minValue, maxValue, decimal, dataTypeSelected.length, uniqueValue,
-            type, dataTypeSelected);
-    return ValueCompressionUtil.getWriterCompressModel(measureMetadataModel);
+    return MeasurePageStatistics.build(min, max, nonExistValue, decimal, type, dataTypeSelected);
--- End diff --

ok
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/971

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2158/
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/971

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/23/
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/971

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-2.1/52/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/971

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2160/
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/971

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/25/

Failed Tests: 5 (carbondata-pr-spark-1.6 / org.apache.carbondata:carbondata-spark-common-test)
- org.apache.carbondata.integration.spark.testsuite.dataload.TestLoadDataWithSinglePass.test data loading use one pass
- org.apache.carbondata.integration.spark.testsuite.dataload.TestLoadDataWithSinglePass.test data loading use one pass when offer column dictionary file
- org.apache.carbondata.integration.spark.testsuite.dataload.TestLoadDataWithSinglePass.test data loading use one pass when do incremental load
- org.apache.carbondata.spark.testsuite.allqueries.InsertIntoCarbonTableTestCase.insert from hive-sum expression
- org.apache.carbondata.spark.testsuite.allqueries.InsertIntoCarbonTableTestCase.insert into carbon table from carbon table union query