GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1117 [CARBONDATA-757] Big decimal optimization Currently Decimal is converted to bytes and using LV (length + value) format to write to store. And while getting back read the bytes in LV format and convert back the bigdecimal. We can do following operations to improve storage and processing. 1. if decimal precision is less than 9 then we can fit in int (4 bytes) 2. if decimal precision is less than 18 then we can fit in long (8 bytes) 3. if decimal precision is more than 18 then we can fit in fixed length bytes(the length bytes can vary depends on precision but it is always fixed length) So in this approach we no need store bigdecimal in LV format, we can store in fixed format.It reduces the memory. Carbondata format changes -> Added fixedLength in datachunk to know about the column length of big decimal. This attribute can be used in case of char(fixedlength) or varchar(fixedlength) datatypes as well. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata big-decimal-optim Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1117 ---- commit 061fc25f682645033f1769499de2655e405807f4 Author: ravipesala <[hidden email]> Date: 2017-06-28T10:33:03Z Optimizing decimal datatype commit 6f050e74d857773e6e4468e459320670a64c4188 Author: ravipesala <[hidden email]> Date: 2017-06-29T11:26:14Z Optimized big decimal to use less space ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2795/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/216/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1117 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/carbondata-pr-spark-1.6/731/<h2>Failed Tests: <span class='status-failure'>4</span></h2><h3><a name='carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test' /><a href='https://builds.apache.org/job/carbondata-pr-spark-1.6/731/org.apache.carbondata$carbondata-spark-common-test/testReport'>carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test</a>: <span class='status-failure'>4</span></h3><ul><li><a href='https://builds.apache.org/job/carbondata-pr-spark-1.6/731/org.apache.carbondata$carbondata-spark-common-test/testReport/org.apache.carbondata.spark.testsuite.partition/TestAllDataTypeForPartitionTable/allTypeTable_hash_smallInt/'><strong>org.apache.carbondata.spark.testsuite.partition.TestAllDataTypeForPartitionTable.allTypeTable_hash_smallInt</strong></a></li><li><a href='https://builds.apache.org/job/carbondata-pr-spark-1.6/731/org.apache.carbondata$carbondata-spark-common-test/testReport/org.apache.carbondat a.spark.testsuite.partition/TestAllDataTypeForPartitionTable/allTypeTable_hash_varchar/'><strong>org.apache.carbondata.spark.testsuite.partition.TestAllDataTypeForPartitionTable.allTypeTable_hash_varchar</strong></a></li><li><a href='https://builds.apache.org/job/carbondata-pr-spark-1.6/731/org.apache.carbondata$carbondata-spark-common-test/testReport/org.apache.carbondata.spark.testsuite.partition/TestAllDataTypeForPartitionTable/allTypeTable_list_date/'><strong>org.apache.carbondata.spark.testsuite.partition.TestAllDataTypeForPartitionTable.allTypeTable_list_date</strong></a></li><li><a href='https://builds.apache.org/job/carbondata-pr-spark-1.6/731/org.apache.carbondata$carbondata-spark-common-test/testReport/org.apache.carbondata.spark.testsuite.partition/TestAllDataTypeForPartitionTable/allTypeTable_range_string/'><strong>org.apache.carbondata.spark.testsuite.partition.TestAllDataTypeForPartitionTable.allTypeTable_range_string</strong></a></li></ul> --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/239/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2819/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2820/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/240/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r125271988 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java --- @@ -99,6 +99,14 @@ public void putBytes(int rowId, byte[] bytes, int offset, int length) { throw new UnsupportedOperationException("invalid data type: " + dataType); } + @Override public void putDecimal(int rowId, BigDecimal decimal) { --- End diff -- move `@Override` to previous line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r125548213 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -83,7 +84,59 @@ public void setByteArrayPage(byte[][] byteArray) { /** * Create a new column page based on the LV (Length Value) encoded bytes */ - static ColumnPage newDecimalColumnPage(byte[] lvEncodedBytes) throws MemoryException { + static ColumnPage newDecimalColumnPage(byte[] lvEncodedBytes, int scale, int precision) + throws MemoryException { + DecimalConverterFactory.DecimalConverter decimalConverter = + DecimalConverterFactory.INSTANCE.getDecimalConverter(precision, scale); + int size = decimalConverter.getSize(); + if (size < 0) { + return getLegacyColumnPage(lvEncodedBytes, scale, precision, DataType.DECIMAL); + } else { + // Here the size is always fixed. + return getDecimalColumnPage(lvEncodedBytes, scale, precision, size); + } + } + + /** + * Create a new column page based on the LV (Length Value) encoded bytes + */ + static ColumnPage newVarLengthColumnPage(byte[] lvEncodedBytes, int scale, int precision) + throws MemoryException { + return getLegacyColumnPage(lvEncodedBytes, scale, precision, DataType.BYTE_ARRAY); --- End diff -- I think it is better to rename `LegacyColumnPage` to 'LVBytesColumnPage`. It is hard to know what is legacy --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r125548330 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java --- @@ -73,54 +82,58 @@ public int getPageSize() { return pageSize; } - private static ColumnPage createVarLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createVarLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeVarLengthColumnPage(dataType, pageSize); + return new UnsafeVarLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeVarLengthColumnPage(dataType, pageSize); + return new SafeVarLengthColumnPage(dataType, pageSize, scale, precision); } } - private static ColumnPage createFixLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createFixLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeFixLengthColumnPage(dataType, pageSize); + return new UnsafeFixLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeFixLengthColumnPage(dataType, pageSize); + return new SafeFixLengthColumnPage(dataType, pageSize, scale, pageSize); } } - private static ColumnPage createPage(DataType dataType, int pageSize) { + private static ColumnPage createPage(DataType dataType, int pageSize, int scale, int precision) { if (dataType.equals(BYTE_ARRAY) | dataType.equals(DECIMAL)) { - return createVarLengthPage(dataType, pageSize); + return createVarLengthPage(dataType, pageSize, scale, precision); } else { - return createFixLengthPage(dataType, pageSize); + return createFixLengthPage(dataType, pageSize, scale, precision); } } - public static ColumnPage newVarLengthPath(DataType dataType, int pageSize) { + public static ColumnPage newVarLengthPath(DataType dataType, int pageSize, int scale, --- End diff -- can you create another function of `newVarLengthPage` which accept `DataType dataType, int pageSize` only and pass -1, -1 to this function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r125548362 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java --- @@ -73,54 +82,58 @@ public int getPageSize() { return pageSize; } - private static ColumnPage createVarLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createVarLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeVarLengthColumnPage(dataType, pageSize); + return new UnsafeVarLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeVarLengthColumnPage(dataType, pageSize); + return new SafeVarLengthColumnPage(dataType, pageSize, scale, precision); } } - private static ColumnPage createFixLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createFixLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeFixLengthColumnPage(dataType, pageSize); + return new UnsafeFixLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeFixLengthColumnPage(dataType, pageSize); + return new SafeFixLengthColumnPage(dataType, pageSize, scale, pageSize); } } - private static ColumnPage createPage(DataType dataType, int pageSize) { + private static ColumnPage createPage(DataType dataType, int pageSize, int scale, int precision) { if (dataType.equals(BYTE_ARRAY) | dataType.equals(DECIMAL)) { - return createVarLengthPage(dataType, pageSize); + return createVarLengthPage(dataType, pageSize, scale, precision); } else { - return createFixLengthPage(dataType, pageSize); + return createFixLengthPage(dataType, pageSize, scale, precision); } } - public static ColumnPage newVarLengthPath(DataType dataType, int pageSize) { + public static ColumnPage newVarLengthPath(DataType dataType, int pageSize, int scale, --- End diff -- And please correct the function name, it should be `newVarLengthPage` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r126456038 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java --- @@ -73,54 +82,58 @@ public int getPageSize() { return pageSize; } - private static ColumnPage createVarLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createVarLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeVarLengthColumnPage(dataType, pageSize); + return new UnsafeVarLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeVarLengthColumnPage(dataType, pageSize); + return new SafeVarLengthColumnPage(dataType, pageSize, scale, precision); } } - private static ColumnPage createFixLengthPage(DataType dataType, int pageSize) { + private static ColumnPage createFixLengthPage(DataType dataType, int pageSize, int scale, + int precision) { if (unsafe) { try { - return new UnsafeFixLengthColumnPage(dataType, pageSize); + return new UnsafeFixLengthColumnPage(dataType, pageSize, scale, precision); } catch (MemoryException e) { throw new RuntimeException(e); } } else { - return new SafeFixLengthColumnPage(dataType, pageSize); + return new SafeFixLengthColumnPage(dataType, pageSize, scale, pageSize); } } - private static ColumnPage createPage(DataType dataType, int pageSize) { + private static ColumnPage createPage(DataType dataType, int pageSize, int scale, int precision) { if (dataType.equals(BYTE_ARRAY) | dataType.equals(DECIMAL)) { - return createVarLengthPage(dataType, pageSize); + return createVarLengthPage(dataType, pageSize, scale, precision); } else { - return createFixLengthPage(dataType, pageSize); + return createFixLengthPage(dataType, pageSize, scale, precision); } } - public static ColumnPage newVarLengthPath(DataType dataType, int pageSize) { + public static ColumnPage newVarLengthPath(DataType dataType, int pageSize, int scale, --- End diff -- ok --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r126456370 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -83,7 +84,59 @@ public void setByteArrayPage(byte[][] byteArray) { /** * Create a new column page based on the LV (Length Value) encoded bytes */ - static ColumnPage newDecimalColumnPage(byte[] lvEncodedBytes) throws MemoryException { + static ColumnPage newDecimalColumnPage(byte[] lvEncodedBytes, int scale, int precision) + throws MemoryException { + DecimalConverterFactory.DecimalConverter decimalConverter = + DecimalConverterFactory.INSTANCE.getDecimalConverter(precision, scale); + int size = decimalConverter.getSize(); + if (size < 0) { + return getLegacyColumnPage(lvEncodedBytes, scale, precision, DataType.DECIMAL); + } else { + // Here the size is always fixed. + return getDecimalColumnPage(lvEncodedBytes, scale, precision, size); + } + } + + /** + * Create a new column page based on the LV (Length Value) encoded bytes + */ + static ColumnPage newVarLengthColumnPage(byte[] lvEncodedBytes, int scale, int precision) + throws MemoryException { + return getLegacyColumnPage(lvEncodedBytes, scale, precision, DataType.BYTE_ARRAY); --- End diff -- ok --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1117#discussion_r126456411 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java --- @@ -99,6 +99,14 @@ public void putBytes(int rowId, byte[] bytes, int offset, int length) { throw new UnsupportedOperationException("invalid data type: " + dataType); } + @Override public void putDecimal(int rowId, BigDecimal decimal) { --- End diff -- ok --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3005/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/418/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/420/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1117 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3008/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1117 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |