GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/1297 [WIP] Added a value based compression for decimal data type when decimal is stored as Int or Long

1. When the decimal precision is <= 9, decimal values are stored in 4 bytes, but unlike other primitive data types they are not compressed further based on the min and max values. Now, based on the min and max values, decimal data falling in the Integer range will be further compressed to byte or short.
2. When the decimal precision is <= 18, decimal values are stored in 8 bytes, but unlike other primitive data types they are not compressed further based on the min and max values. Now, based on the min and max values, decimal data falling in the Long range will be further compressed to byte, short or int.

Advantage: This reduces the storage space and thereby the IO time spent decompressing the data.

TODO: Needs to be implemented for the unsafe type.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata decimal_compression

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1297.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1297

----
commit 04d47802fa139f9a359ca6727f4f79a26402b43a
Author: manishgupta88 <[hidden email]>
Date: 2017-08-24T07:13:58Z

    Added a value based compression for decimal data type.
----
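For context, a minimal self-contained sketch of the min/max-based downcasting described above (illustrative only: the class and enum below are not CarbonData code, although fitLongMinMax mirrors the existing helper of that name in DefaultEncodingStrategy):

import java.math.BigDecimal;

// Illustration of the value-based compression idea: a decimal with precision <= 9 (or <= 18)
// is stored as its unscaled int/long value, and the page's min/max decide whether
// byte, short or int is enough to hold every value.
public class DecimalDowncastSketch {

  enum TargetType { BYTE, SHORT, INT, LONG }

  // Pick the smallest integral type that can hold every unscaled value in the page.
  static TargetType fitLongMinMax(long max, long min) {
    if (max <= Byte.MAX_VALUE && min >= Byte.MIN_VALUE) {
      return TargetType.BYTE;
    } else if (max <= Short.MAX_VALUE && min >= Short.MIN_VALUE) {
      return TargetType.SHORT;
    } else if (max <= Integer.MAX_VALUE && min >= Integer.MIN_VALUE) {
      return TargetType.INT;
    } else {
      return TargetType.LONG;
    }
  }

  public static void main(String[] args) {
    int scale = 2;
    // Unscaled values of 12.34 and -0.05 at scale 2 are 1234 and -5, so the whole page
    // fits into SHORT instead of the 4 or 8 bytes reserved by the declared precision.
    long[] unscaled = {
        new BigDecimal("12.34").setScale(scale).unscaledValue().longValueExact(),
        new BigDecimal("-0.05").setScale(scale).unscaledValue().longValueExact()
    };
    long min = Math.min(unscaled[0], unscaled[1]);
    long max = Math.max(unscaled[0], unscaled[1]);
    System.out.println("Store page as: " + fitLongMinMax(max, min)); // prints SHORT
  }
}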
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/304/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/308/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/328/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/330/
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/1297 ok to test
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/340/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3431/
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/1297 ok to test
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1297#discussion_r136248243

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
     return doubleData[rowId];
   }

-  @Override
-  public BigDecimal getDecimal(int rowId) {
-    throw new UnsupportedOperationException("invalid data type: " + dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

I think it is better to add a DecimalColumnPage instead of using FixLengthColumnPage.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3432/
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1297#discussion_r136260340

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
     return doubleData[rowId];
   }

-  @Override
-  public BigDecimal getDecimal(int rowId) {
-    throw new UnsupportedOperationException("invalid data type: " + dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

Please correct me if I am wrong. In my opinion there is no advantage in adding a DecimalColumnPage, as we would not be doing anything extra by introducing it. Basically we store values either as fixed or variable length, and decimal values also fall into these two categories. So according to the current design I believe it may not be required to introduce one more type of page unless we have separate pages for all the other data types as well. Please share your suggestions.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/348/
Github user ashwini-krishnakumar commented on the issue:
https://github.com/apache/carbondata/pull/1297 @ravipesala please review
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1297#discussion_r137270300

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingStrategy.java ---
@@ -104,18 +107,31 @@ private ColumnPageEncoder createEncoderForMeasure(ColumnPage columnPage) {
       case SHORT:
       case INT:
       case LONG:
-        return selectCodecByAlgorithmForIntegral(stats).createEncoder(null);
+        return selectCodecByAlgorithmForIntegral(stats,
+            DecimalConverterFactory.DecimalConverterType.DECIMAL_LONG).createEncoder(null);
+      case DECIMAL:
+        return createEncoderForDecimalDataTypeMeasure(columnPage);
--- End diff --

Rename it like the others, `selectCodecByAlgorithmForDecimal`.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1297#discussion_r137270799

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingStrategy.java ---
@@ -130,7 +146,8 @@ private static DataType fitLongMinMax(long max, long min) {
     }
   }

-  private static DataType fitMinMax(DataType dataType, Object max, Object min) {
+  private static DataType fitMinMax(DataType dataType, Object max, Object min,
+      DecimalConverterFactory.DecimalConverterType decimalConverterType) {
--- End diff --

I think it is not good to pass `decimalConverterType` through many functions; it makes the code complex. It is better to think of a way to encapsulate it.
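One possible direction, shown as a minimal sketch under assumed names (DecimalPageContext, DecimalStorageType and selectStorage are hypothetical, not CarbonData classes): bundle the page statistics and the decimal storage form in one context object so the codec-selection helpers take a single argument instead of an extra decimalConverterType parameter everywhere.

// Hypothetical encapsulation sketch: the helpers receive one context object
// instead of (dataType, max, min, decimalConverterType).
public class DecimalEncodingContextSketch {

  enum DecimalStorageType { DECIMAL_INT, DECIMAL_LONG }

  // Carries everything the codec-selection helpers need about a decimal page.
  static final class DecimalPageContext {
    final long unscaledMin;
    final long unscaledMax;
    final DecimalStorageType storageType;

    DecimalPageContext(long unscaledMin, long unscaledMax, DecimalStorageType storageType) {
      this.unscaledMin = unscaledMin;
      this.unscaledMax = unscaledMax;
      this.storageType = storageType;
    }

    // True if every unscaled value in the page lies inside the given bounds.
    boolean fitsIn(long typeMin, long typeMax) {
      return unscaledMin >= typeMin && unscaledMax <= typeMax;
    }
  }

  static String selectStorage(DecimalPageContext ctx) {
    if (ctx.fitsIn(Byte.MIN_VALUE, Byte.MAX_VALUE)) return "BYTE";
    if (ctx.fitsIn(Short.MIN_VALUE, Short.MAX_VALUE)) return "SHORT";
    if (ctx.storageType == DecimalStorageType.DECIMAL_LONG
        && ctx.fitsIn(Integer.MIN_VALUE, Integer.MAX_VALUE)) return "INT";
    return ctx.storageType == DecimalStorageType.DECIMAL_INT ? "INT" : "LONG";
  }

  public static void main(String[] args) {
    DecimalPageContext ctx = new DecimalPageContext(-5, 1234, DecimalStorageType.DECIMAL_LONG);
    System.out.println(selectStorage(ctx)); // prints SHORT
  }
}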
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1297#discussion_r137273402

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
     return doubleData[rowId];
  }

-  @Override
-  public BigDecimal getDecimal(int rowId) {
-    throw new UnsupportedOperationException("invalid data type: " + dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

I think adding the converter inside the column page makes it a bit complex. There are two ways to make the code cleaner:
1. Store a BigDecimal array in the ColumnPage and do the conversion to int or long in the encoding part, by implementing a ColumnPageValueConverter.
2. Create a DecimalColumnPage class and add the conversion logic in DecimalColumnPage, since this logic is extra for decimal only. If you do this, we can consider renaming `FixLengthColumnPage` to `PrimitiveColumnPage` and `VarLengthColumnPage` to `StringColumnPage`, with `DecimalColumnPage` as the third one. The responsibility of each class will then be clearer.
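For the second option, a minimal sketch of what a decimal-specific page could look like, with the BigDecimal-to-unscaled-value conversion deferred to encoding time (class and method names are illustrative only, not the actual ColumnPage hierarchy):

import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: the page keeps BigDecimal values and only produces unscaled long values
// when the encoding step asks for them.
public class DecimalColumnPageSketch {

  private final BigDecimal[] values;
  private final int scale;

  DecimalColumnPageSketch(int pageSize, int scale) {
    this.values = new BigDecimal[pageSize];
    this.scale = scale;
  }

  void putDecimal(int rowId, BigDecimal value) {
    // Normalize to the column's scale so every row shares one unscaled representation.
    values[rowId] = value.setScale(scale);
  }

  BigDecimal getDecimal(int rowId) {
    return values[rowId];
  }

  // Conversion happens only here, at encoding time.
  long[] toUnscaledLongs() {
    long[] unscaled = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      BigInteger u = values[i].unscaledValue();
      unscaled[i] = u.longValueExact(); // precision <= 18 guarantees this fits in a long
    }
    return unscaled;
  }

  public static void main(String[] args) {
    DecimalColumnPageSketch page = new DecimalColumnPageSketch(2, 2);
    page.putDecimal(0, new BigDecimal("12.34"));
    page.putDecimal(1, new BigDecimal("-0.05"));
    for (long v : page.toUnscaledLongs()) {
      System.out.println(v); // 1234 and -5
    }
  }
}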
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1297 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/828/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/16/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1297 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/113/