GitHub user ravipesala opened a pull request:
https://github.com/apache/incubator-carbondata/pull/644

[CARBONDATA-757] Big decimal optimization

Currently a decimal is converted to bytes and written to the store in LV (length + value) format. On read, the bytes are read back in LV format and converted to a BigDecimal again.

We can do the following to improve storage and processing:

1. If the decimal precision is less than 9, the value fits in an int (4 bytes).
2. If the decimal precision is less than 18, the value fits in a long (8 bytes).
3. If the decimal precision is more than 18, the value fits in fixed-length bytes (the number of bytes depends on the precision, but it is always fixed for a given column).

With this approach we do not need to store BigDecimal in LV format; we can store it in a fixed-length format, which reduces memory usage.

Carbondata format changes -> Added fixedLength in datachunk to record the column length of a big decimal. This attribute can also be used for char(fixedlength) or varchar(fixedlength) datatypes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata bigdecimal-optimize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/644.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #644

----
commit 241c032f3e54facb59ba0b946f3c0c0c67dab59c
Author: ravipesala <[hidden email]>
Date: 2017-03-09T12:42:47Z

    BigDecimal optimization

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
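For readers following the thread, the fixed-length idea described in the pull request can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: the class name `FixedLengthDecimalCodec`, its method names, the big-endian two's-complement layout, and the way the byte count for large precisions is derived are all hypothetical. Only the int/long/fixed-bytes split and the precision thresholds come from the description.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: stores the unscaled value of a decimal in a width
// that depends only on the column precision, so no length prefix is needed.
final class FixedLengthDecimalCodec {

    // Thresholds as worded in the PR description (constant names are assumptions).
    static final int INT_PRECISION_LIMIT = 9;
    static final int LONG_PRECISION_LIMIT = 18;

    /** Bytes used for every value of a column with the given precision. */
    static int fixedLength(int precision) {
        if (precision < INT_PRECISION_LIMIT) return 4;
        if (precision < LONG_PRECISION_LIMIT) return 8;
        // Assumption: magnitude bits of 10^precision - 1, plus a sign bit, rounded up to bytes.
        int bits = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE).bitLength() + 1;
        return (bits + 7) / 8;
    }

    /** Encodes the value as big-endian two's complement of the column's fixed width. */
    static byte[] encode(BigDecimal value, int precision, int scale) {
        // setScale without a rounding mode throws if digits would be lost.
        BigDecimal scaled = value.setScale(scale);
        if (scaled.precision() > precision) {
            throw new IllegalArgumentException("value exceeds precision " + precision);
        }
        BigInteger unscaled = scaled.unscaledValue();
        int length = fixedLength(precision);
        if (length == 4) return ByteBuffer.allocate(4).putInt(unscaled.intValueExact()).array();
        if (length == 8) return ByteBuffer.allocate(8).putLong(unscaled.longValueExact()).array();
        byte[] raw = unscaled.toByteArray(); // minimal two's-complement form
        byte[] out = new byte[length];
        // Sign-extend negatives with 0xFF; non-negatives keep the zero fill.
        if (unscaled.signum() < 0) Arrays.fill(out, (byte) 0xFF);
        System.arraycopy(raw, 0, out, length - raw.length, raw.length);
        return out;
    }

    /** Decodes bytes written by encode; the scale comes from the column metadata. */
    static BigDecimal decode(byte[] bytes, int scale) {
        if (bytes.length == 4) return BigDecimal.valueOf(ByteBuffer.wrap(bytes).getInt(), scale);
        if (bytes.length == 8) return BigDecimal.valueOf(ByteBuffer.wrap(bytes).getLong(), scale);
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // decimal(7,2), as used for columns c7 and c9 in the test DDL of this thread.
        byte[] stored = encode(new BigDecimal("-12345.67"), 7, 2);
        System.out.println(stored.length + " bytes -> " + decode(stored, 2));
    }
}
```

Because the width is a function of the column precision alone, a reader could take it from a per-column attribute such as the `fixedLength` field mentioned above instead of reading a length prefix for each value.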
GitHub user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/644

Build will not compile as there are carbon-format changes.
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/644

Build failed with Spark 1.6.2. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1082/
GitHub user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/644

Test results with 100 million rows of data

**DDL**

CREATE TABLE perftesta (c1 string,c2 string,c3 string,c4 string,c5 string,c6 bigint,c7 decimal(7,2),c8 int,c9 decimal(7,2),c10 decimal(15,2)) STORED BY 'carbondata'

**Queries**

Q1 -> SELECT count(c1),count(c2),count(c3),count(c4),count(c5),count(c6),count(c7),count(c8),count(c9),count(c10) FROM perftesta99;

Q2 -> SELECT sum(c7), sum(c8), sum(9), sum(c10) FROM perftesta99 WHERE c2="P2_75" and c7<5;

Q3 -> SELECT c2, c5, count(distinct c1), sum(c7) FROM perftesta99 WHERE c4="P4_4" and c5="P5_7" GROUP BY c2, c5;

**Master Code**

Load time -> 576 seconds
Data size after load -> 1800MB
Query(first_reading, second_reading)
Q1(25.27, 21.794)
Q2(27.296, 28.21)
Q3(7.383, 5.103)

**This PR Code**

Load time -> 431 seconds
Data size after load -> 1720MB
Query(first_reading, second_reading)
Q1(18.507, 14.427)
Q2(24.102, 23.322)
Q3(6.87, 5.079)