ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682

### Why is this PR needed?
For every double/float column value, we call `PrimitivePageStatsCollector.getDecimalCount(double value)`. The problem is that this method creates a new `BigDecimal` object and a new plain-string object every time, which leads to huge memory usage during insert.

### What changes were proposed in this PR?
Create only the `BigDecimal` object and use its scale.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No

Before the change:
![Screenshot from 2020-03-26 16-45-12](https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png)

After the change:
![Screenshot from 2020-03-26 16-30-27](https://user-images.githubusercontent.com/5889404/77640863-16128c00-6f81-11ea-8af6-1b60cc7a4ab8.png)

Insert performance improves by about 5% for the TPC-H lineitem table with 10GB of data, without any change in store size.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [hidden email]
With regards,
Apache Git Services
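As a minimal illustration of the change described above, the standalone sketch below puts the removed `toPlainString()` counting next to a scale-based count. The class and method names (`DecimalCountComparison`, `viaPlainString`, `viaScale`) are hypothetical, and this is not the merged CarbonData code. The diff quoted in this thread stops after `BigDecimal.valueOf(value)`, so clamping a negative scale to 0 is an assumption; with that clamp, the two methods return the same count for finite inputs.

```java
import java.math.BigDecimal;

final class DecimalCountComparison {

    // Approach removed by the PR: build a BigDecimal, render it as a plain
    // string, then count the characters after the decimal point.
    static int viaPlainString(double value) {
        String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
        int integerPlaces = strValue.indexOf('.');
        return integerPlaces == -1 ? 0 : strValue.length() - integerPlaces - 1;
    }

    // Scale-based count in the spirit of the PR. A negative scale
    // (e.g. 1.0E10 -> unscaled value 10, scale -9) means there are no digits
    // after the point, so it is clamped to 0 (assumption, see lead-in).
    static int viaScale(double value) {
        return Math.max(0, BigDecimal.valueOf(value).scale());
    }

    public static void main(String[] args) {
        double[] samples = {0.25, -3.125, 100.0, 1.0E10, 1.5E-5};
        for (double v : samples) {
            System.out.println(v + " -> plain=" + viaPlainString(v) + ", scale=" + viaScale(v));
        }
    }
}
```

The scale-based version skips the intermediate plain string (for a value like `1.0E-300` that string would be over 300 characters long), but it still allocates one `BigDecimal` per value.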
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604426644
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/859/
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604438852
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2567/
ajantha-bhat commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604464981
@jackylk, @ravipesala: please check.
brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604935581
retest this please
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604992269
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/864/
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604995090
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2572/
kunal642 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-610756333
LGTM
QiangCai commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406015043

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
##########
@@ -233,20 +233,18 @@ public void update(long value) {
   /**
    * Return number of digit after decimal point
-   * TODO: it operation is costly, optimize for performance
    */
   private int getDecimalCount(double value) {
     int decimalPlaces = 0;
     try {
-      String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
-      int integerPlaces = strValue.indexOf('.');
-      if (-1 != integerPlaces) {
-        decimalPlaces = strValue.length() - integerPlaces - 1;
+      BigDecimal decimalValue = BigDecimal.valueOf(value);

Review comment:
Better to write code that implements it directly; BigDecimal is not required.
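One way to follow this suggestion while still handling exponents is sketched below. The class and method names are hypothetical, and this is not code from the PR. It relies on two documented facts: `BigDecimal.valueOf(double)` uses the string from `Double.toString`, and for a string with an exponent the `BigDecimal(String)` scale is the number of fraction digits minus the exponent. Returning 0 for NaN and infinity is an assumption. Note that it still creates a `String` per value, so it only removes the `BigDecimal` allocation.

```java
final class DecimalCountWithoutBigDecimal {

    // Hypothetical helper: intended to return the same value as
    // Math.max(0, BigDecimal.valueOf(value).scale()) without creating a BigDecimal.
    static int decimalCount(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return 0; // assumption: non-finite values carry no decimal places
        }
        String s = Double.toString(value);        // e.g. "3.125", "1.0E10", "1.5E-5"
        int expIndex = s.indexOf('E');
        int mantissaEnd = (expIndex == -1) ? s.length() : expIndex;
        int dot = s.indexOf('.');                 // finite doubles always render with a '.'
        int fractionDigits = (dot == -1 || dot > mantissaEnd) ? 0 : mantissaEnd - dot - 1;
        int exponent = (expIndex == -1) ? 0 : Integer.parseInt(s.substring(expIndex + 1));
        // A positive exponent moves fraction digits to the left of the point;
        // a negative exponent adds leading fractional zeros.
        return Math.max(0, fractionDigits - exponent);
    }

    public static void main(String[] args) {
        double[] samples = {0.25, -3.125, 100.0, 1.0E10, 1.5E-5};
        for (double v : samples) {
            System.out.println(v + " -> " + decimalCount(v));
        }
    }
}
```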
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406024143

Review comment (on the same hunk of PrimitivePageStatsCollector.java):
Actually, a double will not always look like `xx.yyy`; it can also have an `exponent`. BigDecimal already converts the value to a string and handles that logic. Maybe in the next version we can reduce the cost further by removing BigDecimal and copying the relevant logic from inside BigDecimal.
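The exponent point can be checked directly. In the sketch below (class and method names are hypothetical), counting characters after the `.` in `Double.toString` works for a plain rendering such as `3.125`, but goes wrong for scientific notation, while the `BigDecimal` scale folds the exponent in.

```java
import java.math.BigDecimal;

final class ExponentPitfall {

    // Naive count that assumes Double.toString always looks like "xx.yyy".
    static int naiveCount(double value) {
        String s = Double.toString(value);
        int dot = s.indexOf('.');
        return dot == -1 ? 0 : s.length() - dot - 1;
    }

    // BigDecimal folds the exponent into its scale; a negative scale means
    // there are no digits after the decimal point.
    static int scaleCount(double value) {
        return Math.max(0, BigDecimal.valueOf(value).scale());
    }

    public static void main(String[] args) {
        for (double v : new double[] {3.125, 1.5E-5, 1.0E10}) {
            System.out.println(Double.toString(v) + ": naive=" + naiveCount(v)
                    + ", scale=" + scaleCount(v));
        }
    }
}
```

For `3.125` both counts are 3. For `1.5E-5` the naive count is 4 (it counts the characters `5E-5`), while the scale-based count is 6, matching the plain form `0.000015`; for `1.0E10` the naive count is 4 versus 0.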
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406045145

Review comment (on the same hunk of PrimitivePageStatsCollector.java):
Also, we need something that does not convert to a string.
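A string-free direction is sketched below as a heuristic only. It is not from this PR, and the names are hypothetical. It looks for the smallest `k` such that rounding `|value| * 10^k` to an integer and dividing back by `10^k` returns the same double. Because `10^k` is exact for `k <= 22` and the division is correctly rounded, a match means some decimal with `k` fraction digits rounds to the value. However, rounding in the multiplication may make it miss and report a larger `k`. Values that need more than `maxDigits` digits return -1, in which case a caller would fall back to the `BigDecimal` path. It also returns 0 for integral values such as `100.0`, whereas `BigDecimal.valueOf(100.0)` has scale 1 because `Double.toString` renders it as `100.0`. Any such change would need to be compared against the existing implementation before use.

```java
final class DecimalCountWithoutString {

    // 10^0 .. 10^22: every entry is exactly representable as a double.
    private static final double[] POW10 = new double[23];

    static {
        POW10[0] = 1.0;
        for (int i = 1; i < POW10.length; i++) {
            POW10[i] = POW10[i - 1] * 10.0;
        }
    }

    // Hypothetical heuristic: smallest k (k <= maxDigits and k <= 22) for which
    // rint(|value| * 10^k) / 10^k gives back |value| exactly; -1 if none is found.
    static int decimalCount(double value, int maxDigits) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return 0; // assumption: non-finite values carry no decimal places
        }
        double abs = Math.abs(value);
        for (int k = 0; k <= maxDigits && k < POW10.length; k++) {
            // rint() yields an exact integer-valued double and the division is
            // correctly rounded, so a match means a k-digit decimal rounds to abs.
            double candidate = Math.rint(abs * POW10[k]) / POW10[k];
            if (candidate == abs) {
                return k;
            }
        }
        return -1; // caller should fall back to the BigDecimal-based count
    }

    public static void main(String[] args) {
        double[] samples = {0.25, 1.1, -3.125, 100.0, 1.5E-5, 1.0E-300};
        for (double v : samples) {
            System.out.println(v + " -> " + decimalCount(v, 17));
        }
    }
}
```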
QiangCai commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-611421173
LGTM
asfgit closed pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682