[GitHub] [carbondata] ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector


GitBox
ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682
 
 
    ### Why is this PR needed?
   For every value of a double/float column, we call
   `PrimitivePageStatsCollector.getDecimalCount(double value)`.
   The problem is that it creates a new BigDecimal object and a new plain string object every time,
   which leads to huge memory usage during insert.
   
    ### What changes were proposed in this PR?
   Create only the BigDecimal object and use its scale.
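   A minimal sketch of the change described above, for illustration only: `oldDecimalCount` mirrors the removed lines of the diff quoted later in this thread, while `newDecimalCount` reads `BigDecimal.scale()` directly. The class and method names are hypothetical, and clamping a negative scale to 0 is an assumption made here so that values such as `1.0E10` (scale -9) give the same answer as the plain-string count; the merged code may handle this differently.

```java
import java.math.BigDecimal;

// Hypothetical comparison of the two decimal-count strategies discussed in this PR.
final class DecimalCountSketch {

  // Old approach: create a BigDecimal, render it as a plain string,
  // then count the characters after the '.'.
  static int oldDecimalCount(double value) {
    String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
    int integerPlaces = strValue.indexOf('.');
    return integerPlaces == -1 ? 0 : strValue.length() - integerPlaces - 1;
  }

  // New approach: create only the BigDecimal and read its scale.
  // Assumption: a negative scale (1.0E10 has scale -9) is clamped to 0
  // so that the result matches the plain-string count above.
  static int newDecimalCount(double value) {
    return Math.max(BigDecimal.valueOf(value).scale(), 0);
  }

  public static void main(String[] args) {
    double[] samples = {0.001, 100.0, 1.0E-5, 1.0E10, -2.75};
    for (double v : samples) {
      System.out.println(v + " -> old=" + oldDecimalCount(v) + ", new=" + newDecimalCount(v));
    }
  }
}
```

   Note that `BigDecimal.valueOf(double)` is built from `Double.toString(value)`, so one intermediate string is still created internally; the saving is the extra `toPlainString()` string. For NaN and infinite inputs, `BigDecimal.valueOf(double)` throws `NumberFormatException` in both versions; the original method wraps the call in a `try` block whose handler is not shown in the diff.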
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
   Before the change:
   ![Screenshot from 2020-03-26 16-45-12](https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png)
   
   
   After the change:
   ![Screenshot from 2020-03-26 16-30-27](https://user-images.githubusercontent.com/5889404/77640863-16128c00-6f81-11ea-8af6-1b60cc7a4ab8.png)
   
   There is about a 5% improvement in insert performance for the TPC-H lineitem table with 10GB of data, with no change in store size.
   
     
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

GitBox
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604426644
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/859/
   

GitBox
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604438852
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2567/
   

GitBox
ajantha-bhat commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604464981
 
 
   @jackylk, @ravipesala: please check.

GitBox
brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604935581
 
 
   retest this please
   

GitBox
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604992269
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/864/
   

GitBox
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604995090
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2572/
   

GitBox
kunal642 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-610756333
 
 
   LGTM

GitBox
QiangCai commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406015043
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
 ##########
 @@ -233,20 +233,18 @@ public void update(long value) {
 
   /**
    * Return number of digit after decimal point
-   * TODO: it operation is costly, optimize for performance
    */
   private int getDecimalCount(double value) {
     int decimalPlaces = 0;
     try {
-      String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
-      int integerPlaces = strValue.indexOf('.');
-      if (-1 != integerPlaces) {
-        decimalPlaces = strValue.length() - integerPlaces - 1;
+      BigDecimal decimalValue = BigDecimal.valueOf(value);
 
 Review comment:
   Better to write our own code to implement it; using BigDecimal is not required.
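   One way to follow this suggestion, sketched here under assumptions, is to parse `Double.toString()` directly. `Double.toString` uses plain notation for magnitudes in [10^-3, 10^7) and computerized scientific notation (`d.dddE<n>`) otherwise, so the exponent must be subtracted from the number of fraction digits; this is the same rule `new BigDecimal(String)` uses to derive its scale. `StringDecimalCount` is a hypothetical name, the sketch avoids the `BigDecimal` object but still allocates a `String`, and reporting NaN and infinities as 0 decimal places is an assumption.

```java
// Hypothetical helper: counts decimal places by parsing Double.toString()
// directly, without creating a BigDecimal.
final class StringDecimalCount {

  static int decimalCount(double value) {
    // Assumption: NaN and infinities are reported as 0 decimal places.
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      return 0;
    }
    // For finite values Double.toString always contains a '.', e.g.
    // "0.001", "100.0", "1.0E-5", "1.0E10".
    String s = Double.toString(Math.abs(value));
    int ePos = s.indexOf('E');
    int mantissaEnd = ePos >= 0 ? ePos : s.length();
    int dot = s.indexOf('.');
    int fractionDigits = mantissaEnd - dot - 1;
    int exponent = ePos >= 0 ? Integer.parseInt(s.substring(ePos + 1)) : 0;
    // Same rule as BigDecimal's scale, clamped at 0 so that large
    // integral values such as 1.0E10 report 0 decimal places.
    return Math.max(fractionDigits - exponent, 0);
  }

  public static void main(String[] args) {
    for (double v : new double[] {0.001, 100.0, 1.0E-5, 1.0E10, -2.75}) {
      System.out.println(v + " -> " + decimalCount(v));
    }
  }
}
```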

GitBox
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406024143
 
 

 Review comment:
   Actually, a double will not always be in the form `xx.yyy`; it can also have an `exponent`.
   BigDecimal already converts the value to a string and handles that logic.
   
   Maybe in the next version we can reduce the cost further by removing BigDecimal and copying the relevant logic from inside BigDecimal.
   

GitBox
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406045145
 
 

 Review comment:
   Also, we need something that avoids converting to a string.
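   For illustration only, here is one hypothetical string-free direction (not what this PR merged): search for the smallest `k` such that rounding `value` to `k` decimal places converts back to exactly the same `double`. It relies on powers of ten up to 10^22 being exact doubles, so it caps at 22 places (very small magnitudes return the cap). Because it returns the minimum number of places that round-trip, it can report fewer than `BigDecimal.valueOf(value).scale()`: for `100.0` it returns 0, whereas that scale is 1. The product `value * 10^k` is itself rounded, so the result may occasionally exceed the shortest representation. The class name and the NaN/infinity handling are assumptions; treat this as a starting point for measurement, not a drop-in replacement.

```java
// Hypothetical, string-free decimal-place counter (illustration only).
final class StringFreeDecimalCount {

  // 10^k is exactly representable as a double for 0 <= k <= 22.
  private static final int MAX_PLACES = 22;

  static int decimalCount(double value) {
    // Assumption: NaN and infinities are reported as 0 decimal places.
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      return 0;
    }
    double powerOfTen = 1.0; // exact 10^k for the current k
    for (int k = 0; k <= MAX_PLACES; k++) {
      // Round value to k decimal places; dividing an integral double by an
      // exact power of ten is a single correctly rounded operation.
      double rounded = Math.rint(value * powerOfTen) / powerOfTen;
      if (rounded == value) {
        return k; // k places are enough to reproduce the same double
      }
      powerOfTen *= 10.0;
    }
    return MAX_PLACES; // cap for magnitudes that need more places
  }

  public static void main(String[] args) {
    for (double v : new double[] {0.5, 0.125, -0.75, 100.0, 1.0E10}) {
      System.out.println(v + " -> " + decimalCount(v));
    }
  }
}
```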

GitBox
QiangCai commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-611421173
 
 
   LGTM

GitBox
asfgit closed pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682
 
 
   
