GitHub user foryou2030 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/459

[CARBONDATA-558] Fix load performance when use_kettle=false

Why raise this PR?
When I load a data file whose measure column contains many empty strings with use_kettle=false, load performance declines sharply. I checked the executor logs and found many warnings like the one below:

16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-6 Cant not convert : to Numeric type value. Value considered as null.

How to solve it?
When measureValue = "", we should set it to null directly; there is no need to do data type conversion.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata msr_null

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/459.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #459

----
commit eae356ba8e918f60c78c584ad31856c0f10403db
Author: foryou2030 <[hidden email]>
Date: 2016-12-23T08:46:21Z

fix load performace

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
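The fix proposed in the pull request above can be illustrated with a minimal sketch. This is not the actual CarbonData code: the class name MeasureConversionSketch, the method convertMeasure, and the failedConversions counter are hypothetical, and BigDecimal merely stands in for whatever numeric type the real converter produces. The assumption is that the slowdown comes from attempting a conversion on every empty value, failing, and logging a warning each time; the sketch short-circuits that path.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not the real MeasureFieldConverterImpl.
final class MeasureConversionSketch {

    // Counts how often the slow path (failed parse, where a warning would be logged) is taken.
    static int failedConversions = 0;

    // Returns the parsed measure, or null when the value is empty or not numeric.
    static BigDecimal convertMeasure(String measureValue) {
        // Fast path: an empty (or missing) measure is treated as null directly,
        // so no conversion is attempted and no warning is produced.
        if (measureValue == null || measureValue.isEmpty()) {
            return null;
        }
        try {
            return new BigDecimal(measureValue);
        } catch (NumberFormatException e) {
            // Slow path: a real converter would log a warning here.
            failedConversions++;
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(convertMeasure(""));      // empty value, fast path
        System.out.println(convertMeasure("12.5"));  // valid numeric value
        System.out.println(convertMeasure("abc"));   // invalid value, slow path
        System.out.println("failed conversions: " + failedConversions);
    }
}
```

With the early return in place, a column full of empty strings never reaches the try/catch block, so neither the exception nor the per-row warning is incurred for those rows.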
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/459

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/301/
Github user lion-x commented on the issue:
https://github.com/apache/incubator-carbondata/pull/459

LGTM
Github user eason-lyx commented on the issue:
https://github.com/apache/incubator-carbondata/pull/459

@foryou2030 Please add a test case for this scenario. Thanks.
Github user manishgupta88 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/459

@foryou2030 As a suggestion, whenever you change the parsing behavior of data in the load flow without kettle, please validate the same behavior in the kettle load flow, treating that as the baseline. That gives a clear picture of any defects. This practice also makes clear what needs to be merged into the new load flow without kettle to keep its behavior the same as load with kettle.
Github user foryou2030 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/459

@manishgupta88 Yes, I compared the code of the new flow with the kettle flow and found the difference. See the method populateOutputRow in class "CarbonCSVBasedSeqGenStep" for more details.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/459