
[GitHub] incubator-carbondata pull request #459: [CARBONDATA-558] Fix load performace...


GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/459

[CARBONDATA-558] Fix load performace when use_kettle=false

Why raise this PR?
When I load a data file whose measure column contains many empty strings with use_kettle=false, load performance drops sharply.
I checked the executor logs and found many warnings like the one below:
16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-6 Cant not convert : to Numeric type value. Value considered as null.
How to solve it?
When measureValue = "", we should set it to null directly; there is no need to attempt a data type conversion.
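
The idea can be sketched as follows. This is a minimal illustration only, not the actual MeasureFieldConverterImpl code: the class name MeasureNullSketch, the convert method, and the failedParses counter are hypothetical names introduced here, and BigDecimal stands in for whatever numeric type the measure really uses. The point is that an empty value returns null before any parse is attempted, so no exception is thrown and no warning is logged for it.

```java
import java.math.BigDecimal;

// Hypothetical sketch; names are illustrative and not taken from CarbonData.
final class MeasureNullSketch {
    // Counts how often the slow path (a failed parse) was taken.
    static int failedParses = 0;

    static BigDecimal convert(String measureValue) {
        // Fast path: an empty (or missing) measure is treated as null
        // directly, without attempting any numeric conversion.
        if (measureValue == null || measureValue.isEmpty()) {
            return null;
        }
        try {
            return new BigDecimal(measureValue);
        } catch (NumberFormatException e) {
            // Slow path: only values that are non-empty and non-numeric
            // pay for the exception (and, in a real loader, the warning).
            failedParses++;
            return null;
        }
    }
}
```

With this shape, convert("") returns null without touching the exception path, while a non-numeric value such as "abc" still falls back to null through the catch block.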

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata msr_null

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/459.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #459

----
commit eae356ba8e918f60c78c584ad31856c0f10403db
Author: foryou2030 <[hidden email]>
Date: 2016-12-23T08:46:21Z

fix load performace

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #459: [CARBONDATA-558] Fix load performace when u...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/459

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/301/


[GitHub] incubator-carbondata issue #459: [CARBONDATA-558] Fix load performace issue ...


Github user lion-x commented on the issue:

https://github.com/apache/incubator-carbondata/pull/459

LGTM


[GitHub] incubator-carbondata issue #459: [CARBONDATA-558] Fix load performace issue ...


Github user eason-lyx commented on the issue:

https://github.com/apache/incubator-carbondata/pull/459

@foryou2030 Please add a test case for this scenario.
Thanks.


[GitHub] incubator-carbondata issue #459: [CARBONDATA-558] Fix load performace issue ...


Github user manishgupta88 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/459

@foryou2030 As a suggestion, whenever you change the parsing behavior of data in the load flow without kettle, please validate the same behavior in the kettle load flow, treating it as the baseline. That gives a clear picture of any defects.
This practice also makes clear what still needs to be merged into the new load flow without kettle to keep its behavior the same as the load with kettle.


[GitHub] incubator-carbondata issue #459: [CARBONDATA-558] Fix load performace issue ...


Github user foryou2030 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/459

@manishgupta88 Yes, I compared the code of the new flow with the kettle flow and found the difference. See the method populateOutputRow in class "CarbonCSVBasedSeqGenStep" for more details.


[GitHub] incubator-carbondata pull request #459: [CARBONDATA-558] Fix load performace...


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/459
