[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...


[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1198

    [CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

    # Modifications
    This feature mainly focuses on avoiding disk hot spots during a single massive data load. The changes are in two parts:
   
    1. In the sort process, randomly choose a YARN local folder each time a sort temp file is written;
   
    2. In the write process, randomly choose a YARN local folder each time a carbondata file is written.
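A minimal Java sketch of the per-file random directory choice described above. `TempDirChooser`, its methods and the example paths are hypothetical illustrations, not CarbonData's actual classes; the candidate list stands in for the YARN local dirs of one node.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not CarbonData's real code): picks one of several
// local directories at random for every new file that is written.
class TempDirChooser {
  private final List<String> localDirs;

  TempDirChooser(List<String> localDirs) {
    if (localDirs == null || localDirs.isEmpty()) {
      throw new IllegalArgumentException("at least one local dir is required");
    }
    this.localDirs = localDirs;
  }

  // Uniform random choice on every call, so successive files tend to land on different disks.
  String chooseDir() {
    return localDirs.get(ThreadLocalRandom.current().nextInt(localDirs.size()));
  }

  // Full path for a new file (e.g. a sort temp file) under a randomly chosen dir.
  String newFilePath(String fileName) {
    return chooseDir() + File.separator + fileName;
  }

  public static void main(String[] args) {
    // Example paths are assumptions standing in for the YARN local dirs of one node.
    TempDirChooser chooser = new TempDirChooser(
        Arrays.asList("/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local"));
    for (int i = 0; i < 3; i++) {
      System.out.println(chooser.newFilePath("sortTemp_" + i + ".tmp"));
    }
  }
}
```

Choosing independently for each file, rather than once per load, is what spreads the writes of one large load across several disks; a round-robin counter would be a deterministic alternative to the random choice the PR describes.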
   
    # Usage
   
    To enable this feature, set both `carbon.use.multi.temp.dir=true` and `carbon.use.local.dir=true`.
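The two settings can be read as a gate on the multi-directory behavior. This Java sketch assumes the multi-dir path is taken only when both properties are `true`; the `"false"` defaults, the single fallback directory and the names `MultiTempDirConfig` and `candidateDirs` are hypothetical, not taken from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical gating logic: the multi-dir path is used only when both
// properties named in the PR are "true". The "false" defaults and the
// single-directory fallback are assumptions for illustration.
class MultiTempDirConfig {
  static final String USE_LOCAL_DIR = "carbon.use.local.dir";
  static final String USE_MULTI_TEMP_DIR = "carbon.use.multi.temp.dir";

  static List<String> candidateDirs(Properties props, List<String> yarnLocalDirs,
      String fallbackDir) {
    boolean useLocalDir = Boolean.parseBoolean(props.getProperty(USE_LOCAL_DIR, "false"));
    boolean useMultiTempDir =
        Boolean.parseBoolean(props.getProperty(USE_MULTI_TEMP_DIR, "false"));
    if (useLocalDir && useMultiTempDir && !yarnLocalDirs.isEmpty()) {
      // Every YARN local dir becomes a candidate for random selection.
      return yarnLocalDirs;
    }
    // Otherwise all files go to a single directory.
    return Collections.singletonList(fallbackDir);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(USE_LOCAL_DIR, "true");
    props.setProperty(USE_MULTI_TEMP_DIR, "true");
    List<String> dirs = Arrays.asList("/data1/yarn/local", "/data2/yarn/local");
    System.out.println(candidateDirs(props, dirs, "/tmp/carbon"));
  }
}
```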
   
    # Performance
    In my case, this feature improved loading performance from 35M/s/node to 70+M/s/node, more than doubling it.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata new_feature_mtd4l

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1198
   
----
commit 46da65a1a0579c62a7f4196ae622f83dd5197e3a
Author: xuchuanyin <[hidden email]>
Date:   2017-07-25T11:17:53Z

    Support multiple temp dirs for writing files while loading data
   
    randomly choose a dir to write sort temp files
   
    randomly choose a dir to write carbondata files
   
    Fix errors in spelling
   
    optimize default value for using multiple temp dir
   
    update document for multiple temp dirs feature
   
    update property name
   
    (cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 6e35dec70196a12aaac24a69c795d3597f946386
Author: xuchuanyin <[hidden email]>
Date:   2017-07-25T11:20:32Z

    Add tests for multiple temp dirs during data loading
   
    Fix bugs in tests
   
    remove header in test data
   
    remove useless comment
   
    remove added useless testdata
   
    update data source for tests
   
    (cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 3e633070c3f793867c03ba350048994ced0e5527
Author: xuchuanyin <[hidden email]>
Date:   2017-07-25T12:28:17Z

    resolve review comments
   
    + update documents
    + update parameter name
    + optimize code to avoid duplicate lines

commit 9f746178600d7c16267bd0276b8a492f69871802
Author: xuchuanyin <[hidden email]>
Date:   2017-07-25T12:42:35Z

    fix checkstyle error

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Can one of the admins verify this patch?



[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Can one of the admins verify this patch?



[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    I created this PR and closed #1195



[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/604/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3199/




[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/1198



[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user xuchuanyin reopened a pull request:

    https://github.com/apache/carbondata/pull/1198




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Can one of the admins verify this patch?



[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    There is no useful information in the compilation message.
   
    retest this please



[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3200/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/605/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3201/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/606/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/608/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3203/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3205/




[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1198
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/610/


