GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/1195

[CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

# Modifications

This feature mainly focuses on avoiding disk hot-spots during a single massive data load. Changes are made in two parts:

1. Randomly choose a YARN local folder each time a sort temp file is written in the sort process;
2. Randomly choose a YARN local folder each time a carbondata file is written in the write process.

# Usage

To enable this feature, the user should set `carbon.use.multi.temp.dir=true` and `carbon.use.local.dir=true`.

# Performance

In my case, this feature improves the loading performance from 35M/s/node to 70+M/s/node.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata feature_mtd4l

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1195.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1195

----
commit 0d9910896a6c6a696a53f2c905ef23d1870c9b90
Author: xuchuanyin <[hidden email]>
Date: 2017-07-25T11:17:53Z

    Support multiple temp dirs for writing files while loading data

    randomly choose a dir to write sort temp files
    randomly choose a dir to write carbondata files
    Fix errors in spelling
    optimize default value for using multiple temp dir
    update document for multiple temp dirs feature
    update property name

    (cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 8000041266cb188e8876ae07d61f271993d33459
Author: xuchuanyin <[hidden email]>
Date: 2017-07-25T11:20:32Z

    Add tests for multiple temp dirs during data loading

    Fix bugs in tests
    remove header in test data
    remove useless comment
    remove added useless testdata
    update data source for tests

    (cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 92637c6035358b3cc354966d2dc29e1003f387dd
Author: xuchuanyin <[hidden email]>
Date: 2017-07-25T12:28:17Z

    resolve review comments

    + update documents
    + update parameter name
    + optimize code to avoid duplicate lines

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
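The idea described under "Modifications" in the pull request above (pick one of several local folders at random each time a file is written) might look roughly like the sketch below. This is not the code from the PR: the class `TempDirChooser`, its methods, and the example directory and file names are all hypothetical, and the real change reads its folders from the YARN local dirs rather than from a hard-coded array.

```java
import java.io.File;
import java.util.Random;

/**
 * Hypothetical helper (not CarbonData's actual class): spreads temp files
 * across several local directories by choosing one at random per file.
 */
final class TempDirChooser {
    private final String[] localDirs;
    private final Random random;

    TempDirChooser(String[] localDirs, Random random) {
        if (localDirs == null || localDirs.length == 0) {
            throw new IllegalArgumentException("at least one local dir is required");
        }
        // Defensive copy so later changes to the caller's array have no effect.
        this.localDirs = localDirs.clone();
        this.random = random;
    }

    /** Picks one of the configured directories uniformly at random. */
    String nextDir() {
        return localDirs[random.nextInt(localDirs.length)];
    }

    /** Builds the path of a new file inside a randomly chosen directory. */
    String nextFilePath(String fileName) {
        return new File(nextDir(), fileName).getPath();
    }
}
```

If each entry in the array lives on a different physical disk, calling `nextFilePath(...)` once per sort temp file or carbondata file spreads the write load over those disks instead of concentrating it on one, which is the hot-spot the PR description is trying to avoid.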
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1195 Can one of the admins verify this patch?
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3188/
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1195 Can one of the admins verify this patch?
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1195 This PR is the same as #1177. #1177 contains too many commits, so I opened a new PR and squashed the commits.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/593/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/594/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3189/
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1195 @sraghunandan @chenliang613 @bill1208 Thanks for your reviews. The changes for the reviews are in commit: https://github.com/xuchuanyin/carbondata/commit/92637c6035358b3cc354966d2dc29e1003f387dd
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1195 @xuchuanyin everything looks ok, please do rebase.
Github user xuchuanyin closed the pull request at:
https://github.com/apache/carbondata/pull/1195
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1195 @chenliang613 Sorry for adding irrelevant commits to this PR through an incorrect rebase. :disappointed: I've created a new PR, #1198, for this.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/603/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3198/
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1195 @chenliang613 This PR picked up commits from other people because of an incorrect rebase, so I closed it and created a new one: #1198