GitHub user ravipesala opened a pull request:
https://github.com/apache/incubator-carbondata/pull/369

[CARBONDATA-470][WIP] Add unsafe off-heap and on-heap sort in carbondata loading

In the current carbondata system, loading performance is not encouraging because the data must be sorted at the executor level during data loading. Carbondata collects a batch of data, sorts it, and dumps it to temporary files; it then merge-sorts those temporary files to finish sorting. This causes two major issues: disk IO and GC. Even though the data is dumped to files, carbondata still faces heavy GC pressure, because each batch is sorted in memory before it is dumped to the temporary files.

To solve these problems, we can introduce Unsafe Storage and Unsafe Sort.

Unsafe Storage: The user can configure a memory limit for the amount of data kept in memory. All the data is kept in a contiguous memory region, either off-heap or on-heap, using Unsafe. Once the configured limit is exceeded, the remaining data is spilled to disk.

Unsafe Sort: The data stored in memory using Unsafe can be sorted using Unsafe sort.
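The storage and sort idea described above can be illustrated with a minimal sketch. It is not the code from this PR: the class name `OffHeapSortSketch` and its methods are hypothetical, it uses `java.nio.ByteBuffer.allocateDirect` as a stand-in for `sun.misc.Unsafe`, it stores only fixed-width `long` keys, and it reports a full buffer to the caller instead of actually spilling to disk.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: fixed-width long keys kept in one contiguous off-heap block. */
public class OffHeapSortSketch {
    private final ByteBuffer block; // direct buffer: allocated outside the Java heap
    private int count;              // number of keys currently stored

    public OffHeapSortSketch(int memoryLimitBytes) {
        this.block = ByteBuffer.allocateDirect(memoryLimitBytes);
    }

    /** Returns false once the configured limit is reached, so the caller can spill to disk. */
    public boolean add(long key) {
        if ((long) (count + 1) * Long.BYTES > block.capacity()) {
            return false;
        }
        block.putLong(count * Long.BYTES, key);
        count++;
        return true;
    }

    public int size() {
        return count;
    }

    public long get(int i) {
        return block.getLong(i * Long.BYTES);
    }

    private void swap(int i, int j) {
        long tmp = get(i);
        block.putLong(i * Long.BYTES, get(j));
        block.putLong(j * Long.BYTES, tmp);
    }

    /** In-place heapsort over the buffer; no per-row objects are created. */
    public void sort() {
        for (int i = count / 2 - 1; i >= 0; i--) {
            siftDown(i, count);
        }
        for (int end = count - 1; end > 0; end--) {
            swap(0, end);      // move the current maximum to its final slot
            siftDown(0, end);  // restore the max-heap on the remaining prefix
        }
    }

    private void siftDown(int root, int end) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= end) {
                return;
            }
            if (child + 1 < end && get(child + 1) > get(child)) {
                child++;
            }
            if (get(root) >= get(child)) {
                return;
            }
            swap(root, child);
            root = child;
        }
    }

    public static void main(String[] args) {
        OffHeapSortSketch store = new OffHeapSortSketch(4 * Long.BYTES); // room for 4 keys
        long[] input = {42L, 7L, 19L, 3L, 88L};
        for (long key : input) {
            if (!store.add(key)) {
                System.out.println("limit reached, would spill key " + key);
            }
        }
        store.sort();
        for (int i = 0; i < store.size(); i++) {
            System.out.println(store.get(i));
        }
    }
}
```

Because the keys live in a single direct buffer and the sort swaps them in place, the garbage collector has no per-row objects to track, which is the GC benefit the description is aiming for. Real rows would be variable-length and would need an offset table alongside the data.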
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata unsafesortnew

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/369.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #369

----
commit d223681c799d373beb30748166d3f181ed86981a
Author: ravipesala <[hidden email]>
Date: 2016-11-27T12:09:36Z

    Optimize data loading

commit f21dc18a304efe171ebb64a2e7135534b4dd09fd
Author: ravipesala <[hidden email]>
Date: 2016-11-28T11:42:01Z

    Unsafe Sort

commit b0b93560776944f51e6aa6fe5d4a0ed326f21834
Author: ravipesala <[hidden email]>
Date: 2016-11-28T11:58:05Z

    disabled memory merge

commit a4ab3abc07396a42526bcfe8cd5e9b1714df56c9
Author: ravipesala <[hidden email]>
Date: 2016-11-28T12:00:58Z

    disabled memory merge

commit 95eee6288d938efdf60923e277dbae13b2645021
Author: ravipesala <[hidden email]>
Date: 2016-11-29T03:45:08Z

    refactored

commit 771d00d18fadf421f8dec9ad266185cae06af402
Author: ravipesala <[hidden email]>
Date: 2016-11-29T19:31:06Z

    Fixed merging issues.

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Has this PR considered allocating memory from TaskMemoryManager? Many Spark applications run on YARN; if you use off-heap memory, it is easy to trigger YARN's container-killing behavior.
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

@allwefantasy Thanks for your suggestion. Yes, we can use Spark's memory manager instead of our own. I will open up the interface and provide an implementation that uses TaskMemoryManager.
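One way to picture "opening the interface" is a small allocator abstraction with interchangeable on-heap and off-heap implementations. The sketch below is only an illustration under assumptions: `MemoryAllocator`, `ON_HEAP`, `OFF_HEAP`, and `choose` are hypothetical names, not CarbonData's or Spark's actual API, and `java.nio.ByteBuffer` again stands in for Unsafe-based blocks.

```java
import java.nio.ByteBuffer;

public class AllocatorSketch {
    /** Hypothetical interface; a Spark-backed implementation could be plugged in behind it. */
    public interface MemoryAllocator {
        ByteBuffer allocate(int bytes);
    }

    /** Allocates inside the Java heap, so the memory is managed by the garbage collector. */
    public static final MemoryAllocator ON_HEAP = ByteBuffer::allocate;

    /** Allocates a direct buffer outside the Java heap. */
    public static final MemoryAllocator OFF_HEAP = ByteBuffer::allocateDirect;

    /** Picks an allocator from a configuration flag (the flag itself is an assumption). */
    public static MemoryAllocator choose(boolean offHeap) {
        return offHeap ? OFF_HEAP : ON_HEAP;
    }

    public static void main(String[] args) {
        ByteBuffer block = choose(true).allocate(1024);
        System.out.println("direct buffer: " + block.isDirect());
    }
}
```

With the sort code written against such an interface, an implementation that requests memory through Spark's task-level memory manager could be added later without touching the sort itself.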
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build finished.
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

ok to test
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Failed, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/42/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Failed, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/51/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Failed, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/52/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Success, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/54/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Success, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/55/
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

@ravipesala @allwefantasy I have filed a JIRA ticket for the TaskMemoryManager integration; we can do it in another PR.
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

ok to test
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/66/
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Since this PR ports some code (memory management) from Apache Spark, please add a note in the file headers stating that the code is ported from Spark. In fact, Spark also uses some code from other projects; please refer to [this](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/TimSort.java)
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

@jackylk Added file headers
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/369

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/151/
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/369