GitHub user QiangCai opened a pull request:
https://github.com/apache/incubator-carbondata/pull/594 [WIP]fix memory issue for dataloading You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata nokettleflow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/594.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #594 ---- commit 5819de59e24a0f9f1e4e44f4ac0af03d20067e55 Author: QiangCai <[hidden email]> Date: 2017-02-08T16:22:03Z fix memory issue for dataloading ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/877/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/879/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/889/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/892/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r100695668 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/row/CarbonRowBatch.java --- @@ -17,27 +17,40 @@ package org.apache.carbondata.processing.newflow.row; -import java.util.ArrayList; import java.util.Iterator; -import java.util.List; /** * Batch of rows. */ -public class CarbonRowBatch { +public class CarbonRowBatch implements Iterator<CarbonRow> { - private List<CarbonRow> rowBatch = new ArrayList<>(); + private CarbonRow[] rowBatch; --- End diff -- Here, Why changes ArrayList to Array? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721258 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -641,7 +641,7 @@ /** * CARBON_PREFETCH_BUFFERSIZE */ - public static final int CARBON_PREFETCH_BUFFERSIZE = 20000; + public static final int CARBON_PREFETCH_BUFFERSIZE = 1000; --- End diff -- Please take this from carbon properties file, and keep default as 1000 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721427 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java --- @@ -86,11 +88,10 @@ public void initialize(SortParameters sortParameters) { sortParameters.getNoDictionaryDimnesionColumn(), sortParameters.isUseKettle()); } - @Override - public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators) + public void prepare(Iterator<CarbonRowBatch>[] iterators) --- End diff -- Better change name to initialize --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721771 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia writeDataTofile(next()); } } else { + int i = 0; while (hasNext()) { + i++; writeDataTofileWithOutKettle(next()); + if (i % 10000 == 0) { --- End diff -- it is unnecessary calculation we are adding in it. Already the buffer size is configurable so once buffer size reaches buffered out stream can flush it down. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721876 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java --- @@ -375,6 +376,9 @@ private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountL stream.write((byte) 0); } } + if (i % 10000 == 9999) { --- End diff -- it is unnecessary calculation we are adding in it. Already the buffer size is configurable so once buffer size reaches buffered out stream can flush it down. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/900/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432584 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java --- @@ -86,11 +88,10 @@ public void initialize(SortParameters sortParameters) { sortParameters.getNoDictionaryDimnesionColumn(), sortParameters.isUseKettle()); } - @Override - public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators) + public void prepare(Iterator<CarbonRowBatch>[] iterators) --- End diff -- Yes, but the method initialize is exists. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432953 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia writeDataTofile(next()); } } else { + int i = 0; while (hasNext()) { + i++; writeDataTofileWithOutKettle(next()); + if (i % 10000 == 0) { --- End diff -- ok, I will test a better value of defualt buffer size. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436963 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java --- @@ -375,6 +376,9 @@ private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountL stream.write((byte) 0); } } + if (i % 10000 == 9999) { --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436969 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia writeDataTofile(next()); } } else { + int i = 0; while (hasNext()) { + i++; writeDataTofileWithOutKettle(next()); + if (i % 10000 == 0) { --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436986 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -641,7 +641,7 @@ /** * CARBON_PREFETCH_BUFFERSIZE */ - public static final int CARBON_PREFETCH_BUFFERSIZE = 20000; + public static final int CARBON_PREFETCH_BUFFERSIZE = 1000; --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/594 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/908/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101443741 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java --- @@ -39,11 +39,13 @@ * Sorts the data of all iterators, this iterators can be * read parallely depends on implementation. * - * @param iterators array of iterators to read data. * @return * @throws CarbonDataLoadingException */ - Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators) + Iterator<CarbonRowBatch>[] sort() + throws CarbonDataLoadingException; + + void prepare(Iterator<CarbonRowBatch>[] iterators) --- End diff -- Already previous steps file resources are closed where and when the use of it is over. I don't think this method is required. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101460125 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java --- @@ -39,11 +39,13 @@ * Sorts the data of all iterators, this iterators can be * read parallely depends on implementation. * - * @param iterators array of iterators to read data. * @return * @throws CarbonDataLoadingException */ - Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators) + Iterator<CarbonRowBatch>[] sort() + throws CarbonDataLoadingException; + + void prepare(Iterator<CarbonRowBatch>[] iterators) --- End diff -- Better to invoke child.close() before final merger --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/594#discussion_r101462875 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java --- @@ -39,11 +39,13 @@ * Sorts the data of all iterators, this iterators can be * read parallely depends on implementation. * - * @param iterators array of iterators to read data. * @return * @throws CarbonDataLoadingException */ - Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators) + Iterator<CarbonRowBatch>[] sort() + throws CarbonDataLoadingException; + + void prepare(Iterator<CarbonRowBatch>[] iterators) --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |