Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] incubator-carbondata pull request #594: [WIP]fix memory issue for dataloadin...

Classic

List

24 messages Options

Options

12

[GitHub] incubator-carbondata pull request #594: [WIP]fix memory issue for dataloadin...

GitHub user QiangCai opened a pull request:

https://github.com/apache/incubator-carbondata/pull/594

[WIP]fix memory issue for dataloading

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/incubator-carbondata nokettleflow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #594

----
commit 5819de59e24a0f9f1e4e44f4ac0af03d20067e55
Author: QiangCai <[hidden email]>
Date: 2017-02-08T16:22:03Z

fix memory issue for dataloading

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [WIP]fix memory issue for dataloading

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/877/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [WIP]fix memory issue for dataloading

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/879/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [WIP]fix memory issue for dataloading

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/889/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [CARBONDATA-701]Fix memory leak issue for n...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/892/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r100695668

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/row/CarbonRowBatch.java ---
@@ -17,27 +17,40 @@

package org.apache.carbondata.processing.newflow.row;

-import java.util.ArrayList;
import java.util.Iterator;
-import java.util.List;

/**
* Batch of rows.
*/
-public class CarbonRowBatch {
+public class CarbonRowBatch implements Iterator<CarbonRow> {

- private List<CarbonRow> rowBatch = new ArrayList<>();
+ private CarbonRow[] rowBatch;
--- End diff --

Here, Why changes ArrayList to Array?

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721258

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -641,7 +641,7 @@
/**
* CARBON_PREFETCH_BUFFERSIZE
*/
- public static final int CARBON_PREFETCH_BUFFERSIZE = 20000;
+ public static final int CARBON_PREFETCH_BUFFERSIZE = 1000;
--- End diff --

Please take this from carbon properties file, and keep default as 1000

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721427

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java ---
@@ -86,11 +88,10 @@ public void initialize(SortParameters sortParameters) {
sortParameters.getNoDictionaryDimnesionColumn(), sortParameters.isUseKettle());
}

- @Override
- public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
+ public void prepare(Iterator<CarbonRowBatch>[] iterators)
--- End diff --

Better change name to initialize

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721771

--- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java ---
@@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia
writeDataTofile(next());
}
} else {
+ int i = 0;
while (hasNext()) {
+ i++;
writeDataTofileWithOutKettle(next());
+ if (i % 10000 == 0) {
--- End diff --

it is unnecessary calculation we are adding in it. Already the buffer size is configurable so once buffer size reaches buffered out stream can flush it down.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r100721876

--- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
@@ -375,6 +376,9 @@ private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountL
stream.write((byte) 0);
}
}
+ if (i % 10000 == 9999) {
--- End diff --

it is unnecessary calculation we are adding in it. Already the buffer size is configurable so once buffer size reaches buffered out stream can flush it down.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [CARBONDATA-701]Fix memory leak issue for n...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/900/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432584

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java ---
@@ -86,11 +88,10 @@ public void initialize(SortParameters sortParameters) {
sortParameters.getNoDictionaryDimnesionColumn(), sortParameters.isUseKettle());
}

- @Override
- public Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
+ public void prepare(Iterator<CarbonRowBatch>[] iterators)
--- End diff --

Yes, but the method initialize is exists.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432953

--- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java ---
@@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia
writeDataTofile(next());
}
} else {
+ int i = 0;
while (hasNext()) {
+ i++;
writeDataTofileWithOutKettle(next());
+ if (i % 10000 == 0) {
--- End diff --

ok, I will test a better value of defualt buffer size.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436963

--- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
@@ -375,6 +376,9 @@ private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountL
stream.write((byte) 0);
}
}
+ if (i % 10000 == 9999) {
--- End diff --

fixed

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436969

--- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java ---
@@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia
writeDataTofile(next());
}
} else {
+ int i = 0;
while (hasNext()) {
+ i++;
writeDataTofileWithOutKettle(next());
+ if (i % 10000 == 0) {
--- End diff --

fixed

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436986

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -641,7 +641,7 @@
/**
* CARBON_PREFETCH_BUFFERSIZE
*/
- public static final int CARBON_PREFETCH_BUFFERSIZE = 20000;
+ public static final int CARBON_PREFETCH_BUFFERSIZE = 1000;
--- End diff --

fixed

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #594: [CARBONDATA-701]Fix memory leak issue for n...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/594

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/908/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101443741

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java ---
@@ -39,11 +39,13 @@
* Sorts the data of all iterators, this iterators can be
* read parallely depends on implementation.
*
- * @param iterators array of iterators to read data.
* @return
* @throws CarbonDataLoadingException
*/
- Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
+ Iterator<CarbonRowBatch>[] sort()
+ throws CarbonDataLoadingException;
+
+ void prepare(Iterator<CarbonRowBatch>[] iterators)
--- End diff --

Already previous steps file resources are closed where and when the use of it is over. I don't think this method is required.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101460125

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java ---
@@ -39,11 +39,13 @@
* Sorts the data of all iterators, this iterators can be
* read parallely depends on implementation.
*
- * @param iterators array of iterators to read data.
* @return
* @throws CarbonDataLoadingException
*/
- Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
+ Iterator<CarbonRowBatch>[] sort()
+ throws CarbonDataLoadingException;
+
+ void prepare(Iterator<CarbonRowBatch>[] iterators)
--- End diff --

Better to invoke child.close() before final merger

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/594#discussion_r101462875

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java ---
@@ -39,11 +39,13 @@
* Sorts the data of all iterators, this iterators can be
* read parallely depends on implementation.
*
- * @param iterators array of iterators to read data.
* @return
* @throws CarbonDataLoadingException
*/
- Iterator<CarbonRowBatch>[] sort(Iterator<CarbonRowBatch>[] iterators)
+ Iterator<CarbonRowBatch>[] sort()
+ throws CarbonDataLoadingException;
+
+ void prepare(Iterator<CarbonRowBatch>[] iterators)
--- End diff --

fixed

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

12