[GitHub] carbondata pull request #1008: [CARBONDATA-1145] Fix single-pass issue for m...

GitHub user QiangCai opened a pull request:

https://github.com/apache/carbondata/pull/1008

[CARBONDATA-1145] Fix single-pass issue for multi-task loading

**Issue**
Single-pass loading of a partition table loses incremental dictionary entries, so queries on the dictionary column return null.

**Analysis**
During single-pass loading, the executors send many initial messages to the server to initialize the dictionary generator. Each initial message updates the generator, which causes some of the incremental dictionary to be lost.

**Solution**
Reuse the dictionary generator; there is no need to update it if it already exists.
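
The difference between replacing and reusing the generator can be illustrated with a minimal sketch. The class and method names here (`ReuseGeneratorSketch`, `ColumnGenerator`, `initAlwaysReplace`, `initReuse`) are hypothetical stand-ins, not CarbonData classes, and the sketch ignores concurrency entirely; it only shows why a second initial message discards entries in one case and keeps them in the other.

```java
import java.util.HashMap;
import java.util.Map;

class ReuseGeneratorSketch {
  // Hypothetical stand-in for a per-column incremental dictionary generator.
  static class ColumnGenerator {
    private final Map<String, Integer> dictionary = new HashMap<>();

    // Returns the existing key for a value, or assigns the next one.
    int generateKey(String value) {
      Integer existing = dictionary.get(value);
      if (existing != null) {
        return existing;
      }
      int key = dictionary.size() + 1;
      dictionary.put(value, key);
      return key;
    }

    int size() {
      return dictionary.size();
    }
  }

  private final Map<String, ColumnGenerator> columnMap = new HashMap<>();

  // Every initial message replaces the generator, dropping its entries.
  void initAlwaysReplace(String columnId) {
    columnMap.put(columnId, new ColumnGenerator());
  }

  // The generator is created only if none exists for the column.
  void initReuse(String columnId) {
    if (columnMap.get(columnId) == null) {
      columnMap.put(columnId, new ColumnGenerator());
    }
  }

  ColumnGenerator get(String columnId) {
    return columnMap.get(columnId);
  }

  public static void main(String[] args) {
    ReuseGeneratorSketch replacing = new ReuseGeneratorSketch();
    replacing.initAlwaysReplace("c1");
    replacing.get("c1").generateKey("a");
    replacing.initAlwaysReplace("c1"); // second task's initial message
    System.out.println("replace: " + replacing.get("c1").size());

    ReuseGeneratorSketch reusing = new ReuseGeneratorSketch();
    reusing.initReuse("c1");
    reusing.get("c1").generateKey("a");
    reusing.initReuse("c1"); // second task's initial message
    System.out.println("reuse: " + reusing.get("c1").size());
  }
}
```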

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/carbondata fixsinglepassforpatition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1008

----
commit 8b3f897878207fef56bc7445f925ebcc5876c037
Author: QiangCai <[hidden email]>
Date: 2017-06-08T09:47:07Z

fix single-pass issue for partition table

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2291/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/164/

Failed Tests: 1

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-core: 1
org.apache.carbondata.core.dictionary.generator.TableDictionaryGeneratorTest.updateGenerator

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/358/

Failed Tests: 0

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2473/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2483/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/368/

Failed Tests: 1

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 1
org.apache.carbondata.spark.testsuite.bigdecimal.TestDimensionWithDecimalDataType.test unsafe with bigdecimal

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1008

retest this please

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2484/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/369/

Failed Tests: 1

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 1
org.apache.carbondata.spark.testsuite.bigdecimal.TestDimensionWithDecimalDataType.test unsafe with bigdecimal

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1008

retest this please

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2485/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/370/

Failed Tests: 1

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 1
org.apache.carbondata.spark.testsuite.bigdecimal.TestDimensionWithDecimalDataType.test unsafe with bigdecimal

[GitHub] carbondata pull request #1008: [CARBONDATA-1145] Fix single-pass issue for m...

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1008#discussion_r121982426

--- Diff: core/src/main/java/org/apache/carbondata/core/dictionary/generator/IncrementalColumnDictionaryGenerator.java ---
@@ -169,10 +172,22 @@ public IncrementalColumnDictionaryGenerator(CarbonDimension dimension, int maxVa
}
// write value to dictionary file
if (reverseIncrementalCache.size() > 0) {
- for (int index = 2; index < reverseIncrementalCache.size() + 2; index++) {
- String value = reverseIncrementalCache.get(index);
+ String[] values = null;
+ synchronized (lock) {
+ // collect incremental dictionary
+ values = new String[currentDictionarySize - maxValue];
--- End diff --

Why is it collected into a temporary list? Why not write directly inside the lock?
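
The thread does not record an answer to this question. One common motivation for the pattern, offered here only as an assumption, is to keep slow I/O out of the critical section: the pending values are copied while the lock is held, and the write happens after it is released. A minimal sketch with hypothetical names (`SnapshotThenWriteSketch`, `flush`, and a `Consumer<String>` standing in for the dictionary file writer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class SnapshotThenWriteSketch {
  private final Object lock = new Object();
  private final List<String> incrementalValues = new ArrayList<>();
  private int writtenCount = 0; // how many values have already been handed to a writer

  void add(String value) {
    synchronized (lock) {
      incrementalValues.add(value);
    }
  }

  // Copies the not-yet-written values under the lock, then writes them
  // after releasing it so that add() is not blocked during the write.
  int flush(Consumer<String> writer) {
    String[] snapshot;
    synchronized (lock) {
      List<String> pending =
          incrementalValues.subList(writtenCount, incrementalValues.size());
      snapshot = pending.toArray(new String[0]);
      writtenCount = incrementalValues.size();
    }
    for (String value : snapshot) {
      writer.accept(value);
    }
    return snapshot.length;
  }

  public static void main(String[] args) {
    SnapshotThenWriteSketch sketch = new SnapshotThenWriteSketch();
    sketch.add("a");
    sketch.add("b");
    sketch.flush(System.out::println);
  }
}
```

The trade-off is that two concurrent `flush` calls could interleave their writes, so this sketch assumes a single flushing thread; writing inside the lock, as the comment suggests, avoids that at the cost of holding the lock during I/O.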

[GitHub] carbondata pull request #1008: [CARBONDATA-1145] Fix single-pass issue for m...

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1008#discussion_r121982890

--- Diff: core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java ---
@@ -115,7 +115,10 @@ public Integer size(DictionaryMessage key) {
}

public void updateGenerator(CarbonDimension dimension) {
- columnMap.put(dimension.getColumnId(),
- new IncrementalColumnDictionaryGenerator(dimension, 1));
+ // reuse dictionary generator
+ if (null == columnMap.get(dimension.getColumnId())) {
+ columnMap.put(dimension.getColumnId(),
--- End diff --

A lock must be taken on columnMap, with double-checking; otherwise parallel initialization is still possible.
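
A minimal sketch of the double-checked pattern the comment asks for, using hypothetical names (`DoubleCheckedInitSketch`, `ColumnGenerator`) and a string column id in place of `CarbonDimension`. It assumes the map is a `ConcurrentHashMap` so that the first, unlocked read is safe; with a plain `HashMap` the unlocked read would itself be a data race.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DoubleCheckedInitSketch {
  // Hypothetical stand-in for the per-column generator.
  static class ColumnGenerator {}

  private final Map<String, ColumnGenerator> columnMap = new ConcurrentHashMap<>();

  ColumnGenerator updateGenerator(String columnId) {
    // First check without the lock: the common case after initialization.
    ColumnGenerator generator = columnMap.get(columnId);
    if (generator == null) {
      synchronized (columnMap) {
        // Second check under the lock: another thread may have won the race.
        generator = columnMap.get(columnId);
        if (generator == null) {
          generator = new ColumnGenerator();
          columnMap.put(columnId, generator);
        }
      }
    }
    return generator;
  }

  public static void main(String[] args) {
    DoubleCheckedInitSketch sketch = new DoubleCheckedInitSketch();
    ColumnGenerator first = sketch.updateGenerator("c1");
    ColumnGenerator second = sketch.updateGenerator("c1");
    System.out.println(first == second);
  }
}
```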

[GitHub] carbondata pull request #1008: [CARBONDATA-1145] Fix single-pass issue for m...

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1008#discussion_r121983478

--- Diff: core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java ---
@@ -115,7 +115,10 @@ public Integer size(DictionaryMessage key) {
}

public void updateGenerator(CarbonDimension dimension) {
- columnMap.put(dimension.getColumnId(),
- new IncrementalColumnDictionaryGenerator(dimension, 1));
+ // reuse dictionary generator
+ if (null == columnMap.get(dimension.getColumnId())) {
+ columnMap.put(dimension.getColumnId(),
--- End diff --

In ServerDictionaryGenerator.java, the check in
initializeGeneratorForTable() -> if (tableMap.get(key.getTableUniqueName()) == null) also requires a lock on tableMap, with double-checking.
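
As an alternative to an explicit lock, and not something proposed in this thread, the same check-then-create step can be made atomic with `ConcurrentHashMap.computeIfAbsent`. The sketch below assumes `tableMap` could be a `ConcurrentHashMap`; `TableInitSketch` and `TableGenerator` are hypothetical names, and the table is identified by a plain string rather than a `DictionaryMessage`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TableInitSketch {
  // Hypothetical stand-in for the per-table generator.
  static class TableGenerator {}

  private final Map<String, TableGenerator> tableMap = new ConcurrentHashMap<>();

  // computeIfAbsent runs the factory at most once per key, so concurrent
  // initial messages for the same table all receive the same generator.
  TableGenerator initializeGeneratorForTable(String tableUniqueName) {
    return tableMap.computeIfAbsent(tableUniqueName, name -> new TableGenerator());
  }

  public static void main(String[] args) {
    TableInitSketch sketch = new TableInitSketch();
    TableGenerator first = sketch.initializeGeneratorForTable("db_t1");
    TableGenerator second = sketch.initializeGeneratorForTable("db_t1");
    System.out.println(first == second);
  }
}
```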

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2488/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/373/

Failed Tests: 2

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 2
org.apache.carbondata.spark.testsuite.dataretention.DataRetentionConcurrencyTestCase.DataRetention_Concurrency_load_date
org.apache.carbondata.spark.testsuite.bigdecimal.TestDimensionWithDecimalDataType.test unsafe with bigdecimal

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1008

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2489/

[GitHub] carbondata issue #1008: [CARBONDATA-1145] Fix single-pass issue for multi-ta...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1008

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/374/

Failed Tests: 2

carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 2
org.apache.carbondata.spark.testsuite.dataload.TestGlobalSortDataLoad.LOAD with DELETE
org.apache.carbondata.spark.testsuite.dataload.TestGlobalSortDataLoad.Test with different date types
