Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...


GitHub user QiangCai opened a pull request:

https://github.com/apache/carbondata/pull/1002

[CARBONDATA-1136] Fix compaction bug for the partition table

After compacting a partition table, SELECT queries return no data.

**Analyze**
During compaction, the partition id of the table was lost.

**Solution**
Keep using the original partition id in CarbonMergerRDD.scala.
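A minimal sketch of the idea, assuming simplified stand-ins for the real classes: `TaskSplits` and `MergePartition` are hypothetical placeholders for `CarbonInputSplitTaskInfo` and `CarbonSparkPartition`, and `buildMergePartitions` is illustrative, not the actual code in CarbonMergerRDD.scala. It contrasts numbering merge partitions with a running counter (the old `partitionNo += 1` behaviour) against reusing the task id that carries the partition id.

```scala
// Hypothetical, simplified stand-ins for CarbonInputSplitTaskInfo and
// CarbonSparkPartition; the real classes carry much more state.
final case class TaskSplits(taskId: Int, splits: Seq[String])
final case class MergePartition(index: Int, partitionId: Int, splits: Seq[String])

// Builds one merge partition per task that actually has blocks.
// Normal table: a running counter (0, 1, 2, ...) is used as the id.
// Partition table: the task id already encodes the partition id,
// so it is reused instead of being renumbered.
def buildMergePartitions(tasks: Seq[TaskSplits], isPartitionTable: Boolean): Seq[MergePartition] =
  tasks
    .filter(_.splits.nonEmpty) // skip empty tasks, similar to the `blockletCount != 0` check
    .zipWithIndex
    .map { case (task, idx) =>
      val partitionId = if (isPartitionTable) task.taskId else idx
      MergePartition(idx, partitionId, task.splits)
    }
```

With task ids 7 and 3, a partition table keeps 7 and 3 as partition ids, whereas counter-based numbering relabels them as 0 and 1, which is one way the link to the original partitions can be lost.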

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/carbondata fixCompactionIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1002

----
commit e05c696900920ed5b98e608305d49c17d192fb5b
Author: QiangCai <[hidden email]>
Date: 2017-06-07T03:51:08Z

fix compact bug for partition table

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2247/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

In reply to this post by qiuchenjian-2

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1002

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/120/

[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1002#discussion_r120927111

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
@@ -405,11 +411,16 @@ class CarbonMergerRDD[K, V](
NodeInfo(splitsPerNode.getTaskId, splitsPerNode.getCarbonInputSplitList.size()))

if (blockletCount != 0) {
+ val taskInfo = splitInfo.asInstanceOf[CarbonInputSplitTaskInfo]
val multiBlockSplit = new CarbonMultiBlockSplit(absoluteTableIdentifier,
- splitInfo.asInstanceOf[CarbonInputSplitTaskInfo].getCarbonInputSplitList,
+ taskInfo.getCarbonInputSplitList,
Array(nodeName))
- result.add(new CarbonSparkPartition(id, partitionNo, multiBlockSplit))
- partitionNo += 1
+ if (isPartitionTable) {
--- End diff --

This handling will not be sufficient. When the number of partitions (e.g. 100) is not equal to the number of nodes (e.g. 5), getPartitions divides the total blocks among the available nodes, so each node gets more than one taskNo/partitionNo to handle.
The compute function in the executor merges all the given B-trees (segId + taskId) into one task, so multiple taskIds/partitions are merged into one. This breaks the partition mapping.
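The concern can be reproduced with a toy model. The sketch below assumes a round-robin spread of partition ids over nodes (the real `getPartitions` distributes blocks by data locality), and `assignToNodes` and `partitionsMixedPerNode` are hypothetical helper names. It shows that if each node merged everything it received in a single task, every merged output would mix 20 partition ids in the 100-partitions/5-nodes example.

```scala
// Hypothetical model: spread partition (task) ids across nodes round-robin.
// Round-robin is only an assumption used to reproduce the 100/5 case.
def assignToNodes(partitionIds: Seq[Int], nodes: Seq[String]): Map[String, Seq[Int]] = {
  require(nodes.nonEmpty, "need at least one node")
  partitionIds.zipWithIndex
    .groupBy { case (_, i) => nodes(i % nodes.size) }
    .map { case (node, pairs) => node -> pairs.map(_._1) }
}

// If a single Spark task per node merged everything it received, the merged
// output of that node would contain this many distinct partition ids.
// Any value above 1 means no single partition id can describe the output.
def partitionsMixedPerNode(assignment: Map[String, Seq[Int]]): Map[String, Int] =
  assignment.map { case (node, ids) => node -> ids.distinct.size }
```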

[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1002#discussion_r121036463

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
@@ -405,11 +411,16 @@ class CarbonMergerRDD[K, V](
NodeInfo(splitsPerNode.getTaskId, splitsPerNode.getCarbonInputSplitList.size()))

if (blockletCount != 0) {
+ val taskInfo = splitInfo.asInstanceOf[CarbonInputSplitTaskInfo]
val multiBlockSplit = new CarbonMultiBlockSplit(absoluteTableIdentifier,
- splitInfo.asInstanceOf[CarbonInputSplitTaskInfo].getCarbonInputSplitList,
+ taskInfo.getCarbonInputSplitList,
Array(nodeName))
- result.add(new CarbonSparkPartition(id, partitionNo, multiBlockSplit))
- partitionNo += 1
+ if (isPartitionTable) {
--- End diff --

@gvramana Right, each node will get more than one taskNo/partitionNo to handle. But one Spark task handles only one partitionNo/taskNo: a CarbonInputSplitTaskInfo represents a single taskNo, so different taskNos go to different Spark tasks.
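The reply can be modelled the same way. In this hypothetical sketch (`MergeTask` and `oneSparkTaskPerTaskNo` are illustrative names, not CarbonData APIs), each taskNo assigned to a node becomes its own Spark task, so a node that owns several taskNos runs several tasks and no merged output spans two partition ids.

```scala
// Hypothetical model of the reply: every CarbonInputSplitTaskInfo (one taskNo)
// becomes its own Spark task, even when a node owns several taskNos.
final case class MergeTask(node: String, taskNo: Int)

def oneSparkTaskPerTaskNo(assignment: Map[String, Seq[Int]]): Seq[MergeTask] =
  assignment.toSeq
    .sortBy(_._1) // deterministic order by node name
    .flatMap { case (node, taskNos) => taskNos.map(t => MergeTask(node, t)) }
```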

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1002

@gvramana I will raise another PR to optimize compaction for normal (non-partition) tables.

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1002

retest this please

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2472/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1002

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/356/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user gvramana commented on the issue:

https://github.com/apache/carbondata/pull/1002

retest this please

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2522/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1002

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/410/
Failed tests: 1 (carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 1)
org.apache.carbondata.spark.testsuite.allqueries.InsertIntoCarbonTableTestCase: insert into carbon table from carbon table union query

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2524/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1002

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/412/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user gvramana commented on the issue:

https://github.com/apache/carbondata/pull/1002

retest this please

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user gvramana commented on the issue:

https://github.com/apache/carbondata/pull/1002

LGTM

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/208/

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1002

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2787/

[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1002

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1002

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/721/
