GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2410

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in CarbonData, the default blocklet datamap is used first to prune blocklets, and then the other index datamaps are applied. However, those index datamaps prune at segment scope, so in some scenarios their pruned result is larger than that of the default datamap, which produces a negative number of skipped blocklets in the explain query output. Here we intersect the results after each pruning stage; if the pruned result becomes empty, we finish pruning early. (A minimal sketch of this flow follows this post.)

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? `NO`
- [x] Any backward compatibility impacted? `NO`
- [ ] Document update required? `NO`
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required? `NO, tests will be added in another PR`
  - How it is tested? Please attach test report. `Tested in local`
  - Is it a performance related change? Please attach the performance test report. `NO`
  - Any additional information to help reviewers in testing this change. `NA`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata issue_2650_negative_skipped_blocklets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2410.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2410

----

commit a6508af91211c2b005a33bed4db2a7964bb27af6
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-26T06:11:17Z

    Fix bugs in negative number of skipped blocklets

----
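For illustration, a minimal, self-contained sketch of the intersect-and-exit-early flow described above. This is not the actual CarbonData code: blocklets are modeled as plain strings, the stand-in names (PruningSketch, intersectPrune, byDefault, byCg, byFg) are hypothetical, and the per-datamap prune results are passed in directly; only the use of commons-collections' CollectionUtils.intersection matches the patch.

import java.util.Arrays;
import java.util.List;

import org.apache.commons.collections.CollectionUtils;

// Sketch only: "default", "CG" and "FG" stand for the three pruning stages.
public class PruningSketch {

  @SuppressWarnings("unchecked")
  static List<String> intersectPrune(List<String> defaultPruned,
      List<String> cgPruned, List<String> fgPruned) {
    List<String> result = defaultPruned;
    if (result.isEmpty()) {
      return result; // nothing left, finish pruning early
    }
    // The CG datamap prunes in segment scope, so it may return blocklets the
    // default datamap already eliminated; intersecting keeps only the common
    // ones, so the count can never grow from one stage to the next.
    result = (List<String>) CollectionUtils.intersection(result, cgPruned);
    if (result.isEmpty()) {
      return result;
    }
    // Same for the FG datamap.
    result = (List<String>) CollectionUtils.intersection(result, fgPruned);
    return result;
  }

  public static void main(String[] args) {
    // The default datamap keeps 2 blocklets, but the CG datamap (segment
    // scope) returns 3; replacing the result instead of intersecting it is
    // what made the explain output report a negative skipped count.
    List<String> byDefault = Arrays.asList("b1", "b2");
    List<String> byCg = Arrays.asList("b1", "b2", "b3");
    List<String> byFg = Arrays.asList("b1");
    System.out.println(intersectPrune(byDefault, byCg, byFg)); // prints [b1]
  }
}

---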
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6546/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5375/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5449/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6560/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5392/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5466/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2410

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6573/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5402/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2410

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5449/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047095

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
--- End diff --

It is better to give it a different name; initially it is for main index pruning.

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047134

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
--- End diff --

Can you move the function name to the previous line to make the formatting nicer?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047302

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
       } else {
-        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
+        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
       }
-
+      // since index datamap prune in segment scope,
+      // the result need to intersect with previous pruned result
+      finalPrunedBlocklets = (List) CollectionUtils.intersection(
+          finalPrunedBlocklets, cgPrunedBlocklets);
       ExplainCollector.recordCGDataMapPruning(
-          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    }
+
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
     }
     // Now try to prune with FG DataMap.
     if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
       DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
       if (fgDataMapExprWrapper != null) {
         // Prune segments from already pruned blocklets
-        pruneSegments(segmentIds, prunedBlocklets);
-        prunedBlocklets = DataMapUtil
+        pruneSegments(segmentIds, finalPrunedBlocklets);
+        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
-
+        finalPrunedBlocklets = (List) CollectionUtils.intersection(
+            finalPrunedBlocklets, fgPrunedBlocklets);
         ExplainCollector.recordFGDataMapPruning(
-            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
       }
     }
     // TODO: add a else branch to push FGDataMap pruning to reader side
--- End diff --

This TODO can be removed now.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310661

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
    DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
--- End diff --

Fixed. Used the original variable name 'prunedBlocklets' to represent the final output.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310663

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
--- End diff --

fixed

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310665

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
       } else {
-        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
+        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
       }
-
+      // since index datamap prune in segment scope,
+      // the result need to intersect with previous pruned result
+      finalPrunedBlocklets = (List) CollectionUtils.intersection(
+          finalPrunedBlocklets, cgPrunedBlocklets);
       ExplainCollector.recordCGDataMapPruning(
-          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    }
+
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
     }
     // Now try to prune with FG DataMap.
     if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
       DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
       if (fgDataMapExprWrapper != null) {
         // Prune segments from already pruned blocklets
-        pruneSegments(segmentIds, prunedBlocklets);
-        prunedBlocklets = DataMapUtil
+        pruneSegments(segmentIds, finalPrunedBlocklets);
+        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
-
+        finalPrunedBlocklets = (List) CollectionUtils.intersection(
+            finalPrunedBlocklets, fgPrunedBlocklets);
         ExplainCollector.recordFGDataMapPruning(
-            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
       }
     }
     // TODO: add a else branch to push FGDataMap pruning to reader side
--- End diff --

fixed

---
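A note on the cast seen in the diff, for readers of the patch: CollectionUtils.intersection in commons-collections is not generic and returns a raw Collection, hence the (List) cast back to List<ExtendedBlocklet>. The intersection also guarantees that the blocklet count recorded by ExplainCollector never increases from one pruning stage to the next, which is exactly what rules out a negative skipped count.

---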
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5531/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6676/

---