[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2665

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/637/

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2665

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8707/

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

I've verified this PR against another bug and it works fine.
See http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Issue-Bloomfilter-datamap-td63254.html

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

LGTM

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2665

@xuchuanyin In general, if a datamap is created on a list of columns, say c1 and c2, then we create an index datamap on c1 and c2 and treat it as a composite index. So if the user queries with filters on both columns, the index datamap should combine both and make a single prune call to that datamap, because the user has created a composite datamap.

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@ravipesala
It's a good idea to do multiple pruning at one time, but it also has shortcomings.

1. The datamap has to deal with the AND/OR logic of the composite expressions. If the expressions are joined by AND, it needs to intersect the pruned results; if they are joined by OR, it needs to union the pruned results. This will make the datamap complicated to handle.

2. Besides, you can see from the mailing list that the expressions currently forwarded to BloomDataMap contain too many unwanted expressions.
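
The AND/OR handling described in point 1 can be sketched as a recursive walk over the expression tree. This is only an illustration in Python, not CarbonData's Java API: the `Leaf`, `And`, and `Or` classes, the `prune` function, and the `index` mapping from `(column, value)` to candidate blocklet ids are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """An equality predicate on one indexed column (hypothetical type)."""
    column: str
    value: object


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, And, Or]
# Stand-in for a per-column index: (column, value) -> blocklets that may match.
Index = Dict[Tuple[str, object], Set[int]]


def prune(expr: Expr, index: Index) -> Set[int]:
    """Return the blocklet ids that may satisfy expr."""
    if isinstance(expr, Leaf):
        return set(index.get((expr.column, expr.value), set()))
    left = prune(expr.left, index)
    right = prune(expr.right, index)
    if isinstance(expr, And):
        return left & right  # AND: intersect the pruned results
    if isinstance(expr, Or):
        return left | right  # OR: union the pruned results
    raise TypeError(f"unsupported expression: {expr!r}")
```

Every index datamap that accepts a composite expression would need some equivalent of the `And`/`Or` branches, which is the extra complexity the comment refers to.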

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2665

@xuchuanyin When a user creates a datamap on multiple columns, it should be a composite index. But bloom creates an index for each individual column even though the user creates the datamap on multiple columns. If the user wants indexes on individual columns, he should create multiple datamaps, not a single datamap.

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@ravipesala What I want to say is that even if we support composite indexes, the current datamap chooser still fails to work, because it combines irrelevant expressions and forwards them to the datamap, ignoring that the expression contains non-index columns. You can see from the logs in the mailing list that we currently have to deal with 27 expressions, while after applying this patch we only need to deal with 4.
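
One way to picture the filtering this comment asks for is a function that keeps only the part of a filter tree that refers to indexed columns. The sketch below is an assumption-laden illustration in Python, not the code in this PR: expressions are plain tuples such as `("eq", column, value)`, `("and", left, right)`, and `("or", left, right)`, and `extract_prunable` is a hypothetical name.

```python
def extract_prunable(expr, index_columns):
    """Return the sub-expression an index datamap can prune with, or None.

    Dropping one side of an AND is safe because pruning may keep extra
    blocklets. An OR is usable only if both sides are usable; otherwise
    the non-indexed side could match any blocklet.
    """
    op = expr[0]
    if op == "eq":
        _, column, _ = expr
        return expr if column in index_columns else None

    left = extract_prunable(expr[1], index_columns)
    right = extract_prunable(expr[2], index_columns)

    if op == "and":
        if left is not None and right is not None:
            return ("and", left, right)
        # Keep whichever side survived (may be None if neither did).
        return left if left is not None else right
    if op == "or":
        if left is not None and right is not None:
            return ("or", left, right)
        return None
    raise ValueError(f"unknown operator: {op!r}")
```

For a filter such as `c1 = 1 AND x = 5 AND c2 = 2` with index columns `c1` and `c2`, only the two indexed predicates would be forwarded.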

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2665

@xuchuanyin Datamap expression merging is a basic framework feature, but it has not been tested, so we can expect some issues from it: when it was implemented, no index datamaps existed yet. We should fix those issues instead of removing the feature. Removing it changes the very nature of the index datamap itself.
The expected feature should be as follows.
When the user creates index datamap on multiple columns then it should be treated as a composite index, not an individual index.
The user should create individual datamaps on the columns if he wants an individual index on the column.

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@ravipesala I do think assembling expressions and forwarding them to the corresponding datamap is a good idea. But the current implementation needs more work to achieve this goal. We cannot just pass the expressions to the datamaps and let each datamap do the dirty work of intersecting or unioning the results of each expression.

As you mentioned, we can create multiple individual datamaps, but this is also a problem now. For example, the bloomfilter datamap only supports 'equalTo' and 'in', but the current DataMapChooser also forwards 'NotEqual', 'Greater', 'Less'... to the datamap, which wastes performance.

Besides, for both the BloomFilter datamap and the Lucene datamap, we never treat multiple columns specified in 'index_columns' as a composite index; we actually just index them individually.
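
The operator mismatch described here could be avoided by checking each predicate against what the datamap type supports before forwarding it. The following Python sketch is illustrative only: `SUPPORTED_OPS`, `forwardable`, and the `(operator, column, value)` tuple shape are hypothetical, and the only supported-operator set listed is the one stated above for bloomfilter.

```python
# Operators each datamap type can prune on. Only the bloomfilter entry is
# taken from the discussion; other types would need their own entries.
SUPPORTED_OPS = {
    "bloomfilter": frozenset({"equalTo", "in"}),
}


def forwardable(predicates, datamap_type):
    """Keep only predicates whose operator the datamap type supports.

    Each predicate is an (operator, column, value) tuple. Unknown datamap
    types get nothing forwarded rather than everything.
    """
    supported = SUPPORTED_OPS.get(datamap_type, frozenset())
    return [p for p in predicates if p[0] in supported]
```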

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

In addition, if the expression forwarded to the bloomfilter datamap is `indexCol1 = 1 AND indexCol2 = 2`, the bloomfilter datamap will prune the blocklets separately, giving result sets `blockletSet1` and `blockletSet2`. The bloomfilter then has to intersect the two result sets, based on the 'AND' relation of the two expressions. That's what I called 'dirty work'.

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@ravipesala Finally, I think the current PR:
1. does fix the bug
2. has limited impact on other features (it only impacts the bloomfilter datamap)
and it can be accepted.

Maybe in the next version, we can optimize the framework by
1. assembling only expressions that the index datamap can actually support
2. supporting pipeline pruning (use a bitmap to record which blocklets were hit in the previous pruning step, and use that bitmap in the current pruning step)
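
The pipeline pruning idea in point 2 might look roughly like the sketch below, which covers only predicates joined by AND. It is written in Python for illustration and is not a proposed CarbonData implementation: `pipeline_prune`, `to_ids`, and the representation of each pruning step as a function from blocklet id to "may match" are assumptions. A Python integer serves as the bitmap, with bit `i` set when blocklet `i` is still a candidate.

```python
def pipeline_prune(num_blocklets, pruners):
    """Run pruners in sequence, each one checking only surviving blocklets.

    Returns an int bitmap: bit i is set if blocklet i survived every step.
    """
    candidates = (1 << num_blocklets) - 1  # start with all blocklets hit
    for may_match in pruners:
        survivors = 0
        remaining = candidates
        while remaining:
            low = remaining & -remaining      # isolate the lowest set bit
            blocklet_id = low.bit_length() - 1
            if may_match(blocklet_id):
                survivors |= low
            remaining ^= low                  # clear that bit and continue
        candidates = survivors
        if not candidates:
            break  # nothing left, later steps can be skipped
    return candidates


def to_ids(bitmap):
    """Expand a bitmap into a sorted list of blocklet ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]
```

Because each step visits only the bits left set by the previous one, a selective first pruner reduces the work for every pruner after it.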

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2665

@xuchuanyin Please check my fix in the PR https://github.com/apache/carbondata/pull/2767

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2665

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/536/

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@ravipesala
@kevinjmh will check your PR

---

qiuchenjian-2

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2665

@kevinjmh I accepted another PR, #2767, which also intends to fix this problem.
You can close this PR now.
Thanks for your work all the same!

---
