[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...

[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...

GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/2169

[CARBONDATA-2344][DataMap] Fix bugs in mapping blocklet to UnsafeDMStore rows

In BlockletDataMap, CarbonData stores one DMRow per blocklet in an
array. Currently, however, CarbonData accesses the DMRow only by
blockletId (0, 1, etc.), which causes problems because different
blocks can have the same blockletId.

This PR adds a map from blockId#blockletId to the array index, so
that CarbonData can access the DMRow by blockId and blockletId.
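
The mapping described above can be sketched as follows. This is only a minimal illustration, not the code from this PR: the class `BlockletRowIndex`, its methods, and the block names are hypothetical, and the real DMRow store is replaced by plain array positions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the actual BlockletDataMap code.
final class BlockletRowIndex {
  // Maps "blockId#blockletId" to the position of the row in the row array.
  private final Map<String, Integer> rowIndexByKey = new HashMap<>();
  private int nextRowIndex = 0;

  // Builds the composite key in the blockId#blockletId form used in the PR text.
  static String key(String blockId, int blockletId) {
    return blockId + "#" + blockletId;
  }

  // Registers a blocklet and returns the array index assigned to its row.
  int add(String blockId, int blockletId) {
    int index = nextRowIndex++;
    rowIndexByKey.put(key(blockId, blockletId), index);
    return index;
  }

  // Returns the array index for the blocklet, or -1 if it was never registered.
  int indexOf(String blockId, int blockletId) {
    Integer index = rowIndexByKey.get(key(blockId, blockletId));
    return index == null ? -1 : index;
  }

  public static void main(String[] args) {
    BlockletRowIndex index = new BlockletRowIndex();
    // Two blocks (assumed names), each with blocklets 0..2.
    for (String blockId : new String[] {"block-A", "block-B"}) {
      for (int blockletId = 0; blockletId < 3; blockletId++) {
        index.add(blockId, blockletId);
      }
    }
    // The same blockletId now resolves to different rows per block.
    System.out.println(index.indexOf("block-A", 0)); // prints 0
    System.out.println(index.indexOf("block-B", 0)); // prints 3
  }
}
```

The cost of this approach is one map entry, including a string key, per blocklet.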

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [x] Any interfaces changed?
  `NO, only internal interfaces have been changed`
- [x] Any backward compatibility impacted?
  `NO`
- [x] Document update required?
  `NO`
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
    `NO`
  - How it is tested? Please attach test report.
    `Tested in local`
  - Is it a performance related change? Please attach the performance test report.
    `No`
  - Any additional information to help reviewers in testing this change.
    `NO`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  `Not related`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata 0413_bug_blocklet_dm_unsafe_row

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2169

----
commit dd010297c7f7428dc8f42ec1a292b8cdddcc09aa
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-04-13T08:18:23Z

Fix bugs in mapping blocklet to UnsafeDMStore

In BlockletDataMap, carbondata stores DMRow in an array for each
blocklet. But currently carbondata accesses the DMRow only by
blockletId(0, 1, etc.), which will cause problem since different
block can have same blockletId.

This PR adds a map to map the blockId#blockletId to array index,
carbondata can access the DMRow by blockId and blockletId.

----

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3780/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4996/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2169

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4440/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2169

retest this please

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2169

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4441/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5007/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3791/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2169

retest this please

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3867/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5091/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2169

@xuchuanyin What issue are you actually facing? Blocklet ids here are only virtual and are counted according to the number of blocklets present in the index file. If the issue is with other datamaps such as Lucene, it is better to correct the blocklet order to match the index file while writing the datamap. That also saves memory and simplifies datamap writing by avoiding the block name.
Maintaining block names here is not memory efficient.

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2169

@ravipesala Thanks for helping me understand the design purpose.

The original problem is that I found the query result duplicates or misses some records. The scenario is that I use a datamap that prunes down to 2 blocks (each containing 3 blocklets). When it comes to BlockletDataMap, it returns 6 blocklets, but they are duplicates: the result actually contains only the blocklets from the first block, each appearing twice.

I'll work on the relativeBlockletId and fix the problem.
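
The symptom can be reproduced with a simplified model. This is only a sketch under assumptions, not the actual BlockletDataMap lookup: the class `BlockletIdCollisionDemo`, the row labels, and the flat row array are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the symptom; not the actual CarbonData code.
final class BlockletIdCollisionDemo {
  // Looks rows up by blockletId alone, ignoring which block was requested.
  static List<String> lookupByBlockletIdOnly(String[] rows, int[] blockletIds) {
    List<String> result = new ArrayList<>();
    for (int blockletId : blockletIds) {
      result.add(rows[blockletId]);
    }
    return result;
  }

  public static void main(String[] args) {
    // One row per blocklet: block A's three blocklets, then block B's.
    String[] rows = {
      "blockA#0", "blockA#1", "blockA#2",
      "blockB#0", "blockB#1", "blockB#2"
    };
    // The pruned result asks for blocklets 0..2 of block A, then 0..2 of block B,
    // but only the per-block blockletId is used for the lookup.
    int[] requested = {0, 1, 2, 0, 1, 2};
    // Prints [blockA#0, blockA#1, blockA#2, blockA#0, blockA#1, blockA#2]
    System.out.println(lookupByBlockletIdOnly(rows, requested));
  }
}
```

Six rows come back, but all of them belong to the first block, so block B's records are missed and block A's are duplicated.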

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3941/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5226/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2169

@ravipesala After studying the code, I found that we must keep a map from unique-blockletId to DMRow-pointer-index.
The relative blockletId in the previous code was generated before datamap pruning and is related to the DMRow-pointer-index. After pruning, some blocks have been filtered out, so we can no longer get the real relative blockletId.

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2169

retest this please

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2169

@xuchuanyin What I meant is that, instead of adding the mapping in the datamap, this should be handled while writing the datamap.
Currently the blocklet number is relative to each block while writing the datamap; instead, generate the blocklet number relative to the complete index file.
With this approach, we can eliminate the block-to-blocklet mapping completely, even inside datamaps.
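
A rough sketch of that numbering scheme, assuming the writer knows how many blocklets each block has and visits blocks in index-file order. The class `IndexFileBlockletNumbering` and its methods are hypothetical and are not part of CarbonData.

```java
// Hypothetical illustration of index-file-wide blocklet numbering.
final class IndexFileBlockletNumbering {
  // For each block, returns the index-file-wide id of its first blocklet.
  static int[] startOffsets(int[] blockletCounts) {
    int[] offsets = new int[blockletCounts.length];
    int running = 0;
    for (int i = 0; i < blockletCounts.length; i++) {
      offsets[i] = running;
      running += blockletCounts[i];
    }
    return offsets;
  }

  // Converts a per-block blocklet number into an index-file-wide number.
  static int globalBlockletId(int[] offsets, int blockIndex, int localBlockletId) {
    return offsets[blockIndex] + localBlockletId;
  }

  public static void main(String[] args) {
    // Two blocks with 3 blocklets each, as in the scenario discussed in this thread.
    int[] offsets = startOffsets(new int[] {3, 3});
    System.out.println(globalBlockletId(offsets, 0, 0)); // prints 0
    System.out.println(globalBlockletId(offsets, 1, 0)); // prints 3
  }
}
```

Because the resulting number is unique within the index file, it can be used directly as an array position, with no block name stored per blocklet.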

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4078/

---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2169

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5259/

---
