Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

Classic

List

23 messages Options

Options

12

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

GitHub user dhatchayani opened a pull request:

https://github.com/apache/carbondata/pull/2975

[WIP][CARBONDATA-3145] Read improvement for complex column pages while querying

**Problem:**
Column page is decoded for getting each row of a complex primitive column.

**Solution:**
Decode a page it once then use the same.

- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?

- [x] Testing done
Existing UT

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhatchayani/carbondata CARBONDATA-3145

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2975

----
commit 2f1f8b0081f3229ef151f8171b19fa76006065fb
Author: dhatchayani <dhatcha.official@...>
Date: 2018-12-05T07:10:56Z

[CARBONDATA-3145] Read improvement for complex column pages while querying

----

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1642/

---

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2975#discussion_r238971934

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
@@ -29,6 +32,8 @@

protected int blockIndex;

+ protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
--- End diff --

I think this Map will be better to cache the 'byte[] data', because getChunkData also takes time

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9902/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1853/

---

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

In reply to this post by qiuchenjian-2

Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2975#discussion_r238988286

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
@@ -29,6 +32,8 @@

protected int blockIndex;

+ protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
--- End diff --

This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage

---

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2975#discussion_r238989259

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
@@ -29,6 +32,8 @@

protected int blockIndex;

+ protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
--- End diff --

OK, I didn't mention the pageNumber and rowNumber

> This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage

OK, I didn't mention the pageNumber and rowNumber

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1646/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9906/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/2975

@dhatchayani can u please update the performance result

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1857/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on the issue:

https://github.com/apache/carbondata/pull/2975

Are the decoded DimensionColumnPage are cached in DimensionRawColumnChunk, so the function of cache is used by other code.
Such as, DimensionRawColumnChunk contains a Map<Integer,DimensionColumnPage> cacheDimPages, when a Page is decoded, it will cache in this map.

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2975

@qiuchenjian Actually, DimensionColumnPage is decoded from DimensionRawColumnChunk. So it is meaningless to cache it in DimensionRawColumnChunk. Both serves different purposes.

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on the issue:

https://github.com/apache/carbondata/pull/2975

@dhatchayani I read the DimensionRawColumnChunk, this class has cache the decoded DimensionColumnPage, is your Map added useful?

public DimensionColumnPage decodeColumnPage(int pageNumber) {
assert pageNumber < pagesCount;
if (dataChunks == null) {
dataChunks = new DimensionColumnPage[pagesCount];
}
if (dataChunks[pageNumber] == null) {
try {
dataChunks[pageNumber] = chunkReader.decodeColumnPage(this, pageNumber, null);
} catch (IOException | MemoryException e) {
throw new RuntimeException(e);
}
}

return dataChunks[pageNumber];
}

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1671/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9931/

---

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2975

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1883/

---

[GitHub] carbondata issue #2975: [CARBONDATA-3145] Read improvement for complex colum...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2975

LGTM

---

[GitHub] carbondata issue #2975: [CARBONDATA-3145] Avoid duplicate decoding for compl...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/2975

LGTM.. @ravipesala yeah u are right...Based on PR desc I asked for performance report, now it's okay :)

---

[GitHub] carbondata pull request #2975: [CARBONDATA-3145] Avoid duplicate decoding fo...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2975

---

12