[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756]Fixed RLE Encoding I...

GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/642

[CARBONDATA-756]Fixed RLE Encoding Issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata FixedRLEEncodingIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #642

----
commit c899d729a79062bcdc14e932d79e2913c92d9ea4
Author: kumarvishal <[hidden email]>
Date: 2017-03-10T11:13:24Z

Fixed RLE Encoding Issue

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1074/

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Github user jackylk commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Please describe this PR

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1077/

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756]Fixed RLE Encoding I...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105408221

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,24 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
- if (indexes.length == keyBlock.length) {
- dataIndexMap = new short[0];
- } else {
+ boolean useRle = (list.size() > indexes.length
--- End diff --

I guess you can simplify it as below.
```
boolean useRle = !((((list.size() + map.size()) * 100) / indexes.length) > 70);
```
I think the `list.size() > indexes.length` check is not required, as the percentage calculation covers that case as well.
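
A minimal sketch of the check suggested above, assuming `listSize`, `mapSize`, and `rowCount` stand in for `list.size()`, `map.size()`, and `indexes.length`, and that `rowCount` is greater than zero. The class and method names are hypothetical and not part of CarbonData. It illustrates why a separate `list.size() > indexes.length` test adds nothing: in that case the percentage already exceeds 100.

```java
public class RleThresholdSketch {
    // Hypothetical helper mirroring the suggested expression.
    // Assumes rowCount > 0; integer division truncates the percentage.
    static boolean useRle(int listSize, int mapSize, int rowCount) {
        int encodedPercent = ((listSize + mapSize) * 100) / rowCount;
        return !(encodedPercent > 70);
    }

    public static void main(String[] args) {
        System.out.println(useRle(20, 10, 100));  // encoded size is 30% of the rows
        System.out.println(useRle(60, 20, 100));  // encoded size is 80% of the rows
        System.out.println(useRle(101, 0, 100));  // list alone is larger than the input
    }
}
```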

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

@jackylk This PR is about RLE encoding of data. RLE is not worthwhile if the compressed data is more than 70% of the actual data size, because it wastes processing. So we enable RLE only if the data compresses to less than 70% of its original size.
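
To make the trade-off concrete, here is a rough sketch of a generic run-length encoder with a 70% size check. It is not CarbonData's actual storage layout: the `(value, runLength)` pair format, the class name, and the method names are assumptions for illustration, and run lengths are assumed to fit in a `short`.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLengthSketch {
    // Encodes the input as (value, runLength) pairs. Generic RLE only,
    // not the layout used by BlockIndexerStorageForShort.
    static List<short[]> encode(short[] values) {
        List<short[]> runs = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            int j = i;
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            runs.add(new short[] {values[i], (short) (j - i)});
            i = j;
        }
        return runs;
    }

    // Keeps RLE only when the encoded form needs at most 70% as many
    // shorts as the raw input (two shorts per run).
    static boolean worthKeeping(short[] values) {
        if (values.length == 0) {
            return false;
        }
        int encodedShorts = encode(values).size() * 2;
        return (encodedShorts * 100) / values.length <= 70;
    }

    public static void main(String[] args) {
        short[] repetitive = {5, 5, 5, 5, 7, 7, 7, 7, 7, 7};
        short[] distinct = {1, 2, 3, 4};
        System.out.println(worthKeeping(repetitive));
        System.out.println(worthKeeping(distinct));
    }
}
```

With highly repetitive input the encoded form is much smaller than the raw data, while input with no repeated values doubles in size, which is the case the threshold is meant to reject.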

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Github user chenliang613 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Please change the title to follow the format: [CARBONDATA-<issue number>] Title of the pull request (a space needs to be added after the bracket).

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1095/

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105575706

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
- if (indexes.length == keyBlock.length) {
- dataIndexMap = new short[0];
- } else {
+ boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
--- End diff --

I suggest using:
`(((list.size() + map.size()) * 100) / indexes.length) < 70`
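
One detail worth checking before swapping the expressions: the reviewed form `> 70 ? false : true` is equivalent to `<= 70`, so it differs from `< 70` when the percentage is exactly 70. A small sketch of the two forms side by side (the class and method names are hypothetical):

```java
public class ThresholdBoundarySketch {
    // The expression as it appears in the diff under review.
    static boolean reviewedForm(int percent) {
        return percent > 70 ? false : true;
    }

    // The shorter expression suggested in the comment.
    static boolean suggestedForm(int percent) {
        return percent < 70;
    }

    public static void main(String[] args) {
        for (int percent : new int[] {69, 70, 71}) {
            System.out.println(percent + ": reviewed=" + reviewedForm(percent)
                + ", suggested=" + suggestedForm(percent));
        }
    }
}
```

If the boundary case should keep its current behavior, `<= 70` would be the drop-in replacement.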

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105576295

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
- if (indexes.length == keyBlock.length) {
- dataIndexMap = new short[0];
- } else {
+ boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
+ if (useRle) {
+ this.keyBlock = convertToKeyArray(list);
dataIndexMap = convertToArray(map);
+ } else {
+ this.keyBlock = convertToKeyArray(indexes);
+ dataIndexMap = new short[0];
--- End diff --

So when reading, we decide based on whether it is an empty array?

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105577915

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
--- End diff --

This is a comment on the `compressMyOwnWay` function: I suggest using `indexes.length / 2` as the initial capacity when allocating the ArrayList instead of 10, which is too small and will cause repeated ArrayList expansion.
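
A short sketch of the presizing idea, using the `ArrayList(int initialCapacity)` constructor. The surrounding method is hypothetical and only collects the first element of each run of consecutive values, so that there is something to add to the list; it is not the body of `compressMyOwnWay`.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedListSketch {
    // Collects the start of every run of consecutive values.
    static List<Short> runStarts(short[] indexes) {
        // Presize to half the input, as suggested, so the backing array
        // is not repeatedly regrown from a small initial capacity.
        List<Short> starts = new ArrayList<>(indexes.length / 2);
        for (int i = 0; i < indexes.length; i++) {
            if (i == 0 || indexes[i] != indexes[i - 1] + 1) {
                starts.add(indexes[i]);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(runStarts(new short[] {1, 2, 3, 7, 8, 10}));
    }
}
```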

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105578013

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
- if (indexes.length == keyBlock.length) {
- dataIndexMap = new short[0];
- } else {
+ boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
--- End diff --

Can we decide this in a more heuristic way? For example, if we find more than 5 pages that do not use RLE, then do not pay the cost of trying to compress in any future blocklets.
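
One way such a heuristic could look, as a sketch only: a small per-column tracker that counts pages where RLE was rejected and stops further attempts once the count passes 5. The class, its methods, and the idea of keeping the counter per column are assumptions, not something the PR implements.

```java
public class RleAttemptTrackerSketch {
    // Threshold taken from the review comment: more than 5 pages without RLE.
    private static final int MAX_PAGES_WITHOUT_RLE = 5;

    private int pagesWithoutRle;

    // True while it is still worth trying RLE on the next page.
    boolean shouldAttemptRle() {
        return pagesWithoutRle <= MAX_PAGES_WITHOUT_RLE;
    }

    // Records the outcome of one page after the size check has run.
    void recordPage(boolean usedRle) {
        if (!usedRle) {
            pagesWithoutRle++;
        }
    }

    public static void main(String[] args) {
        RleAttemptTrackerSketch tracker = new RleAttemptTrackerSketch();
        for (int page = 0; page < 6; page++) {
            tracker.recordPage(false);
        }
        System.out.println(tracker.shouldAttemptRle());
    }
}
```

A caller would check `shouldAttemptRle()` before encoding a page and fall back to the raw key array when it returns false.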

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614538

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
- if (indexes.length == keyBlock.length) {
- dataIndexMap = new short[0];
- } else {
+ boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
+ if (useRle) {
+ this.keyBlock = convertToKeyArray(list);
dataIndexMap = convertToArray(map);
+ } else {
+ this.keyBlock = convertToKeyArray(indexes);
+ dataIndexMap = new short[0];
--- End diff --

Yes, if the array is empty we will not add the RLE encoder to the data chunk.
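
The behavior described above could be sketched roughly as follows, assuming an empty `dataIndexMap` is the only signal that RLE was skipped. The `Encoding` enum here is local to the sketch and hypothetical, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkEncodingSketch {
    // Hypothetical stand-in for the encodings recorded on a data chunk.
    enum Encoding { RLE }

    // Adds the RLE marker only when the writer produced a non-empty map.
    static List<Encoding> encodingsFor(short[] dataIndexMap) {
        List<Encoding> encodings = new ArrayList<>();
        if (dataIndexMap.length > 0) {
            encodings.add(Encoding.RLE);
        }
        return encodings;
    }

    public static void main(String[] args) {
        System.out.println(encodingsFor(new short[0]));
        System.out.println(encodingsFor(new short[] {0, 4}));
    }
}
```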

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614647

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
}
map.add(start);
map.add(counter);
- this.keyBlock = convertToKeyArray(list);
--- End diff --

This is old code; I will update it.

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1108/

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1111/

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Github user jackylk commented on the issue:

https://github.com/apache/incubator-carbondata/pull/642

LGTM

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/642
