GitHub user kumarvishal09 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/642

[CARBONDATA-756]Fixed RLE Encoding Issue

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata FixedRLEEncodingIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #642

----
commit c899d729a79062bcdc14e932d79e2913c92d9ea4
Author: kumarvishal <[hidden email]>
Date:   2017-03-10T11:13:24Z

    Fixed RLE Encoding Issue

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1074/
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Please describe this PR.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1077/
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105408221

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,24 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
-    if (indexes.length == keyBlock.length) {
-      dataIndexMap = new short[0];
-    } else {
+    boolean useRle = (list.size() > indexes.length
--- End diff --

I guess you can simplify it as below.
```
boolean useRle = !((((list.size() + map.size()) * 100) / indexes.length) > 70);
```
I think the `list.size() > indexes.length` check is not required, as the percentage calculation already covers that case.
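A minimal sketch of the simplified check, assuming (from the diff context, not confirmed in the thread) that `list` holds one key per run and `map` holds a start/length pair per run. The class and method names are hypothetical. Because the expression uses integer division, the percentage is truncated before it is compared with 70, and whenever `list.size()` alone exceeds `indexes.length` the percentage is already above 100, so the threshold rejects RLE without a separate test.

```java
// Hypothetical helper illustrating the simplified RLE threshold; not CarbonData API.
class RleThreshold {

    /** Largest encoded size, as a percentage of the row count, for which RLE is kept. */
    static final int MAX_RLE_PERCENT = 70;

    /**
     * @param keyCount     entries in the key list (assumed: one per run)
     * @param runMetaCount entries in the run map (assumed: start and length per run)
     * @param rowCount     number of rows in the page (indexes.length)
     */
    static boolean useRle(int keyCount, int runMetaCount, int rowCount) {
        // Integer arithmetic, as in the reviewed expression: the percentage is truncated.
        int percent = ((keyCount + runMetaCount) * 100) / rowCount;
        return !(percent > MAX_RLE_PERCENT);
    }

    public static void main(String[] args) {
        // 10 rows in 2 runs: 2 keys + 4 map entries = 60%, so RLE is kept.
        System.out.println(useRle(2, 4, 10));
        // 10 rows in 3 runs: 3 keys + 6 map entries = 90%, so RLE is skipped.
        System.out.println(useRle(3, 6, 10));
    }
}
```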
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

@jackylk This PR is about the RLE encoding of data. RLE is not worthwhile if the encoded data is more than 70% of the actual data size; it only wastes processing. So we enable RLE only if it compresses the data to less than 70% of its original size.
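To make the 70% rule concrete, here is a self-contained sketch that run-length encodes a column into distinct keys plus start/length pairs and then applies the threshold. The layout follows what the diff context suggests (`map.add(start); map.add(counter);`), but the class, the method names, and the use of `int` values instead of CarbonData's column types are assumptions for illustration, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "use RLE only if it shrinks the data enough".
class RleRatioDemo {

    /** Run-length encoding result: one key per run and a start/length pair per run. */
    static final class Encoded {
        final List<Integer> keys = new ArrayList<>();
        final List<Integer> runs = new ArrayList<>(); // start0, len0, start1, len1, ...
    }

    static Encoded encode(int[] column) {
        Encoded out = new Encoded();
        if (column.length == 0) {
            return out;
        }
        int start = 0;
        int counter = 1;
        out.keys.add(column[0]);
        for (int i = 1; i < column.length; i++) {
            if (column[i] != column[i - 1]) {
                // Close the current run and open a new one.
                out.runs.add(start);
                out.runs.add(counter);
                start += counter;
                counter = 1;
                out.keys.add(column[i]);
            } else {
                counter++;
            }
        }
        // Close the final run.
        out.runs.add(start);
        out.runs.add(counter);
        return out;
    }

    /** Encoded entry count as a truncated percentage of the original entry count. */
    static int encodedPercent(int[] column) {
        if (column.length == 0) {
            throw new IllegalArgumentException("column must not be empty");
        }
        Encoded e = encode(column);
        return ((e.keys.size() + e.runs.size()) * 100) / column.length;
    }

    static boolean useRle(int[] column) {
        return encodedPercent(column) <= 70;
    }

    public static void main(String[] args) {
        int[] lowCardinality = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2};
        int[] highCardinality = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(encodedPercent(lowCardinality) + "% -> RLE " + useRle(lowCardinality));
        System.out.println(encodedPercent(highCardinality) + "% -> RLE " + useRle(highCardinality));
    }
}
```

With two runs the encoded form needs 6 entries for 10 rows (60%), so RLE pays off; with ten distinct values it needs 30 entries (300%), which is exactly the wasted work the comment describes.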
Github user chenliang613 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Please change the title to follow the format: [CARBONDATA-<issue number>] Title of the pull request (a blank is needed after the closing bracket).
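The title convention can be checked mechanically. This is a hypothetical sketch (the regex and class name are ours, not a CarbonData or ASF tool) that accepts `[CARBONDATA-<number>]` followed by exactly one blank and a non-empty title.

```java
import java.util.regex.Pattern;

// Hypothetical check for the "[CARBONDATA-<issue number>] Title" convention.
class PrTitleCheck {

    // JIRA key in brackets, one blank, then a title that does not start with whitespace.
    private static final Pattern TITLE = Pattern.compile("\\[CARBONDATA-\\d+\\] \\S.*");

    static boolean isValid(String title) {
        // matches() requires the whole string to match the pattern.
        return TITLE.matcher(title).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[CARBONDATA-756]Fixed RLE Encoding Issue"));
        System.out.println(isValid("[CARBONDATA-756] Fixed RLE Encoding Issue"));
    }
}
```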
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1095/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105575706

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
-    if (indexes.length == keyBlock.length) {
-      dataIndexMap = new short[0];
-    } else {
+    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
--- End diff --

Suggest using: `(((list.size() + map.size()) * 100) / indexes.length) < 70`
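The two spellings are not quite equivalent: `x > 70 ? false : true` is `x <= 70`, while `x < 70` also rejects RLE when the truncated percentage is exactly 70. The small check below exercises only the two expressions quoted in the diff and the review, enumerating every integer percentage from 0 to 100 and collecting the values where they disagree. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Compares the two boolean forms discussed in the review over percentages 0..100.
class ThresholdForms {

    static boolean original(int percent) {
        return percent > 70 ? false : true; // form in the patch
    }

    static boolean suggested(int percent) {
        return percent < 70; // form proposed in the review
    }

    static List<Integer> disagreements() {
        List<Integer> out = new ArrayList<>();
        for (int p = 0; p <= 100; p++) {
            if (original(p) != suggested(p)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(disagreements()); // prints [70]
    }
}
```

If the intent is to keep the patch's behavior, `<= 70` drops the ternary without moving the boundary.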
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105576295

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
-    if (indexes.length == keyBlock.length) {
-      dataIndexMap = new short[0];
-    } else {
+    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
+    if (useRle) {
+      this.keyBlock = convertToKeyArray(list);
       dataIndexMap = convertToArray(map);
+    } else {
+      this.keyBlock = convertToKeyArray(indexes);
+      dataIndexMap = new short[0];
--- End diff --

So when reading, do we decide based on whether it is an empty array?
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105577915

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
--- End diff --

This comment is about the `compressDataMyOwnWay` function: I suggest using `indexes.length / 2` as the initial capacity of the ArrayList instead of 10, which is too small and will cause repeated ArrayList expansion.
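A rough way to see the cost is to count how often the backing array would be reallocated while elements are appended one at a time. The sketch assumes OpenJDK's growth policy (grow by half the current capacity, and by at least one slot), which is an implementation detail rather than part of the `ArrayList` contract; the class and method names are hypothetical. If, as the diff context suggests, `list` holds one entry per run and `map` two entries per run, then in the cases where RLE is kept (about 3 × runs ≤ 70% of rows) both lists typically fit within `indexes.length / 2` and never grow.

```java
// Counts backing-array reallocations when appending elements one at a time,
// assuming OpenJDK-style ArrayList growth (an implementation detail, not the spec).
class ArrayListGrowth {

    static int resizeCount(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < elements) {
            // Grow by half the current capacity, but by at least one slot.
            capacity += Math.max(1, capacity >> 1);
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int rows = 1000;
        // Worst case for a 1000-row page in which every value is distinct.
        System.out.println("initial 10:  " + resizeCount(10, rows) + " resizes");
        System.out.println("initial n/2: " + resizeCount(rows / 2, rows) + " resizes");
    }
}
```

In the method itself, the suggestion amounts to constructing the lists with `new ArrayList<>(indexes.length / 2)`.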
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105578013

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
-    if (indexes.length == keyBlock.length) {
-      dataIndexMap = new short[0];
-    } else {
+    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
--- End diff --

Can we decide this in a more heuristic way? For example, if we find more than 5 pages that did not use RLE, we could stop paying the cost of trying to compress in all future blocklets.
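A sketch of the suggested heuristic as a small per-column guard: once more than five pages have been written without RLE, later pages skip the trial encoding. The class name, the constructor parameter, and the choice to count all non-RLE pages rather than consecutive ones are assumptions; the thread does not say whether or how this was implemented.

```java
// Hypothetical guard for the review suggestion; not part of the CarbonData code base.
class AdaptiveRleGuard {

    private final int maxNonRlePages;
    private int nonRlePages;

    AdaptiveRleGuard(int maxNonRlePages) {
        this.maxNonRlePages = maxNonRlePages;
    }

    /** Whether the next page should still pay the cost of trying RLE. */
    boolean shouldTryRle() {
        return nonRlePages <= maxNonRlePages;
    }

    /** Records the outcome of a page for which RLE was attempted. */
    void recordPage(boolean usedRle) {
        if (!usedRle) {
            nonRlePages++;
        }
    }

    public static void main(String[] args) {
        AdaptiveRleGuard guard = new AdaptiveRleGuard(5);
        for (int page = 1; page <= 7; page++) {
            boolean attempt = guard.shouldTryRle();
            System.out.println("page " + page + ": try RLE = " + attempt);
            if (attempt) {
                guard.recordPage(false); // pretend every attempt fails the 70% test
            }
        }
    }
}
```

Counting consecutive failures and resetting on success would adapt better to data whose distribution changes between blocklets, at the price of occasionally retrying.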
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614538

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
-    if (indexes.length == keyBlock.length) {
-      dataIndexMap = new short[0];
-    } else {
+    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
+    if (useRle) {
+      this.keyBlock = convertToKeyArray(list);
       dataIndexMap = convertToArray(map);
+    } else {
+      this.keyBlock = convertToKeyArray(indexes);
+      dataIndexMap = new short[0];
--- End diff --

Yes. If the array is empty, we will not add the RLE encoder to the data chunk.
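On the read side, this convention means an empty `dataIndexMap` doubles as the "no RLE" marker. Below is a minimal sketch of a decoder honoring it, assuming the start/length pair layout suggested by the diff; the class and method names are hypothetical and this is not CarbonData's reader. For simplicity it inspects the array directly, whereas per the reply the signal recorded in the file is whether the RLE encoder appears in the data chunk.

```java
import java.util.Arrays;

// Hypothetical decoder: an empty run map means the key block is stored without RLE.
class RleAwareReader {

    static int[] decode(int[] keyBlock, int[] dataIndexMap) {
        if (dataIndexMap.length == 0) {
            // No RLE: every row value is stored directly.
            return keyBlock.clone();
        }
        // RLE: dataIndexMap holds (start, length) pairs, one pair per key.
        int runs = dataIndexMap.length / 2;
        int rows = dataIndexMap[2 * (runs - 1)] + dataIndexMap[2 * (runs - 1) + 1];
        int[] out = new int[rows];
        for (int r = 0; r < runs; r++) {
            int start = dataIndexMap[2 * r];
            int length = dataIndexMap[2 * r + 1];
            Arrays.fill(out, start, start + length, keyBlock[r]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rle = decode(new int[] {4, 7}, new int[] {0, 2, 2, 1});
        int[] plain = decode(new int[] {4, 4, 7}, new int[0]);
        System.out.println(Arrays.toString(rle));   // [4, 4, 7]
        System.out.println(Arrays.toString(plain)); // [4, 4, 7]
    }
}
```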
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614647

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
@@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
     }
     map.add(start);
     map.add(counter);
-    this.keyBlock = convertToKeyArray(list);
--- End diff --

This is old code; I will update it.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1108/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1111/
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/642

LGTM
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/642