GitHub user simafengyun opened a pull request:
https://github.com/apache/incubator-carbondata/pull/638

Carbondata 748 use binary search to improve performance according to filter values' order

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/638.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #638

----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T05:13:22Z

    use binary search to improve the performance in method setFilterdIndexToBitSet

commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T07:51:50Z

    add binary range search and add test case

commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T08:08:21Z

    revert previous change

commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T08:38:29Z

    format changed code

commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T09:09:06Z

    change code format to pass check style

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1054/
In reply to this post by qiuchenjian-2
Github user simafengyun closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/638
GitHub user simafengyun reopened a pull request:
https://github.com/apache/incubator-carbondata/pull/638

Carbondata 748 use binary search to improve performance according to filter values' order

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/638.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #638

----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T05:13:22Z

    use binary search to improve the performance in method setFilterdIndexToBitSet

commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T07:51:50Z

    add binary range search and add test case

commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T08:08:21Z

    revert previous change

commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T08:38:29Z

    format changed code

commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T09:09:06Z

    change code format to pass check style

commit 141e26425ed7296b661a5382a4fe168e33fb71d1
Author: mayun <mayun@10.100.56.61>
Date: 2017-03-09T09:51:22Z

    revert the code to use inverted index

----
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1055/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1056/
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105404228

    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --

Please provide comments for this method.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105405605

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --

If `filterValues` has size one, we had better avoid this binary search; a plain compare would be enough.
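For readers following the diff, here is a minimal, self-contained sketch of the idea being reviewed: replace the nested loop (every filter value against every row) with one binary search per row over the filter keys, and fall back to a plain compare when there is only one key. It is not CarbonData's code. The names `FilterBitSetSketch`, `compareBytes`, `binarySearch` and `matchRows` are hypothetical, rows are modelled as plain `byte[]` values rather than a `DimensionColumnDataChunk`, and the sketch assumes the filter keys are already sorted in unsigned lexicographic byte order.

```java
import java.util.BitSet;

final class FilterBitSetSketch {

  // Unsigned lexicographic comparison of two byte arrays (assumed key ordering).
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  }

  // Binary search over sorted keys; 'high' is an inclusive upper bound.
  // Returns the matching index, or -(insertionPoint + 1) when absent.
  static int binarySearch(byte[][] sortedKeys, int low, int high, byte[] target) {
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareBytes(sortedKeys[mid], target);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  // Sets bit i when rows[i] equals any of the filter keys.
  static BitSet matchRows(byte[][] rows, byte[][] sortedKeys) {
    BitSet bits = new BitSet(rows.length);
    if (sortedKeys.length == 1) {
      // Single key: a direct compare per row, no search needed.
      for (int i = 0; i < rows.length; i++) {
        if (compareBytes(rows[i], sortedKeys[0]) == 0) {
          bits.set(i);
        }
      }
    } else if (sortedKeys.length > 1) {
      // Several keys: one O(log k) lookup per row instead of k compares.
      for (int i = 0; i < rows.length; i++) {
        if (binarySearch(sortedKeys, 0, sortedKeys.length - 1, rows[i]) >= 0) {
          bits.set(i);
        }
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    byte[][] rows = {{3}, {1}, {7}, {3}, {9}};
    byte[][] keys = {{3}, {9}};
    System.out.println(matchRows(rows, keys)); // prints {0, 3, 4}
  }
}
```

With n rows and k filter keys, the nested loop does about n * k comparisons while the per-row search does about n * log2(k), so the gain only shows up when k is larger than one, which is the point of the review comment.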
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105406369

    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --

There is not much difference between `getFirstIndexUsingBinarySearch` and this method. I remember that in your last PR you used binary search even for getting the ranges. What happened to it? Did you hit any functional or performance issues?
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105416528

    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --

You are right, I did use binary search for getting the ranges previously, but yesterday I ran a performance test and found that it is no better than the current logic. The binary range search only has an advantage when the data array is very long and contains many repeated values. But the data array size for a chunk is usually 12000, which is not very long. So the binary range search has no advantage, and I decided to keep the current logic.
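To make the trade-off in this reply concrete, here is a small sketch of the two ways to find the index range of a repeated value in a sorted array. It is only an illustration: the thread does not spell out what the "current logic" does, so the sketch assumes it locates one end of the run by binary search and then scans the neighbouring equal values linearly. The names `RangeSearchSketch`, `lowerBound`, `upperBound`, `rangeByBinarySearch` and `rangeByScan` are hypothetical, and plain `int` values stand in for the chunk's byte keys.

```java
import java.util.Arrays;

final class RangeSearchSketch {

  // First index whose value is >= target (sorted.length if there is none).
  static int lowerBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // First index whose value is > target (sorted.length if there is none).
  static int upperBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] <= target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Binary range search: two binary searches, O(log n) however long the run is.
  // Returns {first, last} (inclusive) or null when the value is absent.
  static int[] rangeByBinarySearch(int[] sorted, int target) {
    int first = lowerBound(sorted, target);
    if (first == sorted.length || sorted[first] != target) {
      return null;
    }
    return new int[] {first, upperBound(sorted, target) - 1};
  }

  // One binary search plus a linear walk over the run: O(log n + r) for a run of length r.
  static int[] rangeByScan(int[] sorted, int target) {
    int first = lowerBound(sorted, target);
    if (first == sorted.length || sorted[first] != target) {
      return null;
    }
    int last = first;
    while (last + 1 < sorted.length && sorted[last + 1] == target) {
      last++;
    }
    return new int[] {first, last};
  }

  public static void main(String[] args) {
    int[] data = {1, 2, 2, 2, 5, 8, 8};
    System.out.println(Arrays.toString(rangeByBinarySearch(data, 2))); // prints [1, 3]
    System.out.println(Arrays.toString(rangeByScan(data, 8)));         // prints [5, 6]
  }
}
```

The second binary search only pays off when runs of equal values are long compared with log2(n). For an array of about 12000 entries, log2(n) is roughly 14, so short runs are scanned about as cheaply as they are searched, which is consistent with the measurement reported above.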
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105422550

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --

Is the below OK?

    private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
        int numerOfRows) {
      BitSet bitSet = new BitSet(numerOfRows);
      if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
        byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
        for (int i = 0; i < numerOfRows; i++) {
          if (filterValues.length > 1) {
            int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                dimensionColumnDataChunk.getChunkData(i));
            if (index >= 0) {
              bitSet.set(i);
            }
          } else if (filterValues.length == 1) {
            if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
              bitSet.set(i);
            }
          } else {
            break;
          }
        }
      }
      return bitSet;
    }
Github user chenliang613 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

@mayunSaicmotor please change "[Carbondata 748] " to "[CARBONDATA-748]" in the PR's title.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105424505

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --

looks fine
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1085/
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105429564

    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --

Comments have been added. Is there anything else that needs to change?
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638

LGTM
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/638
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105522667

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --

@ravipesala, would it be better to move the if clause outside the for loop?

    private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
        int numerOfRows) {
      BitSet bitSet = new BitSet(numerOfRows);
      if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
        byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
        if (filterValues.length > 1) {
          for (int i = 0; i < numerOfRows; i++) {
            int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                dimensionColumnDataChunk.getChunkData(i));
            if (index >= 0) {
              bitSet.set(i);
            }
          }
        } else if (filterValues.length == 1) {
          for (int i = 0; i < numerOfRows; i++) {
            if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
              bitSet.set(i);
            }
          }
        }
      }
      return bitSet;
    }