[GitHub] incubator-carbondata pull request #638: Carbondata 748

[GitHub] incubator-carbondata pull request #638: Carbondata 748

qiuchenjian-2
GitHub user simafengyun opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/638

    Carbondata 748

    Use binary search to improve performance, taking advantage of the filter values' order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #638
   
----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T05:13:22Z

    use binary search to improve the performance in method
    setFilterdIndexToBitSet

commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T07:51:50Z

    add binary range search and add test case

commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T08:08:21Z

    revert previous change

commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T08:38:29Z

    format changed code

commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T09:09:06Z

    change code format to pass check style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #638: Carbondata 748

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    Build failed with Spark 1.6.2. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1054/



---

[GitHub] incubator-carbondata pull request #638: Carbondata 748

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user simafengyun closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/638


---

[GitHub] incubator-carbondata pull request #638: Carbondata 748

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user simafengyun reopened a pull request:

    https://github.com/apache/incubator-carbondata/pull/638

    Carbondata 748

    Use binary search to improve performance, taking advantage of the filter values' order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #638
   
----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T05:13:22Z

    use binary search to improve the performance in method
    setFilterdIndexToBitSet

commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T07:51:50Z

    add binary range search and add test case

commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T08:08:21Z

    revert previous change

commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T08:38:29Z

    format changed code

commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T09:09:06Z

    change code format to pass check style

commit 141e26425ed7296b661a5382a4fe168e33fb71d1
Author: mayun <mayun@10.100.56.61>
Date:   2017-03-09T09:51:22Z

    revert the code to use inverted index

----


---

[GitHub] incubator-carbondata issue #638: Carbondata 748

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    Build succeeded with Spark 1.6.2. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1055/



---

[GitHub] incubator-carbondata issue #638: Carbondata 748

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    Build succeeded with Spark 1.6.2. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1056/



---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105404228
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
     
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --
   
    Please add comments to this method.


---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105405605
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --
   
    If `filterValues` has only one element, we should avoid the binary search; a direct comparison is enough.
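
As a rough illustration of the change under review and of the single-key shortcut suggested above, the following self-contained sketch compares the original nested-loop shape with a per-row binary search over sorted filter keys. The class `IncludeFilterSketch`, the helpers `compareBytes`, `nestedLoop`, `binarySearch`, and `perRowSearch`, and the plain `byte[][]` row layout are hypothetical stand-ins, not CarbonData APIs. The sketch also assumes the filter keys are sorted in unsigned byte order.

```java
import java.util.BitSet;

// Hypothetical stand-in for the filter executor; not CarbonData code.
final class IncludeFilterSketch {

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** Original shape: for every filter key, scan every row (k * n comparisons). */
  static BitSet nestedLoop(byte[][] rows, byte[][] filterKeys) {
    BitSet bits = new BitSet(rows.length);
    for (byte[] key : filterKeys) {
      for (int j = 0; j < rows.length; j++) {
        if (compareBytes(rows[j], key) == 0) {
          bits.set(j);
        }
      }
    }
    return bits;
  }

  /** Returns the index of target, or -(insertionPoint + 1) when it is absent. */
  static int binarySearch(byte[][] sortedKeys, byte[] target) {
    int low = 0;
    int high = sortedKeys.length - 1; // inclusive upper bound
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareBytes(sortedKeys[mid], target);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  /** New shape: one pass over the rows; a single key is compared directly. */
  static BitSet perRowSearch(byte[][] rows, byte[][] sortedKeys) {
    BitSet bits = new BitSet(rows.length);
    if (sortedKeys.length == 1) {
      for (int i = 0; i < rows.length; i++) {
        if (compareBytes(rows[i], sortedKeys[0]) == 0) {
          bits.set(i);
        }
      }
    } else {
      for (int i = 0; i < rows.length; i++) {
        if (binarySearch(sortedKeys, rows[i]) >= 0) {
          bits.set(i);
        }
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    byte[][] rows = {{2}, {7}, {4}, {7}, {9}};
    byte[][] keys = {{4}, {7}}; // assumed to be sorted already
    System.out.println(nestedLoop(rows, keys));   // {1, 2, 3}
    System.out.println(perRowSearch(rows, keys)); // {1, 2, 3}
  }
}
```

With `n` rows and `k` filter keys, the nested loop performs on the order of `k * n` comparisons, while the per-row search performs on the order of `n * log k`. For `k == 1` the two coincide, which is why a direct comparison is enough in that case.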


---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105406369
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
     
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --
   
    There is not much difference between `getFirstIndexUsingBinarySearch` and this method. I remember that in your last PR you used binary search even for getting the ranges. What happened to that? Did you run into any functional or performance issues?


---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105416528
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
     
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --
   
    You are right, I did previously use binary search even for getting the ranges. But yesterday I ran a performance test and found that its performance is no better than the current logic. The binary range search has an advantage only when the data array is very long and contains many repeated values. The data array size for a chunk is usually 12000, which is not very long, so the binary range search has no advantage and I decided to keep the current logic.
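
The trade-off described above can be sketched on a plain sorted `int[]`. This is only an illustration under stated assumptions: `RangeSearchSketch` and its methods are hypothetical names, and the thread does not show how `getRangeIndexUsingBinarySearch` is actually implemented. One variant finds both ends of a run of equal values with two binary searches; the other finds the first index by binary search and then walks forward linearly.

```java
// Hypothetical illustration of the two ways to find a run of equal values.
final class RangeSearchSketch {

  /** First index i with sorted[i] >= key, or sorted.length if there is none. */
  static int lowerBound(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length; // exclusive upper bound
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < key) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  /** First index i with sorted[i] > key, or sorted.length if there is none. */
  static int upperBound(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length; // exclusive upper bound
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] <= key) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  /** Two binary searches: returns {first, last} inclusive, or null if absent. */
  static int[] rangeByBinarySearch(int[] sorted, int key) {
    int first = lowerBound(sorted, key);
    if (first == sorted.length || sorted[first] != key) {
      return null;
    }
    return new int[] {first, upperBound(sorted, key) - 1};
  }

  /** One binary search, then a linear walk to the end of the run. */
  static int[] rangeByLinearWalk(int[] sorted, int key) {
    int first = lowerBound(sorted, key);
    if (first == sorted.length || sorted[first] != key) {
      return null;
    }
    int last = first;
    while (last + 1 < sorted.length && sorted[last + 1] == key) {
      last++;
    }
    return new int[] {first, last};
  }

  public static void main(String[] args) {
    int[] sorted = {1, 2, 2, 2, 5, 7, 7};
    int[] a = rangeByBinarySearch(sorted, 2);
    int[] b = rangeByLinearWalk(sorted, 2);
    System.out.println(a[0] + ".." + a[1]); // 1..3
    System.out.println(b[0] + ".." + b[1]); // 1..3
  }
}
```

The second binary search costs about `log n` comparisons regardless of the run length, while the linear walk costs one comparison per repeated value. The double search therefore only pays off when runs are long, which is consistent with the observation that it brought no benefit at a chunk size of 12000.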
   



---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105422550
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --
   
    Is the following OK?
   
      private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
          int numerOfRows) {
        BitSet bitSet = new BitSet(numerOfRows);
        if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
          byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
          for (int i = 0; i < numerOfRows; i++) {
   
            if (filterValues.length > 1) {
              int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                  dimensionColumnDataChunk.getChunkData(i));
   
              if (index >= 0) {
                bitSet.set(i);
              }
            } else if (filterValues.length == 1) {
              if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
                bitSet.set(i);
              }
            } else {
              break;
            }
   
          }
        }
        return bitSet;
      }
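
One detail worth noting: the diff passes `filterValues.length` as the upper bound, while the proposal above passes `filterValues.length - 1`. The thread does not show whether `CarbonUtil.binarySearch` treats its upper bound as inclusive. The sketch below only assumes that it might, using a hypothetical `InclusiveBoundsSketch.binarySearch` over a sorted `int[]`, to show why the two arguments are not interchangeable under that assumption.

```java
// Hypothetical binary search with an inclusive upper bound; not CarbonData code.
final class InclusiveBoundsSketch {

  /** Searches sorted[low..high], both bounds inclusive. */
  static int binarySearch(int[] sorted, int low, int high, int target) {
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < target) {
        low = mid + 1;
      } else if (sorted[mid] > target) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1); // target absent; encodes the insertion point
  }

  public static void main(String[] args) {
    int[] keys = {1, 3, 5};
    System.out.println(binarySearch(keys, 0, keys.length - 1, 3)); // 1
    System.out.println(binarySearch(keys, 0, keys.length - 1, 9)); // -4
    try {
      // With an inclusive bound, passing keys.length lets mid reach index 3.
      binarySearch(keys, 0, keys.length, 9);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("out of bounds");
    }
  }
}
```

If the real method uses an exclusive upper bound instead, the situation is reversed: `length - 1` would silently skip the last key.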
   



---

[GitHub] incubator-carbondata issue #638: [Carbondata 748] use binary search improve ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    @mayunSaicmotor please change "[Carbondata 748]" to "[CARBONDATA-748]" in the PR title.


---

[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105424505
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --
   
    looks fine


---

[GitHub] incubator-carbondata issue #638: [CARBONDATA-748] use binary search improve ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    Build succeeded with Spark 1.6.2. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1085/



---

[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105429564
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
         return -(low + 1);
       }
     
    +  public static int[] getRangeIndexUsingBinarySearch(
    --- End diff --
   
    Comments have been added. Is there anything else that needs to change?


---

[GitHub] incubator-carbondata issue #638: [CARBONDATA-748] use binary search improve ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/638
 
    LGTM


---

[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/638


---

[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105522667
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
         BitSet bitSet = new BitSet(numerOfRows);
         if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
           byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    -      for (int k = 0; k < filterValues.length; k++) {
    -        for (int j = 0; j < numerOfRows; j++) {
    -          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
    -            bitSet.set(j);
    -          }
    +      for (int i = 0; i < numerOfRows; i++) {
    +
    +        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
    --- End diff --
   
    @ravipesala, would it be better to move the `if` check outside the `for` loop?
   
      private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
          int numerOfRows) {
        BitSet bitSet = new BitSet(numerOfRows);
        if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
          byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
   
          if (filterValues.length > 1) {
            for (int i = 0; i < numerOfRows; i++) {
              int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                  dimensionColumnDataChunk.getChunkData(i));
   
              if (index >= 0) {
                bitSet.set(i);
              }
            }
          } else if (filterValues.length == 1) {
            for (int i = 0; i < numerOfRows; i++) {
              if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
                bitSet.set(i);
              }
            }
          }
        }
        return bitSet;
      }
   


