[GitHub] carbondata pull request #2725: [WIP] Added code to support storing min max f...

classic Classic list List threaded Threaded
50 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [WIP] Added code to support storing min max f...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2725

    [WIP] Added code to support storing min max for string columns based on number of characacters

   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata string_min_max_decision

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2725
   
----
commit 6ef6d880e68aa3bfc1c704659c916b384e5d545a
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-09-14T05:13:20Z

    Modified code to support writing the min max flag for all the columns in the metadata. This will help in deciding whether min max for a column is written or not

commit 4ed24f3972cb4c03133912ba8a1c9d35b3122e7a
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-09-15T06:20:03Z

    Modified code to support filter query using min max flag for a column

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/296/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8541/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/471/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/298/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8543/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/473/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/302/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8547/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [WIP] Added code to support storing min max for stri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/477/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/311/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218048262
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
    @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde
        * @return
        */
       BitSetGroup getIndexedData();
    +
    +  /**
    +   * Return the array which contains the flag for each column whether the min max for that column
    +   * is written in metadata or not
    +   *
    +   * @return
    +   */
    +  boolean[] isMinMaxSet();
    --- End diff --
   
    Better change name to minMaxFlagArray`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218059894
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java ---
    @@ -133,6 +134,21 @@
         return null;
       }
     
    +  @Override
    +  public boolean[] isMinMaxSet() {
    +    BlockletIndex blockletIndex =
    +        blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
    --- End diff --
   
    No need to pass this information from driver to executor.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218060271
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java ---
    @@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex minMaxIndex) {
         this.minMaxIndex = minMaxIndex;
       }
     
    +  @Override public void write(DataOutput out) throws IOException {
    --- End diff --
   
    Please don't serialize as it is not required to pass to executor


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218061484
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -108,10 +108,13 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
           BitSetGroup bitSetGroup = new BitSetGroup(dimensionRawColumnChunk.getPagesCount());
           filterValues = dimColumnExecuterInfo.getFilterKeys();
           boolean isDecoded = false;
    +      boolean isMinMaxSetForFilterDimension =
    +          rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
           for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
             if (dimensionRawColumnChunk.getMaxValues() != null) {
    -          if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
    -              dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) {
    +          if (!isMinMaxSetForFilterDimension || isScanRequired(
    --- End diff --
   
    Please use the minaxflags from the current page and do the scan require check


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218062043
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java ---
    @@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
           BitSetGroup bitSetGroup = new BitSetGroup(rawColumnChunk.getPagesCount());
           FilterExecuter filterExecuter = null;
           boolean isExclude = false;
    +      boolean isMinMaxSetForFilterDimension =
    +          rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
           for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
             if (rawColumnChunk.getMaxValues() != null) {
    -          if (isScanRequired(rawColumnChunk.getMaxValues()[i], this.filterRangeValues)) {
    +          if (!isMinMaxSetForFilterDimension || isScanRequired(rawColumnChunk.getMaxValues()[i],
    --- End diff --
   
    Please use the minaxflags from the current page and do the scan require check


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218062180
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
             bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
                 .isFilterValuesPresentInBlockOrBlocklet(
                     dataBlock.getColumnsMaxValue(),
    -                dataBlock.getColumnsMinValue(), blockletId);
    +                dataBlock.getColumnsMinValue(), blockletId, dataBlock.isMinMaxSet());
           } else {
             bitSet = this.filterExecuter
                 .isScanRequired(dataBlock.getColumnsMaxValue(),
    -                dataBlock.getColumnsMinValue());
    +                dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
    --- End diff --
   
    Please use the minaxflags from the current footer and do the scan require check


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/487/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218066737
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
    @@ -250,6 +275,42 @@ public static BlockletIndex getBlockletIndex(EncodedBlocklet encodedBlocklet,
       }
     
       /**
    +   * This method will combine the writeMinMax flag from all the pages. If any page for a given
    +   * dimension has writeMinMax flag set to false then min max for that dimension will nto be
    +   * written in any of the page and metadata
    +   *
    +   * @param blockletMinMaxIndex
    +   * @param encodedBlocklet
    +   */
    +  private static List<Boolean> mergeWriteMinMaxFlagForAllPages(
    +      BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet encodedBlocklet) {
    +    Boolean[] mergedWriteMinMaxFlag =
    +        new Boolean[encodedBlocklet.getNumberOfDimension() + encodedBlocklet.getNumberOfMeasure()];
    +    // set writeMinMax flag to true for all the columns by default and then update if stats object
    +    // has the this flag set to false
    +    Arrays.fill(mergedWriteMinMaxFlag, true);
    +    for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
    +      for (int pageIndex = 0; pageIndex < encodedBlocklet.getNumberOfPages(); pageIndex++) {
    +        EncodedColumnPage encodedColumnPage =
    +            encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
    +                .get(pageIndex);
    +        SimpleStatsResult stats = encodedColumnPage.getStats();
    +        if (!stats.writeMinMax()) {
    +          mergedWriteMinMaxFlag[i] = stats.writeMinMax();
    +          String columnName = encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
    +          LOGGER.info("Min Max writing ignored for column " + columnName + " from page 0 to "
    --- End diff --
   
    Add generic message saying that `"Min Max writing of blocklet ignored for column with name " + columnName + "  `


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8557/



---
123