[GitHub] carbondata pull request #2725: [WIP] Added code to support storing min max f...

classic Classic list List threaded Threaded
50 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218138883
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
    @@ -250,6 +275,42 @@ public static BlockletIndex getBlockletIndex(EncodedBlocklet encodedBlocklet,
       }
     
       /**
    +   * This method will combine the writeMinMax flag from all the pages. If any page for a given
    +   * dimension has writeMinMax flag set to false then min max for that dimension will nto be
    +   * written in any of the page and metadata
    +   *
    +   * @param blockletMinMaxIndex
    +   * @param encodedBlocklet
    +   */
    +  private static List<Boolean> mergeWriteMinMaxFlagForAllPages(
    +      BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet encodedBlocklet) {
    +    Boolean[] mergedWriteMinMaxFlag =
    +        new Boolean[encodedBlocklet.getNumberOfDimension() + encodedBlocklet.getNumberOfMeasure()];
    +    // set writeMinMax flag to true for all the columns by default and then update if stats object
    +    // has the this flag set to false
    +    Arrays.fill(mergedWriteMinMaxFlag, true);
    +    for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
    +      for (int pageIndex = 0; pageIndex < encodedBlocklet.getNumberOfPages(); pageIndex++) {
    +        EncodedColumnPage encodedColumnPage =
    +            encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
    +                .get(pageIndex);
    +        SimpleStatsResult stats = encodedColumnPage.getStats();
    +        if (!stats.writeMinMax()) {
    +          mergedWriteMinMaxFlag[i] = stats.writeMinMax();
    +          String columnName = encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
    +          LOGGER.info("Min Max writing ignored for column " + columnName + " from page 0 to "
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139378
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
             bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
                 .isFilterValuesPresentInBlockOrBlocklet(
                     dataBlock.getColumnsMaxValue(),
    -                dataBlock.getColumnsMinValue(), blockletId);
    +                dataBlock.getColumnsMinValue(), blockletId, dataBlock.isMinMaxSet());
           } else {
             bitSet = this.filterExecuter
                 .isScanRequired(dataBlock.getColumnsMaxValue(),
    -                dataBlock.getColumnsMinValue());
    +                dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
    --- End diff --
   
    For blocklet min max comparison the footer is getting read in executor and the same min max flag information is getting used. The information serialized from driver is not getting used for blocklet min max comparison in executor


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139409
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java ---
    @@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
           BitSetGroup bitSetGroup = new BitSetGroup(rawColumnChunk.getPagesCount());
           FilterExecuter filterExecuter = null;
           boolean isExclude = false;
    +      boolean isMinMaxSetForFilterDimension =
    +          rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
           for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
             if (rawColumnChunk.getMaxValues() != null) {
    -          if (isScanRequired(rawColumnChunk.getMaxValues()[i], this.filterRangeValues)) {
    +          if (!isMinMaxSetForFilterDimension || isScanRequired(rawColumnChunk.getMaxValues()[i],
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139444
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
    @@ -108,10 +108,13 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
           BitSetGroup bitSetGroup = new BitSetGroup(dimensionRawColumnChunk.getPagesCount());
           filterValues = dimColumnExecuterInfo.getFilterKeys();
           boolean isDecoded = false;
    +      boolean isMinMaxSetForFilterDimension =
    +          rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
           for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
             if (dimensionRawColumnChunk.getMaxValues() != null) {
    -          if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
    -              dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) {
    +          if (!isMinMaxSetForFilterDimension || isScanRequired(
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139467
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java ---
    @@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex minMaxIndex) {
         this.minMaxIndex = minMaxIndex;
       }
     
    +  @Override public void write(DataOutput out) throws IOException {
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139494
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java ---
    @@ -133,6 +134,21 @@
         return null;
       }
     
    +  @Override
    +  public boolean[] isMinMaxSet() {
    +    BlockletIndex blockletIndex =
    +        blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218139529
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
    @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde
        * @return
        */
       BitSetGroup getIndexedData();
    +
    +  /**
    +   * Return the array which contains the flag for each column whether the min max for that column
    +   * is written in metadata or not
    +   *
    +   * @return
    +   */
    +  boolean[] isMinMaxSet();
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    @ravipesala ..Fixed review comments. Kindly review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/317/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/320/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8563/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/493/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/496/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8566/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218274542
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java ---
    @@ -40,12 +40,14 @@
     
       int BLOCK_LENGTH = 8;
     
    +  int BLOCK_MIN_MAX_FLAG = 9;
    +
       // below variables are specific for blockletDataMap
    -  int BLOCKLET_INFO_INDEX = 9;
    +  int BLOCKLET_INFO_INDEX = 10;
    --- End diff --
   
    will this modification affect legacy store?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218273696
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1831,6 +1831,16 @@
     
       public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;
     
    +  /**
    +   * property to be used for specifying the max character limit for string/varchar data type till
    --- End diff --
   
    ’specifying the max character limit‘ ->'specifying the max byte limit'


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218273764
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
    @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde
        * @return
        */
       BitSetGroup getIndexedData();
    +
    +  /**
    +   * Return the array which contains the flag for each column whether the min max for that column
    +   * is written in metadata or not
    +   *
    +   * @return
    --- End diff --
   
    unnecessary ’return‘


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2725
 
    @manishgupta88 Hi, actually I have another idea on the implementation for this requirement. I'd like to use an unique fake value to store the minmax for those columns.
    While judging the filter value with the fake minmax, we know that is a fake minmax, so the filter procedure just returns true.
    In this way, we do not need to modify the metadata and can reduce many code changes.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218298749
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java ---
    @@ -40,12 +40,14 @@
     
       int BLOCK_LENGTH = 8;
     
    +  int BLOCK_MIN_MAX_FLAG = 9;
    +
       // below variables are specific for blockletDataMap
    -  int BLOCKLET_INFO_INDEX = 9;
    +  int BLOCKLET_INFO_INDEX = 10;
    --- End diff --
   
    No it will not have any impact as the change this code is in query part


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2725#discussion_r218298762
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1831,6 +1831,16 @@
     
       public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;
     
    +  /**
    +   * property to be used for specifying the max character limit for string/varchar data type till
    --- End diff --
   
    ok


---
123