[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/1624

    [WIP][CARBONDATA-1867] Add support for task/segment level pruning

    Added support for task/segment level pruning. Added code to compute task level min/max which can be helpful for task/segment level pruning
   
    Note: Dependent on PR #1619 ..Need to be rebased once PR 1619 is merged.
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata add_task_level_min_max

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1624
   
----
commit 5ada233b0e96baf9864407d6c9364495315ca6bc
Author: manishgupta88 <[hidden email]>
Date:   2017-12-06T11:41:51Z

    Added support for task/segment level pruning. Added code to compute task level min/max which can be helpful for task/segment level pruning

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [WIP][CARBONDATA-1867] Add support for task/segment ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/499/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [WIP][CARBONDATA-1867] Add support for task/segment ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1762/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1624#discussion_r155232361
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -165,4 +165,22 @@ public DataMapFactory getDataMapFactory() {
       @Override public void onEvent(Event event, OperationContext opContext) {
         dataMapFactory.fireEvent(event);
       }
    +
    +  /**
    +   * Method to check whether the current segment need to be scanned or not for the queried data
    +   *
    +   * @param segmentId
    +   * @param filterExp
    +   * @return
    +   * @throws IOException
    +   */
    +  public boolean isScanRequired(String segmentId, FilterResolverIntf filterExp) throws IOException {
    --- End diff --
   
    It should not be `isScanRequired`, it should be pruneSegments that should take segmentslist and filterexp, and returns the pruned list of segments


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1624#discussion_r155232944
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/UnsafeMemoryDMStore.java ---
    @@ -101,6 +103,25 @@ public void addIndexRowToUnsafe(DataMapRow indexRow) throws MemoryException {
         pointers[rowCount++] = pointer;
       }
     
    +  /**
    +   * Add the task min/max row to unsafe.
    +   *
    +   * @param taskMinMaxRow
    +   * @return
    +   */
    +  public void addTaskMinMaxRowToUnsafe(DataMapRow taskMinMaxRow, List<Integer> indexesToAccess)
    --- End diff --
   
    why it is required a new method? why don't use `addToUnsafe`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1624#discussion_r155234625
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMap.java ---
    @@ -269,6 +302,47 @@ private DataMapRow addMinMax(int[] minMaxLen, CarbonRowSchema carbonRowSchema,
         return minRow;
       }
     
    +  /**
    +   * This method will compute min/max values at task level
    +   *
    +   * @param taskMinMaxRow
    +   * @param minMaxLen
    +   * @param carbonRowSchema
    +   * @param minMaxValue
    +   * @param ordinal
    +   * @param isMinValueComparison
    +   */
    +  private void addTaskMinMaxValues(DataMapRow taskMinMaxRow, int[] minMaxLen,
    +      CarbonRowSchema carbonRowSchema, byte[][] minMaxValue, int ordinal,
    +      boolean isMinValueComparison) {
    +    DataMapRow row = taskMinMaxRow.getRow(ordinal);
    +    byte[][] updatedMinMaxValues = minMaxValue;
    +    if (null == row) {
    +      CarbonRowSchema[] minSchemas =
    +          ((CarbonRowSchema.StructCarbonRowSchema) carbonRowSchema).getChildSchemas();
    +      row = new DataMapRowImpl(minSchemas);
    +    } else {
    +      byte[][] existingMinMaxValues = getMinMaxValue(taskMinMaxRow, ordinal);
    +      // Compare and update min max values
    +      SerializableComparator comparator =
    --- End diff --
   
    I think you can use `UnsafeComparer.compare`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1624#discussion_r155236681
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMap.java ---
    @@ -178,6 +191,26 @@ private void loadToUnsafe(DataFileFooter fileFooter, SegmentProperties segmentPr
             throw new RuntimeException(e);
           }
         }
    +    // write the task level min/max row to unsafe memory store
    +    if (null != taskMinMaxRow) {
    +      addRowToUnsafeMemoryStore(taskMinMaxRow);
    +    }
    +  }
    +
    +  private void addRowToUnsafeMemoryStore(DataMapRow row) {
    --- End diff --
   
    I think better set some junk or default values exept for the minmax in case of last row. so that it will be easy to read and write


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [WIP][CARBONDATA-1867] Add support for task/segment ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2145/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/523/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    @ravipesala ..handled review comments..kindly review and merge


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/524/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1780/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2155/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2156/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/549/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1807/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2179/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1624
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1624: [CARBONDATA-1867] Add support for task/segmen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1624


---