[ https://issues.apache.org/jira/browse/CARBONDATA-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yadong Qi resolved CARBONDATA-844.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
> Avoid to get useless splits
> ---------------------------
>
> Key: CARBONDATA-844
> URL: https://issues.apache.org/jira/browse/CARBONDATA-844
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.1.0
> Reporter: Yadong Qi
> Assignee: Yadong Qi
> Fix For: 1.1.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current implementation of CarbonInputFormat.getDataBlocksOfSegment works as follows:
> 1. Get all of the carbondata splits in the segment directory.
> 2. Read the carbonindex and construct the B-tree.
> 3. Apply the filter and get the matching splits.
> I think we get some useless splits this way, and the getSplits operation is expensive. So we had better do getSplits after the filter:
> 1. List the segment directory and keep only the carbonindex paths.
> 2. Read the carbonindex and construct the B-tree.
> 3. Apply filter and get matching blocks.
> 4. Get carbondata splits from filtered blocks.
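
The proposed ordering can be sketched as a small model. This is an illustration only, written in Python, and is not CarbonData's code or API: BlockEntry, the helper functions, the file names, and the min/max range check are all hypothetical stand-ins, and a flat list stands in for the B-tree. It only shows that the expensive split creation runs once per matching block instead of once per carbondata file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    """Hypothetical index entry: a data file plus min/max of one filter column."""
    data_file: str
    min_value: int
    max_value: int


def list_index_files(segment_files):
    """Step 1: keep only the carbonindex paths from a segment listing."""
    return [f for f in segment_files if f.endswith(".carbonindex")]


def read_index(index_files, index_contents):
    """Step 2: load block entries from each index file (a list stands in for the B-tree)."""
    entries = []
    for path in index_files:
        entries.extend(index_contents[path])
    return entries


def apply_filter(entries, lo, hi):
    """Step 3: keep blocks whose [min, max] range overlaps the filter range [lo, hi]."""
    return [e for e in entries if e.max_value >= lo and e.min_value <= hi]


def get_splits(blocks, make_split):
    """Step 4: create splits only for the blocks that survived the filter."""
    return [make_split(b.data_file) for b in blocks]


# Made-up segment: three data files described by one index file.
segment_files = [
    "part-0.carbondata",
    "part-1.carbondata",
    "part-2.carbondata",
    "0.carbonindex",
]
index_contents = {
    "0.carbonindex": [
        BlockEntry("part-0.carbondata", 0, 9),
        BlockEntry("part-1.carbondata", 10, 19),
        BlockEntry("part-2.carbondata", 20, 29),
    ]
}

split_calls = []  # records every (expensive) split creation


def make_split(path):
    """Stand-in for the costly split construction."""
    split_calls.append(path)
    return ("split", path)


entries = read_index(list_index_files(segment_files), index_contents)
blocks = apply_filter(entries, 12, 15)
splits = get_splits(blocks, make_split)
print(splits, len(split_calls))
```

In this model only the block whose range overlaps the filter reaches make_split, whereas the original ordering would have built a split for every carbondata file in the segment before filtering.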
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)