[ https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-2309.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.4.1, 1.5.0
> Add strategy to generate bigger carbondata files in case of small amount of data
> --------------------------------------------------------------------------------
>
> Key: CARBONDATA-2309
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Reporter: xuchuanyin
> Assignee: wangsen
> Priority: Major
> Fix For: 1.5.0, 1.4.1
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> In some scenarios the amount of input data to load is small, but carbondata still distributes it across every executor (node) for local sort, so each executor generates small carbondata files.
> In extreme cases, when the cluster is large enough or the amount of data is small enough, a carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve the above problem.
> The new strategy should:
> # be able to control the minimum amount of input data for each node
> # ignore data locality, since otherwise it may always choose the same small subset of nodes
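
The two requirements above can be illustrated with a minimal sketch. This is not the CarbonData implementation; the function name assign_blocks, its parameters, and the greedy balancing step are all assumptions made for illustration. It caps the number of nodes used so that the average load per node is at least a configured minimum, and it picks those nodes at random rather than by data locality.

```python
import random


def assign_blocks(block_sizes, nodes, min_size_per_node, rng=None):
    """Hypothetical sketch: map block indices to nodes, ignoring locality.

    block_sizes: list of input block sizes (any consistent unit, e.g. MB).
    nodes: list of node identifiers available for loading.
    min_size_per_node: desired minimum amount of input per node (same unit).
    rng: optional random.Random, used to pick nodes independent of locality.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if min_size_per_node <= 0:
        raise ValueError("min_size_per_node must be positive")
    rng = rng or random.Random()

    total = sum(block_sizes)
    # Use only as many nodes as the input can feed at the minimum size,
    # but always at least one and never more than are available.
    active = max(1, min(len(nodes), int(total // min_size_per_node)))

    # Random choice instead of locality, so that small loads do not keep
    # landing on the same few nodes that happen to hold the data.
    chosen = rng.sample(nodes, active)

    assignment = {node: [] for node in chosen}
    load = {node: 0 for node in chosen}

    # Greedy balancing: largest block first, onto the least loaded node.
    # This keeps the average at or above the minimum; it does not guarantee
    # the minimum for every single node when block sizes are very uneven.
    for idx in sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i]):
        target = min(chosen, key=lambda n: load[n])
        assignment[target].append(idx)
        load[target] += block_sizes[idx]

    return assignment
```

For example, four 64 MB blocks on a ten-node cluster with a 128 MB minimum would be sent to two nodes instead of ten, so each node writes one larger file instead of many single-blocklet files.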
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)