[ https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xuchuanyin reassigned CARBONDATA-2309:
--------------------------------------
Assignee: wangsen (was: xuchuanyin)
> Add strategy to generate bigger carbondata files in case of small amount of data
> --------------------------------------------------------------------------------
>
> Key: CARBONDATA-2309
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Reporter: xuchuanyin
> Assignee: wangsen
> Priority: Major
>
> In some scenarios the amount of input data to load is small, but carbondata still distributes it to every executor (node) for local sort, so each executor generates a small carbondata file.
> In extreme cases, if the cluster is big enough or the amount of data is small enough, each carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve the above problem.
> The new strategy should:
> # be able to control the minimum amount of input data for each node
> # ignore data locality, otherwise it may always choose the same small subset of nodes
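
The two requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not CarbonData's actual implementation: the function name `assign_blocks`, the parameter `min_bytes_per_node`, and the node names are all hypothetical. Nodes are visited in random order (locality is ignored), and each node is filled up to the minimum before the next one is used.

```python
import random


def assign_blocks(block_sizes, nodes, min_bytes_per_node, rng=None):
    """Assign input blocks to nodes so that each used node gets at least
    min_bytes_per_node (when the total input allows it).

    Hypothetical sketch: locality is ignored and nodes are visited in random
    order, so repeated small loads do not always land on the same nodes.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    rng = rng or random.Random()
    order = list(nodes)
    rng.shuffle(order)  # ignore locality: random node order

    assignment = {}  # node -> list of block indices
    loads = {}       # node -> total assigned size
    idx = 0
    for block_id, size in enumerate(block_sizes):
        node = order[idx]
        assignment.setdefault(node, []).append(block_id)
        loads[node] = loads.get(node, 0) + size
        # Move to the next node once this one has reached the minimum.
        # If no node is left, the last node simply takes the remainder.
        if loads[node] >= min_bytes_per_node and idx < len(order) - 1:
            idx += 1

    # If the last used node ended up below the minimum, merge its blocks
    # into the previous node instead of producing one more small file.
    used = [n for n in order if n in assignment]
    if len(used) > 1 and loads[used[-1]] < min_bytes_per_node:
        tail, prev = used[-1], used[-2]
        assignment[prev].extend(assignment.pop(tail))
        loads[prev] += loads.pop(tail)
    return assignment, loads


if __name__ == "__main__":
    # Ten 64 MB blocks, five nodes, minimum 256 MB per node.
    result, sizes = assign_blocks([64] * 10, ["n1", "n2", "n3", "n4", "n5"], 256)
    print(result, sizes)
```

With a large input this sketch piles the remainder onto the last node, so a real strategy would presumably fall back to the normal balanced distribution once every node can receive the minimum.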
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)