[jira] [Assigned] (CARBONDATA-2309) Add strategy to generate bigger carbondata files in case of small amount of data

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin reassigned CARBONDATA-2309:
--------------------------------------

Assignee: wangsen (was: xuchuanyin)

> Add strategy to generate bigger carbondata files in case of small amount of data
> --------------------------------------------------------------------------------
>
> Key: CARBONDATA-2309
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Reporter: xuchuanyin
> Assignee: wangsen
> Priority: Major
>
> In some scenarios, the amount of input data for a load is small, but CarbonData still distributes it to every executor (node) for local sort, so each executor generates a small carbondata file.
> In extreme cases, if the cluster is large enough or the amount of data is small enough, a carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve this problem.
> The new strategy should:
> # be able to control the minimum amount of input data for each node
> # ignore data locality; otherwise it may always choose a small subset of particular nodes
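
The two requirements above can be illustrated with a minimal sketch. It is not CarbonData's implementation: the function name `assign_blocks`, its parameters, and the block and node names are all hypothetical. It assumes the loader knows the size of each input block, uses only as many nodes as can each receive roughly the configured minimum, picks those nodes at random rather than by locality, and balances blocks greedily (largest block to the least-loaded node). With uneven block sizes the minimum is a target, not a guarantee.

```python
import heapq
import random


def assign_blocks(block_sizes, nodes, min_bytes_per_node, rng=None):
    """Assign input blocks to nodes, ignoring data locality.

    block_sizes: dict mapping a block name to its size in bytes (hypothetical input).
    nodes: list of node names available for the load.
    min_bytes_per_node: target minimum amount of input data per node.
    rng: optional random.Random, used to pick which nodes take part.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if min_bytes_per_node <= 0:
        raise ValueError("min_bytes_per_node must be positive")
    rng = rng or random.Random()

    total = sum(block_sizes.values())
    # Use only as many nodes as can each receive about the minimum amount,
    # but always at least one and never more than are available.
    node_count = max(1, min(len(nodes), total // min_bytes_per_node))

    # Pick nodes at random instead of by locality, so repeated small loads
    # do not keep landing on the same few nodes.
    chosen = rng.sample(nodes, node_count)

    # Min-heap of (bytes assigned so far, index into chosen).
    heap = [(0, i) for i in range(node_count)]
    assignment = {node: [] for node in chosen}

    # Greedy balancing: hand the largest remaining block to the least-loaded node.
    for block, size in sorted(block_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        assignment[chosen[i]].append(block)
        heapq.heappush(heap, (load + size, i))
    return assignment


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = {f"block-{i}": 64 * mb for i in range(4)}   # 256 MB of input
    cluster = [f"node-{i}" for i in range(10)]            # 10 available nodes
    plan = assign_blocks(blocks, cluster, 128 * mb, random.Random(0))
    for node, assigned in plan.items():
        print(node, assigned)
```

In this example only two of the ten nodes take part, each receiving two 64 MB blocks, instead of the input being spread thinly across the whole cluster.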

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)