Re: carbondata partitioned by date generate many small files
Posted by
Jacky Li on
Jun 05, 2018; 12:43pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/carbondata-partitioned-by-date-generate-many-small-files-tp51475p51511.html
Hi,
There is a testcase in
StandardPartitionTableQueryTestCase used date column as partition column, if you run that testcase, the partition folder generated looks like following picture.
Are you getting similar folders?
Regards,
Jacky
hi carbondata team,
i am using carbondata 1.3.1 to create table and import data, generated many small files and spark job is very slow, i suspected the number of file is related to the number of spark job . but if i decrease the jobs, job will fail because of outofmemory. see my ddl as below:
create table xx.xx(
dept_name string,
xx
.
.
.
) PARTITIONED BY (xxx date)
STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx ,xxx,xxx')
please give some advice.
thanks
ChenXingYu