Apache CarbonData Dev Mailing List archive

[DISCUSSION] Initiating Apache CarbonData-1.1.0 incubating Release

ravipesala

Hi All,

As planned, we are going to release Apache CarbonData-1.1.0. Please discuss
and vote to initiate the 1.1.0 release; I will start preparing the
release after 3 days of discussion. It will have the following features.

1. Introduced a new data format called V3 (version 3).

Improves sequential IO by keeping larger blocklets, so more data is read
into memory at once.
Introduced pages of size 32000 each for every column inside a
blocklet. Min/max is maintained for each page to improve filter
queries.
Improved compression/decompression of row pages.
Overall performance is improved by 50% compared to the old format as per
TPC-H benchmark results.

2. Alter table support in CarbonData. (Only for Spark 2.1)

Support renaming an existing table.
Support adding a new column.
Support removing an existing column.
Support upcasting of a datatype (e.g. from smallint to int).

3. Supported Batch Sort to improve data loading performance.

It makes the sort step non-blocking: each batch is sorted entirely in
memory and then converted to a carbondata file.

4. Improved single-pass load by upgrading to the latest Netty framework and
launching a dictionary client for each load.

5. Supported range filters, which combine the between filters into one
filter to improve filter performance.

6. Apart from these features, many bug fixes and improvements are included
in this release.
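
To make items 1 and 5 more concrete, here is a minimal sketch of how per-page min/max statistics and a combined range filter can work together. It is an illustration only, not CarbonData's actual implementation: the names Page, build_pages, merge_range and scan are hypothetical, and it assumes the page size of 32000 counts rows.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 32000  # assumption: the "size of 32000" above is a row count


@dataclass
class Page:
    """One column page plus the min/max statistics kept for it (hypothetical type)."""
    values: List[int]
    min_value: int
    max_value: int


def build_pages(column: List[int], page_size: int = PAGE_SIZE) -> List[Page]:
    """Split a column into fixed-size pages and record min/max for each page."""
    pages = []
    for start in range(0, len(column), page_size):
        chunk = column[start:start + page_size]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


def merge_range(lower: int, upper: int) -> Optional[Tuple[int, int]]:
    """Combine `col >= lower AND col <= upper` into one closed range.

    Returns None when the two bounds cannot both be satisfied.
    """
    return (lower, upper) if lower <= upper else None


def scan(pages: List[Page], value_range: Optional[Tuple[int, int]]):
    """Return matching values and the number of pages that had to be read."""
    if value_range is None:
        return [], 0
    lo, hi = value_range
    matches: List[int] = []
    pages_read = 0
    for page in pages:
        # Skip the page when its min/max shows it cannot contain a match.
        if page.max_value < lo or page.min_value > hi:
            continue
        pages_read += 1
        matches.extend(v for v in page.values if lo <= v <= hi)
    return matches, pages_read
```

For a column holding 0..99999, build_pages yields four pages, and scan(pages, merge_range(40000, 40010)) reads only the one page whose min/max overlaps the range; the other three are skipped without touching their values.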

--
Thanks & Regards,
Ravindra

Liang Chen

Re: [DISCUSSION] Initiating Apache CarbonData-1.1.0 incubating Release

Hi

+1 for starting to prepare the new release 1.1.
Great progress; the new file format V3 would significantly improve performance.

Regards
Liang

2017-03-26 10:46 GMT+05:30 Ravindra Pesala <[hidden email]>:

--
Regards
Liang