Apache CarbonData Dev Mailing List archive

Re: About bucket feature in carbon

Posted by Jacky Li-2 on Feb 09, 2018; 8:14am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/About-bucket-feature-in-carbon-tp39109p39222.html

Hi Ravindra,

You mean we can do one round of refactoring for the bucketed table feature in CarbonData 1.4?
I am fine with it.

Regards,
Jacky

> On Feb 9, 2018, at 3:49 PM, Ravindra Pesala <[hidden email]> wrote:
>
> Hi Likun,
>
> I feel it is better to change the implementation to use Spark's bucketing
> generation, just like how standard Hive partitions are generated. It will
> be easy to change after the partition feature is implemented. It is a
> useful feature for joining big tables, and hash-based buckets and
> CLUSTERED BY make queries faster. So it is better to change the
> implementation instead of removing it.
>
> Regards,
> Ravindra.
>
> On 9 February 2018 at 13:14, Jacky Li <[hidden email]> wrote:
>
>> Hi,
>>
>> One year ago, CarbonData 1.0.0 introduced the bucket table feature. It was
>> expected to improve join performance by avoiding shuffling when both tables
>> are bucketed on the same column with the same number of buckets.
>>
>> However, since this feature was introduced, personally speaking, it has not
>> been widely used in the community, and it creates maintenance overhead for
>> the developers in the community (for every new Pull Request, all
>> bucket-related test cases need to be fixed).
>>
>> And now that carbon has integrated with Spark's standard partition,
>> developers can add bucket support using Spark's bucketed table feature in
>> the future if it is required.
>>
>> So, I propose to remove the bucket feature after the CarbonData 1.3.0 release.
>> What do you think?
>>
>> Regards,
>> Jacky
>>
>>
>
>
> --
> Thanks & Regards,
> Ravi