[Discuss]Set block_size for table on table level


[Discuss]Set block_size for table on table level

Zhangshunyu
Purpose:
Make the block file size configurable at the table level, so that each table can have its own block size.
My solution:
Add a new parameter to the table properties that the user can set in the DDL when creating a table. Add a matching parameter to the thrift format, as for the other properties, and write it into the thrift file so that the setting is not lost when the cluster is restarted.
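To make the proposal concrete, here is a minimal sketch in Python, for illustration only. The property name `table_blocksize`, the MB unit, and both helper functions are assumptions made for this example, and a JSON string stands in for the thrift schema file.

```python
import json

# Hypothetical property name; the thread does not fix the actual name.
BLOCK_SIZE_PROPERTY = "table_blocksize"


def build_create_table_ddl(table, columns, block_size_mb=None):
    """Build a CREATE TABLE statement that carries the block size as a table property."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE {table} ({cols})"
    if block_size_mb is not None:
        # The property is only emitted when the user asks for a custom size.
        ddl += f" TBLPROPERTIES ('{BLOCK_SIZE_PROPERTY}'='{block_size_mb}')"
    return ddl


def persist_table_schema(table, properties):
    """Serialize the table properties so they survive a restart (JSON stands in for thrift)."""
    return json.dumps({"table": table, "properties": properties}, sort_keys=True)


ddl = build_create_table_ddl("sales", [("id", "INT"), ("city", "STRING")], block_size_mb=256)
stored = persist_table_schema("sales", {BLOCK_SIZE_PROPERTY: "256"})
```

The point of the sketch is only that the value travels with the table definition and is written out with the rest of the schema, so it can be read back after a restart.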

What's your opinion?
My English name is Sunday

Re: [Discuss]Set block_size for table on table level

ravipesala
+1
At the same time, the block size should be restricted to a minimum and a maximum,
and validated when the table is created.
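A minimal sketch of such a check, in Python for illustration. The bounds of 1 MB and 2048 MB and the function name are assumptions for this example; the thread does not state actual limits.

```python
# Hypothetical bounds in MB; the thread does not state actual limits.
MIN_BLOCK_SIZE_MB = 1
MAX_BLOCK_SIZE_MB = 2048


def validate_block_size(raw_value):
    """Parse a block size given in the DDL and reject values outside the allowed range."""
    try:
        size_mb = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError(f"block size must be an integer number of MB, got {raw_value!r}")
    if not MIN_BLOCK_SIZE_MB <= size_mb <= MAX_BLOCK_SIZE_MB:
        raise ValueError(
            f"block size {size_mb} MB is outside [{MIN_BLOCK_SIZE_MB}, {MAX_BLOCK_SIZE_MB}] MB"
        )
    return size_mb
```

Running this check at table creation means a bad value fails the DDL immediately instead of surfacing later during data load.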

--
Thanks & Regards,
Ravi

Re: [Discuss]Set block_size for table on table level

Jacky Li
I am OK with this feature. The only thing I am worried about is the compatibility of the CarbonData file reader. Can you make sure it stays compatible when reading old CarbonData files that lack this property?
We have often run into cases where users had to delete the store and reload the data.

Regards,
Jacky


RE: [Discuss]Set block_size for table on table level

Jihong Ma
+1. To avoid potential compatibility issues, we could introduce this parameter as an optional field. As long as it is not a required field, we are fine with a defined default block size.
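A minimal sketch of the optional-with-default behaviour, in Python for illustration. The property name `table_blocksize`, the 1024 MB default, and the sample property dictionaries are assumptions for this example, not values taken from CarbonData.

```python
# Hypothetical default and property name, used only for illustration.
DEFAULT_BLOCK_SIZE_MB = 1024
BLOCK_SIZE_PROPERTY = "table_blocksize"


def resolve_block_size(table_properties):
    """Return the table's block size in MB, falling back to the default when the property is absent."""
    value = table_properties.get(BLOCK_SIZE_PROPERTY)
    return DEFAULT_BLOCK_SIZE_MB if value is None else int(value)


old_table = {"comment": "created before the property existed"}
new_table = {"table_blocksize": "256"}
```

Because the reader never requires the property, a schema written before the change resolves to the default and can still be read without reloading the data.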

Regards.

Jihong


Re: [Discuss]Set block_size for table on table level

Venkata Gollamudi
+1, I agree with the others' comments.


Re: [Discuss]Set block_size for table on table level

Jean-Baptiste Onofré
In reply to this post by Jacky Li
+1

Regards
JB

--
Jean-Baptiste Onofré
[hidden email]
http://blog.nanthrax.net
Talend - http://www.talend.com

RE: [Discuss]Set block_size for table on table level

Liang Chen
Administrator
In reply to this post by Jihong Ma
+1, agree with Jihong's comment: make it optional. The default block size will be used if the user does not explicitly define one.

Regards
Liang


Re: [Discuss]Set block_size for table on table level

金铸
+1, agree with Jihong.


---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s)
is intended only for the use of the intended recipient and may be confidential and/or privileged of
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is
not the intended recipient, unauthorized use, forwarding, printing,  storing, disclosure or copying
is strictly prohibited, and may be unlawful.If you have received this communication in error,please
immediately notify the sender by return e-mail, and delete the original message and all copies from
your system. Thank you.
---------------------------------------------------------------------------------------------------

Re: [Discuss]Set block_size for table on table level

Zhangshunyu
OK, thanks for your kind replies.
My English name is Sunday

Re: [Discuss]Set block_size for table on table level

Eason
In reply to this post by Liang Chen
+1, agree with Jihong.

Re: Re: [Discuss]Set block_size for table on table level

Zhangshunyu
I have verified that it does not affect older tables.
My English name is Sunday

Re: [Discuss]Set block_size for table on table level

Raghunandan subramanya
In reply to this post by Zhangshunyu
What problem do we want to solve by making it configurable?




Re: [Discuss]Set block_size for table on table level

Zhangshunyu
For each table, we can set the block size according to the data size. When a query executes, each task gets one block to process at a time. When the number of blocks is smaller than the parallelism, setting a reasonable block size yields a more suitable number of blocks and makes the best use of the available parallelism.
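A small worked example of this reasoning, in Python for illustration. It assumes one task per block, sizes in whole MB, and hypothetical function names; it ignores blocklets and data locality.

```python
import math


def block_count(data_size_mb, block_size_mb):
    """Number of blocks a table of the given size is split into."""
    return math.ceil(data_size_mb / block_size_mb)


def suggest_block_size(data_size_mb, parallelism, min_block_size_mb=1):
    """Largest whole-MB block size that yields at least `parallelism` blocks, when the data is big enough."""
    return max(min_block_size_mb, data_size_mb // parallelism)


# A 2048 MB table with 1024 MB blocks gives only 2 blocks, so with 8 parallel
# tasks 6 of them would have nothing to do.
blocks_before = block_count(2048, 1024)

# Choosing the block size from the data size and the parallelism fixes that.
suggested = suggest_block_size(2048, 8)
blocks_after = block_count(2048, suggested)
```

With these numbers the suggested block size is 256 MB, which splits the table into 8 blocks, one per task.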
My English name is Sunday

Re: [Discuss]Set block_size for table on table level

Raghunandan subramanya
In reply to this post by Zhangshunyu
Would blocklet division and distribution not help to solve this scenario?


