Purpose:
To make the block file size configurable at the table level, so that each table can have its own block size.
My solution:
Add a new parameter to the table properties so that users can set it in the DDL when creating a table. Add a corresponding field to the Thrift format, just like the other properties, and write this value into the Thrift file so that it is not lost when the cluster is restarted.
What's your opinion?
My English name is Sunday
|
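The proposal above (read a block size from the DDL's table properties and keep it in the table metadata) can be sketched roughly as follows. This is a minimal sketch, not CarbonData code: the property key `table_blocksize`, the `TableInfo` record, the helper `table_info_from_ddl`, and the choice of megabytes as the unit are all hypothetical. `TableInfo` only stands in for whatever structure would be written to the Thrift file.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical property key; the thread does not fix the exact name.
BLOCK_SIZE_PROPERTY = "table_blocksize"


@dataclass
class TableInfo:
    """Hypothetical stand-in for the table metadata persisted in the Thrift file."""
    table_name: str
    # Optional, so metadata written before this feature simply has no value here.
    block_size_mb: Optional[int] = None


def table_info_from_ddl(table_name: str, tbl_properties: dict) -> TableInfo:
    """Build table metadata from the properties given in CREATE TABLE.

    Property keys are matched case-insensitively, and the value is read
    as a whole number of megabytes (an assumed unit).
    """
    props = {key.lower(): value for key, value in tbl_properties.items()}
    raw = props.get(BLOCK_SIZE_PROPERTY)
    block_size = int(raw.strip()) if raw is not None else None
    return TableInfo(table_name=table_name, block_size_mb=block_size)
```

For example, `table_info_from_ddl("sales", {"TABLE_BLOCKSIZE": "512"})` records 512 MB for that table. A table created without the property leaves the field unset, so a default can still apply later.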
+1
At the same time, the maximum and minimum block sizes should be restricted and validated when the table is created.
-- Thanks & Regards, Ravi |
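Ravi's point about restricting and validating the block size at table-creation time could look something like the check below. It is a minimal sketch under assumed values: the bounds `MIN_BLOCK_SIZE_MB = 1` and `MAX_BLOCK_SIZE_MB = 2048` and the function name `validate_block_size` are hypothetical, because the thread asks for limits but does not state them.

```python
# Hypothetical bounds in megabytes; the thread asks for limits but gives no numbers.
MIN_BLOCK_SIZE_MB = 1
MAX_BLOCK_SIZE_MB = 2048


def validate_block_size(raw_value: str) -> int:
    """Parse a block size from the DDL and reject values outside the allowed range.

    Raises ValueError so that CREATE TABLE fails instead of storing a bad value.
    """
    try:
        size_mb = int(raw_value.strip())
    except ValueError:
        raise ValueError(
            f"block size must be an integer number of MB, got {raw_value!r}"
        ) from None
    if not MIN_BLOCK_SIZE_MB <= size_mb <= MAX_BLOCK_SIZE_MB:
        raise ValueError(
            f"block size {size_mb} MB is outside the allowed range "
            f"[{MIN_BLOCK_SIZE_MB}, {MAX_BLOCK_SIZE_MB}] MB"
        )
    return size_mb
```

Failing at CREATE TABLE time surfaces a bad value immediately, rather than during a later data load.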
I am OK with this feature; my only concern is the compatibility of the CarbonData file reader. Can you make sure it stays compatible when reading old CarbonData files that do not have this property?
We have run into many cases where users had to delete the store and reload the data.
Regards, Jacky |
+1. To avoid potential compatibility issues, we could introduce this parameter as an optional field. As long as it is not a required field, we are fine with a defined default block size.
Regards. Jihong |
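Jihong's suggestion, an optional field plus a defined default, is what keeps old tables readable. In Thrift this would mean declaring the field `optional` in the IDL, so that metadata written by older versions simply lacks it. The reader side can then fall back as sketched below. This is a minimal sketch: the key `block_size_mb`, the function `effective_block_size_mb`, and the default of 1024 MB are assumptions, since the thread agrees on "a defined default" without naming a value.

```python
from typing import Any, Mapping

# Hypothetical default; the thread does not give a number.
DEFAULT_BLOCK_SIZE_MB = 1024


def effective_block_size_mb(table_meta: Mapping[str, Any]) -> int:
    """Return the table's block size, falling back to the default.

    Metadata written before the feature existed has no block size entry,
    so old tables keep working without deleting and reloading the store.
    """
    value = table_meta.get("block_size_mb")
    return DEFAULT_BLOCK_SIZE_MB if value is None else int(value)
```

Here an absent key and an explicit `None` are treated the same way. That mirrors an unset optional Thrift field.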
+1, I agree with the others' comments.
|
In reply to this post by Jacky Li
+1
Regards, JB
-- Jean-Baptiste Onofré [hidden email] http://blog.nanthrax.net Talend - http://www.talend.com |
In reply to this post by Jihong Ma
+1, I agree with Jihong's comment: make it optional, so the default block size is used when the user does not explicitly define one.
Regards Liang
|
+1, agree with Jihong.
|
OK, thanks for your kind replies.
|
In reply to this post by Liang Chen
+1, agree with Jihong.
|
I have verified that it does not affect older tables.
|
In reply to this post by Zhangshunyu
What problem do we want to solve by making it configurable?
|
For each table, we can set the block size based on the data size. When a query is executed, each task processes one block at a time, so when the number of blocks is smaller than the available parallelism, some of that parallelism goes unused. Setting a reasonable block size yields a more suitable number of blocks and makes the best use of the available parallelism.
|
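The reasoning about block count versus parallelism is simple arithmetic. The number of blocks is `ceil(data_size / block_size)`, and with one block per task, a block size of roughly `data_size / parallelism` gives every task a block. This is a minimal sketch. The helper names, the MB units, and the clamp bounds of 1 MB and 2048 MB are hypothetical, and the 1024 MB block size in the example below is only an illustrative value.

```python
import math


def num_blocks(data_size_mb: int, block_size_mb: int) -> int:
    """Number of blocks produced when data_size_mb is split into block_size_mb pieces."""
    return math.ceil(data_size_mb / block_size_mb)


def block_size_for_parallelism(data_size_mb: int, parallelism: int,
                               min_mb: int = 1, max_mb: int = 2048) -> int:
    """Pick a block size (MB) that yields at least `parallelism` blocks when possible.

    Flooring data_size / parallelism keeps data_size / size >= parallelism, so the
    block count is at least `parallelism`; the result is clamped to [min_mb, max_mb].
    """
    size = data_size_mb // parallelism
    return max(min_mb, min(max_mb, size))
```

For example, 10 GB (10240 MB) of data stored in 1024 MB blocks gives `num_blocks(10240, 1024) == 10`. With 40 parallel tasks available, only 10 would be busy. `block_size_for_parallelism(10240, 40)` returns 256, which produces 40 blocks.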
In reply to this post by Zhangshunyu
Wouldn't blocklet division and distribution help solve this scenario?
|