Re: [Discussion] Support pre-aggregate table to improve OLAP performance

Posted by bill.zhou on Nov 16, 2017; 2:59pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Support-pre-aggregate-table-to-improve-OLAP-performance-tp24040p27030.html

Hi Ravindra,

The design says that dropping a segment is not supported once a pre-agg table
has been created. If so, how do we support data retention on the table?
Regards


ravipesala wrote

> Hi Bill,
>
> Please find my comments.
>
> 1. We are not supporting join queries in this design, so there will always be
> one parent table for an aggregate table. We may consider join queries
> for creating aggregation tables in future.
>
> 2. Aggregation column name will be created internally and it would be like
> agg_parentcolumnname.
>
> 3. Yes, if we create an aggregation table on a dictionary column of the parent
> table, it uses the same parent dictionary. The aggregation table does not
> generate any dictionary files.
>
> 4. timeseries.eventtime is the time column of the main table; there should
> be at least one timestamp column on the main table to create
> timeseries tables. In the design, the granularity is replaced with a hierarchy,
> which means the user can give a time hierarchy like minute, hour, day, so
> three aggregation tables (minute, hour and day) will be created
> automatically and data will be loaded into them on every load.
>
> 5. This has changed in the new design, v1.1; please check it.
>
> 6. As I mentioned above, in the new v1.1 design it was changed to a hierarchy, so
> the user can define their own time hierarchy.
>
> 7. OK, we will discuss and check whether we can expose this SORT_COLUMNS
> configuration on the aggregation table. Even if we don't support it now, we can
> expose it in future.
>
> 8. Yes, merge index is applicable to aggregation tables as well.
>
> Regards,
> Ravindra.
>
> On 3 November 2017 at 09:05, bill.zhou <zgcsky08@> wrote:
>
>> Hi Jacky & Ravindra, I have a few more questions about this design. Thank
>> you very much if you can clarify them.
>>
>>
>> 1. If we support creating aggregation tables from a join of two or more tables,
>> how do we set aggregate.parent? Can it be like
>> 'aggregate.parent'='fact1,dim1,dim1'?
>> 2. What are the agg table column names? For the following create command, will
>> they be: user_id, name, c2, price?
>> CREATE TABLE agg_sales
>> STORED BY 'carbondata'
>> TBLPROPERTIES ('aggregate.parent'='sales')
>> AS SELECT user_id,user_name as name, sum(quantity) as c2, avg(price) FROM
>> sales GROUP BY user_id.
>> 3. If we create a dictionary column in the agg table, will its dictionary
>> file be the same one the main table uses?
>>
>> 4. For rollup table main table creation: what do timeseries.eventtime and
>> granularity mean? Which columns can be used for them?
>> 5. For rollup table main table creation: what does
>> 'timeseries.aggtype'='quantity:sum, max' mean? Does it mean the column quantity
>> supports only sum and max?
>>
>> 6. In both the above cases carbon generates the 4 pre-aggregation tables
>> automatically for year, month, day and hour. (their table name will be
>> prefixed with agg_sales). -- In the above case I only see the hour column; how
>> are the year, month and day tables generated?
>>
>> 7. In internal implementation, carbon will create these table with
>> SORT_COLUMNS='group by column defined above', so that filter group by query
>> on main table will be faster because it can leverage the index in
>> pre-aggregate tables. -- I suggest letting the user control the sort column
>> order.
>> 8. Is merge index supported for agg tables? -- It would be useful.
>>
>>
>> Jacky Li wrote
>> > Hi community,
>> >
>> > In traditional data warehouse, pre-aggregate table or cube is a common
>> > technology to improve OLAP query performance. To take carbondata support
>> > for OLAP to next level, I’d like to propose pre-aggregate table support in
>> > carbondata.
>> >
>> > Please refer to CARBONDATA-1516
>> > <https://issues.apache.org/jira/browse/CARBONDATA-1516> and the
>> > design document attached in the JIRA ticket.
>> >
>> > This design is still in its initial phase; the proposed usage and SQL syntax
>> > are subject to change. Please provide your comments to improve this feature.
>> > Any suggestion on the design from the community is welcome.
>> >
>> > Regards,
>> > Jacky Li
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>
>
>
>
> --
> Thanks & Regards,
> Ravi
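
For anyone following point 4 above, here is a rough sketch of what a minute/hour/day time hierarchy means in terms of data: each load of the main table is rolled up once per level, giving one aggregate table per level. This is plain Python, not CarbonData code. The names `LEVELS` and `rollup`, the sample rows, and the choice of sum as the aggregate are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical truncation rule for each level of the time hierarchy.
LEVELS = {
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
}


def rollup(rows, levels=("minute", "hour", "day")):
    """Return one {bucket_start: sum(quantity)} mapping per hierarchy level."""
    tables = {level: defaultdict(int) for level in levels}
    for event_time, quantity in rows:
        for level in levels:
            # Truncate the event time to the level's bucket and accumulate.
            tables[level][LEVELS[level](event_time)] += quantity
    return {level: dict(table) for level, table in tables.items()}


# One illustrative "load" of the main table: (event time, quantity) pairs.
load = [
    (datetime(2017, 11, 3, 9, 5, 10), 2),
    (datetime(2017, 11, 3, 9, 5, 40), 3),
    (datetime(2017, 11, 3, 9, 30, 0), 5),
    (datetime(2017, 11, 3, 14, 0, 0), 7),
]
tables = rollup(load)
```

Here `tables["minute"]`, `tables["hour"]` and `tables["day"]` stand in for the three aggregation tables that would be refreshed on every load; how the real tables are named and stored is defined by the design document, not by this sketch.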




