Apache CarbonData Dev Mailing List archive - Re: [Discussion] Support pre-aggregate table to improve OLAP performance

Posted by ravipesala on Nov 06, 2017; 12:54pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Support-pre-aggregate-table-to-improve-OLAP-performance-tp24040p25658.html

Hi Bill,

Please find my comments.

1. We are not supporting join queries in this design, so an aggregate table
will always have exactly one parent table. We may consider join queries for
creating aggregate tables in the future.

2. The aggregation column name will be created internally, and it would be
like agg_parentcolumnname.

3. Yes. If we create an aggregate table on a dictionary column of the parent
table, it uses the same dictionary as the parent. The aggregate table does not
generate any dictionary files of its own.

4. timeseries.eventtime is the time column of the main table; the main table
must have at least one timestamp column to create timeseries tables. In the
new design, granularity is replaced with a hierarchy, which means the user can
give a time hierarchy such as minute, hour, day. Three aggregation tables
(minute, hour and day) will then be created automatically, and data will be
loaded into them on every load.

5. This has changed in the new v1.1 design; please check it there.

6. As mentioned above, in the new v1.1 design this changed to a hierarchy, so
users can define their own time hierarchy.

7. OK, we will discuss and check whether we can expose the SORT_COLUMNS
configuration on aggregate tables. Even if we don't support it now, we can
expose it in the future.

8. Yes, merge index is applicable to aggregate tables as well.
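To make the time hierarchy in point 4 more concrete, here is a minimal Python sketch. It is not CarbonData code. It assumes a hypothetical main table held in memory as a list of dicts, with an `eventtime` timestamp column and a `quantity` measure. It shows how a single load could be rolled up into minute, hour and day aggregates using sum. The names `rollup_load` and `TRUNCATE` are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from hierarchy level to the timestamp truncation it applies.
TRUNCATE = {
    "minute": lambda ts: ts.replace(second=0, microsecond=0),
    "hour": lambda ts: ts.replace(minute=0, second=0, microsecond=0),
    "day": lambda ts: ts.replace(hour=0, minute=0, second=0, microsecond=0),
}


def rollup_load(rows, hierarchy, event_time="eventtime", measure="quantity"):
    """Roll one load of main-table rows up into one aggregate per hierarchy level.

    Returns {level: {truncated_timestamp: sum of measure}}. This is an
    illustrative in-memory sketch, not CarbonData's implementation.
    """
    for row in rows:
        # A timeseries rollup needs a timestamp column on the main table.
        if not isinstance(row[event_time], datetime):
            raise TypeError(f"{event_time} must be a timestamp")
    tables = {}
    for level in hierarchy:
        truncate = TRUNCATE[level]
        sums = defaultdict(int)
        for row in rows:
            # Rows that fall into the same minute/hour/day bucket are summed together.
            sums[truncate(row[event_time])] += row[measure]
        tables[level] = dict(sums)
    return tables


# Example: a single load of three rows, rolled up to minute, hour and day.
rows = [
    {"eventtime": datetime(2017, 11, 3, 9, 5, 10), "quantity": 2},
    {"eventtime": datetime(2017, 11, 3, 9, 5, 40), "quantity": 3},
    {"eventtime": datetime(2017, 11, 3, 10, 15, 0), "quantity": 5},
]
tables = rollup_load(rows, ["minute", "hour", "day"])
for level, agg in tables.items():
    print(level, agg)
```

In this sketch every level is derived from the same load, so all three aggregates stay consistent with the rows just loaded into the main table. How the real tables are stored and merged across loads is defined in the v1.1 design, not here.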

Regards,
Ravindra.

On 3 November 2017 at 09:05, bill.zhou <[hidden email]> wrote:

> Hi Jacky & Ravindra, I have a few more questions about this design; thank you
> very much if you can clarify them.
>
>
> 1. If we support creating aggregation tables from a join of two or more
> tables, how do we set aggregate.parent? Could it be like
> 'aggregate.parent'='fact1,dim1,dim1'?
> 2. What is the agg table column name? For the following create command,
> will it be user_id, name, c2, price?
> CREATE TABLE agg_sales
> STORED BY 'carbondata'
> TBLPROPERTIES ('aggregate.parent'='sales')
> AS SELECT user_id, user_name as name, sum(quantity) as c2, avg(price)
> FROM sales GROUP BY user_id, user_name
> 3. If we create a dictionary column in the agg table, will it use the same
> dictionary file as the main table?
>
> 4. For rollup table creation on the main table: what do timeseries.eventtime
> and granularity mean? Which columns can they refer to?
> 5. For rollup table creation on the main table: what does
> 'timeseries.aggtype'='quantity:sum, max' mean? Does it mean the column
> quantity only supports sum and max?
>
> 6. In both the above cases, carbon generates 4 pre-aggregation tables
> automatically for year, month, day and hour (their table names will be
> prefixed with agg_sales). -- In the above case I only see the column hour;
> how are the year, month and day tables generated?
>
> 7. In the internal implementation, carbon will create these tables with
> SORT_COLUMNS='group by columns defined above', so that filter and group by
> queries on the main table will be faster because they can leverage the
> index in the pre-aggregate tables. -- I suggest letting the user control
> the sort column order.
> 8. Is merge index supported for agg tables? -- It would be useful.
>
>
> Jacky Li wrote
> > Hi community,
> >
> > In a traditional data warehouse, a pre-aggregate table or cube is a common
> > technique to improve OLAP query performance. To take CarbonData's OLAP
> > support to the next level, I’d like to propose pre-aggregate table support
> > in CarbonData.
> >
> > Please refer to CARBONDATA-1516
> > <https://issues.apache.org/jira/browse/CARBONDATA-1516> and the design
> > document attached to the JIRA ticket.
> >
> > This design is still in its initial phase; the proposed usage and SQL
> > syntax are subject to change. Please provide your comments to improve this
> > feature. Any suggestions on the design from the community are welcome.
> >
> > Regards,
> > Jacky Li
>

--
Thanks & Regards,
Ravi