Apache CarbonData Dev Mailing List archive

[Discussion] Code generation in carbon result preparation

kumarvishal09

[Discussion] Code generation in carbon result preparation

Hi All,
Currently we prepare the final result row-wise. When the number of columns
in the project list is high (for example, 80 columns), mainly measure or
no-dictionary columns, a lot of CPU cache invalidation happens, and this
slows down query performance.

*I can think of two solutions for this problem.*
*Solution 1*. Fill column data vertically (column by column); currently it
is filled horizontally (row by row). This may not solve the whole problem.
*Solution 2*. Use code generation for result preparation.

This is an initial idea.
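
To make the difference between the two fill orders concrete, here is a
minimal Java sketch. It is only an illustration under assumed names
(ResultFillSketch, fillRowWise and fillColumnWise are hypothetical, not
CarbonData classes), and it assumes every column is a plain long[] measure.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not CarbonData code. */
final class ResultFillSketch {

    // Row-wise ("horizontal") fill: for every row, visit every column.
    // Consecutive inner-loop reads come from different source arrays,
    // and each value is boxed into an Object[] row.
    static Object[][] fillRowWise(long[][] columns, int rowCount) {
        Object[][] rows = new Object[rowCount][columns.length];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columns.length; c++) {
                rows[r][c] = columns[c][r];
            }
        }
        return rows;
    }

    // Column-wise ("vertical") fill: finish one column before touching
    // the next, so reads and writes are sequential within one array.
    static long[][] fillColumnWise(long[][] columns, int rowCount) {
        long[][] batch = new long[columns.length][rowCount];
        for (int c = 0; c < columns.length; c++) {
            System.arraycopy(columns[c], 0, batch[c], 0, rowCount);
        }
        return batch;
    }

    public static void main(String[] args) {
        long[][] columns = { {1, 2, 3}, {10, 20, 30} };
        Object[][] rows = fillRowWise(columns, 3);
        long[][] batch = fillColumnWise(columns, 3);
        System.out.println(Arrays.toString(rows[1]));   // second row: [2, 20]
        System.out.println(Arrays.toString(batch[1]));  // second column: [10, 20, 30]
    }
}
```

Whether the vertical layout actually helps depends on how the consumer reads
the result, which is why it may not solve the whole problem on its own.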

-Regards
Kumar Vishal

kumar vishal

Jacky Li

Re: [Discussion] Code generation in carbon result preparation

Hi Vishal,

Which part of the preparation are you considering? The column stitching on the executor side?

Regards,
Jacky

> On Oct 12, 2016, at 9:24 PM, Kumar Vishal <[hidden email]> wrote:
>
> Hi All,
> Currently we prepare the final result row-wise. When the number of columns
> in the project list is high (for example, 80 columns), mainly measure or
> no-dictionary columns, a lot of CPU cache invalidation happens, and this
> slows down query performance.
>
> *I can think of two solutions for this problem.*
> *Solution 1*. Fill column data vertically (column by column); currently it
> is filled horizontally (row by row). This may not solve the whole problem.
> *Solution 2*. Use code generation for result preparation.
>
> This is an initial idea.
>
> -Regards
> Kumar Vishal

kumarvishal09

Re: [Discussion] Code generation in carbon result preparation

Hi Jacky,
Yes, result preparation on the executor side.

-Regards
Kumar Vishal

On Wed, Oct 12, 2016 at 9:33 PM, Jacky Li <[hidden email]> wrote:

> Hi Vishal,
>
> Which part of the preparation are you considering? The column stitching on
> the executor side?
>
> Regards,
> Jacky

kumar vishal

Aniket Adnaik

Re: [Discussion] Code generation in carbon result preparation

Hi Vishal,

In general, it is a good idea to have a cache-efficient algorithm.

For Solution 1: how do you want to handle variable-length columns and
nulls? Maybe you will have to maintain variable-length columns separately
and use offsets?
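
As a rough sketch of what I mean by offsets (all names here are hypothetical
and this is not an existing CarbonData structure), a variable-length column
could keep one contiguous byte array, an offsets array with one extra entry,
and a bitmap marking nulls:

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration only; not CarbonData code. */
final class VarLenColumnSketch {
    private final byte[] data;     // all values, back to back
    private final int[] offsets;   // value i occupies data[offsets[i] .. offsets[i + 1])
    private final BitSet nulls;    // bit i set means row i is null

    VarLenColumnSketch(String[] values) {
        int n = values.length;
        offsets = new int[n + 1];
        nulls = new BitSet(n);
        byte[][] encoded = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            if (values[i] == null) {
                nulls.set(i);
                encoded[i] = new byte[0];   // a null takes no space in data
            } else {
                encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            }
            total += encoded[i].length;
            offsets[i + 1] = total;
        }
        data = new byte[total];
        for (int i = 0; i < n; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    boolean isNull(int row) {
        return nulls.get(row);
    }

    String get(int row) {
        if (isNull(row)) {
            return null;
        }
        int start = offsets[row];
        return new String(data, start, offsets[row + 1] - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        VarLenColumnSketch col =
            new VarLenColumnSketch(new String[] {"carbon", null, "", "data"});
        System.out.println(col.get(0));            // carbon
        System.out.println(col.isNull(1));         // true
        System.out.println(col.get(2).isEmpty());  // true (empty string, not null)
        System.out.println(col.get(3));            // data
    }
}
```

The separate bitmap is what lets an empty string and a null be told apart,
since both have zero length in the data array.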

For Solution 2: code generation may be the more efficient solution. We
should find all the other places in the executor, apart from row formation,
that can benefit from code generation. BTW, do you have any specific code
generation library in mind?

Best Regards,
Aniket

On Wed, Oct 12, 2016 at 10:02 AM, Kumar Vishal <[hidden email]>
wrote:

> Hi Jacky,
> Yes, result preparation on the executor side.
>
> -Regards
> Kumar Vishal

Vimal Das Kammath

Re: [Discussion] Code generation in carbon result preparation

Hi Vishal,

I think we need both Solution 1 and Solution 2.

Solution 1 may need redesigning several parts of Carbon's query processing,
from the scanner and aggregator through to result preparation. This can help
avoid the frequent cache invalidation.

With Solution 2, code generation will not solve the frequent cache
invalidation problem. However, it will surely help performance by running
specialised code instead of generalised code, especially as we support
several data types and our code is generalised to handle all of them.
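
To illustrate the generalised versus specialised distinction, here is a small
Java sketch. Everything in it is an assumption made for illustration (the
class names, the Type enum and the three-column schema are hypothetical, not
Carbon code). readGeneric branches on the data type for every value, while
readSpecialized is the kind of straight-line code a generator might emit
once the schema (INT, DOUBLE, LONG) is known.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not CarbonData code. */
final class SpecializedReaderSketch {

    enum Type { INT, LONG, DOUBLE }

    // Generalised path: one type check per value, results boxed as Object.
    static Object[] readGeneric(Type[] schema, ByteBuffer buf) {
        Object[] row = new Object[schema.length];
        for (int i = 0; i < schema.length; i++) {
            switch (schema[i]) {
                case INT:    row[i] = buf.getInt();    break;
                case LONG:   row[i] = buf.getLong();   break;
                case DOUBLE: row[i] = buf.getDouble(); break;
                default: throw new IllegalStateException("unknown type");
            }
        }
        return row;
    }

    // Row holder a generator could emit for the schema (INT, DOUBLE, LONG).
    static final class IntDoubleLongRow {
        int c0;
        double c1;
        long c2;
    }

    // Specialised path: no per-value branching and no boxing.
    static IntDoubleLongRow readSpecialized(ByteBuffer buf) {
        IntDoubleLongRow row = new IntDoubleLongRow();
        row.c0 = buf.getInt();
        row.c1 = buf.getDouble();
        row.c2 = buf.getLong();
        return row;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(20);
        buf.putInt(7).putDouble(2.5).putLong(42L);
        buf.flip();

        Type[] schema = { Type.INT, Type.DOUBLE, Type.LONG };
        // duplicate() gives each reader its own position over the same bytes.
        Object[] generic = readGeneric(schema, buf.duplicate());
        IntDoubleLongRow special = readSpecialized(buf.duplicate());

        System.out.println(generic[0] + " " + generic[1] + " " + generic[2]);
        System.out.println(special.c0 + " " + special.c1 + " " + special.c2);
    }
}
```

Both paths read the same values; the specialised one simply removes the
per-value type dispatch, which is the gain I expect from code generation.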

Regards
Vimal

On Thu, Oct 13, 2016 at 3:02 AM, Aniket Adnaik <[hidden email]>
wrote:

> Hi Vishal,
>
> In general, it is a good idea to have a cache-efficient algorithm.
>
> For Solution 1: how do you want to handle variable-length columns and
> nulls? Maybe you will have to maintain variable-length columns separately
> and use offsets?
>
> For Solution 2: code generation may be the more efficient solution. We
> should find all the other places in the executor, apart from row formation,
> that can benefit from code generation. BTW, do you have any specific code
> generation library in mind?
>
> Best Regards,
> Aniket