[SUGGESTION]Support Decoder based fallback mechanism in local dictionary


akashrn5
Hi all,

Currently, when fallback is triggered for a column page that uses a local
dictionary, we keep both the encoded data and the actual data in memory,
form the new column page without dictionary encoding, and only then free
the encoded column page. This increases the off-heap memory footprint.

We can reduce the off-heap memory footprint with a decoder-based fallback
mechanism. With it, there is no need to keep the actual data along with the
encoded data in the encoded column page; we keep only the encoded data. To
form the new column page, we uncompress the encoded column page to get the
dictionary data, use the local dictionary generator to look up the actual
data for it, put that data in the newly created column page, compress it
again, and give it to the consumer for writing the blocklet.
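
The steps described above can be sketched roughly as follows. This is only an
illustration, not CarbonData code: it is written in Python for brevity, zlib
stands in for whatever compressor the page actually uses, and every function
name here is hypothetical.

```python
import struct
import zlib


def encode_page(values, dictionary):
    """Dictionary-encode a page, keeping only the compressed surrogate keys."""
    keys = []
    for value in values:
        # Assign the next key (1-based) the first time a value is seen.
        keys.append(dictionary.setdefault(value, len(dictionary) + 1))
    packed = struct.pack(f"<{len(keys)}i", *keys)
    return zlib.compress(packed)  # zlib is a stand-in compressor (assumption)


def decoder_based_fallback(encoded_page, dictionary):
    """Rebuild a plain (non-dictionary) page from the encoded page alone."""
    packed = zlib.decompress(encoded_page)               # 1. uncompress
    keys = struct.unpack(f"<{len(packed) // 4}i", packed)
    reverse = {key: value for value, key in dictionary.items()}
    plain = b"".join(                                    # 2. key -> actual data
        struct.pack("<i", len(reverse[k])) + reverse[k] for k in keys
    )
    return zlib.compress(plain)                          # 3. compress again


def read_plain_page(page):
    """Read the length-prefixed values back out of a plain page."""
    raw = zlib.decompress(page)
    values, pos = [], 0
    while pos < len(raw):
        (length,) = struct.unpack_from("<i", raw, pos)
        pos += 4
        values.append(raw[pos:pos + length])
        pos += length
    return values


local_dictionary = {}
original = [b"delhi", b"pune", b"delhi", b"bangalore"]
encoded = encode_page(original, local_dictionary)
fallback_page = decoder_based_fallback(encoded, local_dictionary)
```

The point of the sketch is that `decoder_based_fallback` needs only the
encoded page and the dictionary, so the actual data does not have to be held
alongside the encoded data until fallback happens.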

This process may slow down loading, but it will reduce the memory
footprint. So we can provide a property that decides whether to use the
current fallback procedure or the decoder-based fallback mechanism during
fallback.
Any inputs or suggestions are welcome.
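
A minimal sketch of how such a property could select the fallback path. The
property name and both function names are hypothetical and chosen only for
illustration; the default of keeping the current procedure is also an
assumption.

```python
# Hypothetical property name, used only for this illustration.
DECODER_FALLBACK_PROPERTY = "example.local.dictionary.decoder.fallback"


def use_decoder_based_fallback(properties, default=False):
    """Return True if the property asks for decoder-based fallback."""
    raw = properties.get(DECODER_FALLBACK_PROPERTY)
    if raw is None:
        return default  # assumption: keep the current procedure by default
    return str(raw).strip().lower() == "true"


def fallback_strategy(properties):
    """Name the fallback path that would be taken for these properties."""
    if use_decoder_based_fallback(properties):
        return "decoder-based"
    return "current"
```

Keeping the decision behind one function means the default can be changed
later without touching the callers.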


Regards,
Akash

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

xuchuanyin
> This means, no need to keep the actual data along with encoded data in
> encoded column page.

One problem: the index datamap currently needs the actual data to generate
the index. Not keeping the actual data may affect that procedure.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

kumarvishal09
+1
@xuchuanyin
This will not impact the datamap writing flow: the actual column page is
cleared only after the datamap writer has consumed all the records, so
nothing changes in that area.

-Regards
Kumar Vishal

On Mon, Aug 27, 2018 at 1:01 PM xuchuanyin <[hidden email]> wrote:

> This means, no need to keep the actual data along with encoded data in
> encoded column page.
> ---
> A problem is that, currently index datamap needs the actual data to
> generate
> index. You may affect this procedure if you do not keep the actual data.

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

manishgupta88
+1
@Akash: I suggest not exposing any property to the user for this yet. The
design should support making this decision based on a property, but whether
to expose it to the end user can be decided once you complete your
performance testing.

Regards
Manish Gupta

On Mon, 27 Aug 2018 at 1:57 PM, Kumar Vishal <[hidden email]>
wrote:

> +1
> @ xuchuanyin
> This will not impact data map writing flow as actual column page will be
> cleared only after consuming all the records by data map writer,
> there will not be any change in that area.
>
> -Regards
> Kumar Vishal

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

akashrn5
For now I will implement it as a user property, and we can take a decision
once we have the performance report for it.




Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

akashrn5
Hi all,

With PR https://github.com/apache/carbondata/pull/2662
I have tested the performance and memory requirement of decoder-based
fallback for local dictionary. The results are below.

1. With the current implementation, loading 3 million records took around
4 GB when local dictionary was enabled, which is almost 10 times the memory
required to load the same data with local dictionary disabled.
  With decoder-based fallback, the memory requirement drops from almost
10 times to almost 2 times.


2. Data loading performance:
With the current implementation, loading 1 billion records takes around
1.1 hrs; with decoder-based fallback it takes around 1.2 hrs. That is not
much of a difference, while the memory requirement is reduced considerably.
I think this PR will help.

Consolidated points:
1. Store size is not impacted.
2. GC time is not impacted.
3. The time impact is low, as mentioned above.
4. The memory requirement is reduced considerably.
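
For a rough sense of scale, the figures reported above can be turned into a
back-of-the-envelope estimate. The inputs are the approximate numbers from
this mail ("around", "almost"), so the outputs are only indicative, and the
variable names are illustrative.

```python
# Approximate inputs taken from the results above.
current_fallback_gb = 4.0    # ~4 GB with local dictionary enabled
current_ratio = 10           # ~10x the memory of a load without local dictionary
decoder_ratio = 2            # ~2x with decoder-based fallback
current_hours = 1.1          # load time, current implementation
decoder_hours = 1.2          # load time, decoder-based fallback

# Memory implied for the same load with local dictionary disabled.
baseline_gb = current_fallback_gb / current_ratio

# Memory implied for the same load with decoder-based fallback.
decoder_fallback_gb = baseline_gb * decoder_ratio

# Relative increase in load time.
slowdown_percent = (decoder_hours - current_hours) / current_hours * 100

print(f"baseline ~{baseline_gb:.1f} GB, "
      f"decoder-based fallback ~{decoder_fallback_gb:.1f} GB, "
      f"load time +{slowdown_percent:.1f}%")
```

Under these approximations, the load takes roughly 9% longer and memory
falls from about 4 GB to about 0.8 GB.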



Regards,
Akash R Nilugal

On Mon, Aug 27, 2018 at 11:51 AM Akash Nilugal <[hidden email]>
wrote:

> Hi all,
>
> Currently, when the fallback is initiated for a column page in case of
> local dictionary, we are keeping both encoded data
> and actual data in memory and then we form the new column page without
> dictionary encoding and then at last we free the Encoded Column Page.
> Because of this offheap memory footprint increases.
>
> We can reduce the offheap memory footprint. This can be done using decoder
> based fallback mechanism.
> This means, no need to keep the actual data along with encoded data in
> encoded column page. We can keep only encoded data and to form a new column
> page, get the dictionary data from encoded column page by uncompressing and
> using dictionary data get the actual data using local dictionary generator
> and put it in new column page created and compress it again and give to
> consumer for writing blocklet.
>
> The above process may slow down the loading, but it will reduce the
> memory footprint. So we can give a property which will decide whether to
> take current fallback procedure or decoder based fallback mechanism during
> fallback.
> Any inputs or suggestions are welcomed.
>
>
> Regards,
> Akash
>