How to reduce driver memory usage of carbon index


yaojinguo
Hi community,
  I am using CarbonData 1.3 + Spark 2.1, and I have found a potential
bottleneck when using CarbonData. As far as I know, CarbonData loads all of
the carbonindex files and turns them into DataMaps (or SegmentIndexes in
earlier versions), which contain the start key, end key, and min/max values
of each column. If I have a table with 200 columns that contains 1000
segments, and each segment has 2000 carbondata files, then even assuming
each column's entry occupies just 10 bytes, you need at least 20 GB of
memory to store the min/max values alone. Any suggestion to resolve this
problem?
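For reference, here is a rough back-of-the-envelope version of that
estimate (a sketch only: the 10-bytes-per-value size, one index entry per
carbondata file, and the 3x overhead multiplier are illustrative
assumptions, not values taken from the CarbonData source):

object IndexMemoryEstimate {
  def main(args: Array[String]): Unit = {
    // Illustrative numbers from the scenario above.
    val segments    = 1000L  // segments in the table
    val filesPerSeg = 2000L  // carbondata files per segment
    val columns     = 200L   // columns in the schema
    val bytesPerVal = 10L    // assumed size of one min or max value

    // One min and one max value per column, per file.
    val rawBytes = segments * filesPerSeg * columns * bytesPerVal * 2
    println(f"raw min/max payload: ${rawBytes / math.pow(1024, 3)}%.1f GiB")

    // Keys, offsets, boxing and JVM object headers can multiply the raw
    // payload severalfold, which is how the footprint can reach ~20 GB.
    println(f"with ~3x overhead:   ${rawBytes * 3 / math.pow(1024, 3)}%.1f GiB")
  }
}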





Re: How to reduce driver memory usage of carbon index

sraghunandan
Hi Yaojinguo,
  The issue is that we currently load all of the index information into
driver memory, which causes a large memory footprint regardless of the
query type (filter or full scan).
  This can be avoided by loading only the required segments' index
information for filter queries.
  We could achieve this by creating a datamap that contains segment-level
min/max information. Instead of loading all the datamaps down to the
blocklet level, we would load only the segment-level min/max at startup and
load the next level of datamaps on demand, based on the query; see the
sketch below.
This approach, combined with an LRU cache, should be able to limit the
memory consumption on the driver side.
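A minimal sketch of that two-level scheme (all class and method names here
are hypothetical, since the segment-level datamap does not exist yet; the
point is only the LRU-bounded lazy loading of blocklet-level indexes):

import java.util.{LinkedHashMap, Map => JMap}

case class MinMax(min: Array[Byte], max: Array[Byte])
case class SegmentIndex(segmentId: String, columnMinMax: Map[String, MinMax])
case class BlockletIndex(segmentId: String) // would hold blocklet min/max

// LRU cache via an access-ordered LinkedHashMap; evicts the least
// recently used entry once more than maxEntries segments are cached.
class LruCache[K, V](maxEntries: Int)
    extends LinkedHashMap[K, V](16, 0.75f, true) {
  override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > maxEntries
}

class DriverIndex(segmentLevel: Seq[SegmentIndex],
                  loadBlockletIndex: String => BlockletIndex,
                  cacheSize: Int = 100) {

  // Blocklet-level indexes are loaded lazily and evicted LRU, so the
  // resident set is bounded by cacheSize regardless of segment count.
  private val blockletCache = new LruCache[String, BlockletIndex](cacheSize)

  // Unsigned lexicographic comparison of byte arrays.
  private def cmp(a: Array[Byte], b: Array[Byte]): Int = {
    var i = 0
    while (i < math.min(a.length, b.length)) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // A segment can be skipped when the filter range [lo, hi] does not
  // overlap the segment-level min/max of the filter column; segments
  // with no stats for the column are conservatively kept.
  private def segmentMatches(seg: SegmentIndex, column: String,
                             lo: Array[Byte], hi: Array[Byte]): Boolean =
    seg.columnMinMax.get(column).forall { mm =>
      cmp(lo, mm.max) <= 0 && cmp(hi, mm.min) >= 0
    }

  // Prune on the always-resident segment-level index first; only the
  // surviving segments have their blocklet indexes loaded (or reused).
  def prune(column: String, lo: Array[Byte],
            hi: Array[Byte]): Seq[BlockletIndex] =
    segmentLevel.filter(segmentMatches(_, column, lo, hi)).map { seg =>
      blockletCache.synchronized {
        Option(blockletCache.get(seg.segmentId)).getOrElse {
          val idx = loadBlockletIndex(seg.segmentId) // lazy, per query
          blockletCache.put(seg.segmentId, idx)
          idx
        }
      }
    }
}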

Note that the datamap containing segment-level min/max still needs to be
implemented; it is not currently supported in CarbonData.

Regards
Raghu
