Hi Yaojinguo,
The issue is that we currently load all the index info into driver memory,
which causes a large memory footprint irrespective of the query type (filter
or full scan).
This can be avoided by loading only the required segments' index information
for filter queries.
We could achieve this by creating a datamap containing segment-level
min/max information. Instead of loading all the datamaps down to the blocklet
level, we can load only the segment-level min/max at startup and load the
next-level datamaps based on the query.
This approach, combined with an LRU cache, should be able to limit the memory
consumption on the driver side.
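To make the idea concrete, here is a rough, hypothetical sketch in Java
(illustrative only; names such as TwoLevelIndexCache, pruneSegments and
loadFromIndexFile are made up and are not existing CarbonData API): keep only
the segment-level min/max resident, prune segments with it, and load
blocklet-level indexes on demand into a size-bounded LRU cache.

import java.util.*;

class SegmentMinMax {
    final String segmentId;
    final Map<String, long[]> minMaxPerColumn;   // column -> {min, max}, simplified to longs
    SegmentMinMax(String segmentId, Map<String, long[]> minMaxPerColumn) {
        this.segmentId = segmentId;
        this.minMaxPerColumn = minMaxPerColumn;
    }
}

class BlockletIndex { /* per-file/blocklet min/max, start/end keys, ... */ }

class TwoLevelIndexCache {
    private final List<SegmentMinMax> segmentLevel;          // always resident, small
    private final LinkedHashMap<String, BlockletIndex> lru;  // loaded on demand

    TwoLevelIndexCache(List<SegmentMinMax> segmentLevel, final int maxEntries) {
        this.segmentLevel = segmentLevel;
        // access-ordered LinkedHashMap that evicts the least recently used entry
        this.lru = new LinkedHashMap<String, BlockletIndex>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BlockletIndex> e) {
                return size() > maxEntries;
            }
        };
    }

    /** Segments whose min/max range can contain 'value' for 'column'. */
    List<String> pruneSegments(String column, long value) {
        List<String> hits = new ArrayList<>();
        for (SegmentMinMax s : segmentLevel) {
            long[] mm = s.minMaxPerColumn.get(column);
            if (mm != null && value >= mm[0] && value <= mm[1]) {
                hits.add(s.segmentId);
            }
        }
        return hits;
    }

    /** Load the blocklet-level index only for a surviving segment. */
    BlockletIndex blockletIndexFor(String segmentId) {
        return lru.computeIfAbsent(segmentId, id -> loadFromIndexFile(id));
    }

    private BlockletIndex loadFromIndexFile(String segmentId) {
        // placeholder: read the segment's carbonindex file(s) here
        return new BlockletIndex();
    }
}

In a real implementation the min/max values would be byte arrays per column
and the eviction policy would likely be based on byte size rather than entry
count, but the lookup path would have the same shape.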
The datamap containing segment-level min/max still needs to be implemented;
it is not currently supported in CarbonData.
Regards
Raghu
On Wed, Apr 11, 2018 at 1:25 PM, yaojinguo <[hidden email]> wrote:
> Hi community,
> I am using CarbonData 1.3 + Spark 2.1, and I have found a potential
> bottleneck when using CarbonData. As far as I know, CarbonData loads all of
> the carbonindex files and turns them into DataMaps or SegmentIndexes (for
> earlier versions), which contain the start key, end key, and min/max value
> of each column. If I have one table with 200 columns that contains 1000
> segments, and each segment has 2000 carbondata files, then assuming each
> column occupies just 10 bytes, you need at least 20GB of memory to store
> the min/max values only. Any suggestion to resolve this problem?