[Improvement] Carbon query gc problem

kumarvishal09
There is a lot of GC activity when Carbon processes a large number of records
during a query, and it is hurting Carbon query performance. To solve this GC
problem, which shows up when the query output is very large or when many
records are processed, I would like to propose the solution below.

Currently, all the data read from the carbon data file during a query is
stored on the heap, so when the query output is huge it causes more GC.
Instead of storing it on the heap, we can store this data off-heap and clear
it when scanning for that query is finished.
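A minimal sketch of the proposal, assuming raw off-heap allocation through `sun.misc.Unsafe` (one common way JVM projects do this; the proposal does not name the mechanism). The class `OffHeapQueryBuffer` and its methods are hypothetical, and decoded column pages are simplified to `long[]`. Each page read during the scan is copied off-heap, only small address handles stay on the heap, and `close()` frees every block once scanning for the query is finished.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import sun.misc.Unsafe;

// Hypothetical sketch: data read during one query's scan is kept off-heap,
// so the GC never has to trace or copy it, and is freed when the scan ends.
final class OffHeapQueryBuffer implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Only small {address, sizeInBytes} handles live on the heap.
    private final List<long[]> blocks = new ArrayList<>();
    private boolean closed;

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Copies one decoded page of values off-heap and returns its base address.
    long storeLongs(long[] values) {
        ensureOpen();
        long bytes = (long) values.length * Long.BYTES;
        long address = UNSAFE.allocateMemory(bytes);
        blocks.add(new long[] {address, bytes});
        for (int i = 0; i < values.length; i++) {
            UNSAFE.putLong(address + (long) i * Long.BYTES, values[i]);
        }
        return address;
    }

    // Reads the value at the given index of a block returned by storeLongs.
    long readLong(long address, int index) {
        ensureOpen();
        return UNSAFE.getLong(address + (long) index * Long.BYTES);
    }

    // Total off-heap bytes currently held by this query.
    long allocatedBytes() {
        long total = 0L;
        for (long[] block : blocks) {
            total += block[1];
        }
        return total;
    }

    // Frees every block owned by the query; safe to call more than once.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        for (long[] block : blocks) {
            UNSAFE.freeMemory(block[0]);
        }
        blocks.clear();
        closed = true;
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("query buffer already closed");
        }
    }
}
```

Used with try-with-resources (`try (OffHeapQueryBuffer buf = new OffHeapQueryBuffer()) { ... }`), the memory is released when the scan ends, even if it ends with an exception.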

Please vote and comment on the above proposal.

-Regards
Kumar Vishal

Re: [Improvement] Carbon query gc problem

sraghunandan
+1
Good idea to avoid GC overhead. We need to be careful to clear the memory
after use.
In reply to Kumar Vishal (Tue, 13 Dec 2016 at 2:17 PM)
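Clearing memory reliably is the main risk of this approach. One possible safeguard, sketched here under the same `sun.misc.Unsafe` assumption, is a hypothetical `TaskMemoryTracker` that records every off-heap allocation under the id of the task that made it, so a `finally` block can release whatever the task still holds and report how many bytes were leaked.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import sun.misc.Unsafe;

// Hypothetical sketch: every off-heap allocation is recorded under the id of
// the task that made it, so memory can be released reliably after use.
final class TaskMemoryTracker {
    private static final Unsafe UNSAFE = loadUnsafe();

    // taskId -> (address -> size in bytes) of blocks not yet freed.
    private final Map<String, Map<Long, Long>> live = new HashMap<>();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    synchronized long allocate(String taskId, long bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("bytes must be positive");
        }
        long address = UNSAFE.allocateMemory(bytes);
        live.computeIfAbsent(taskId, id -> new HashMap<>()).put(address, bytes);
        return address;
    }

    // Normal path: the caller frees a block as soon as it is done with it.
    synchronized void free(String taskId, long address) {
        Map<Long, Long> owned = live.get(taskId);
        if (owned == null || owned.remove(address) == null) {
            throw new IllegalArgumentException("block not owned by task " + taskId);
        }
        UNSAFE.freeMemory(address);
    }

    // Safety net for a finally block: frees whatever the task still holds
    // and returns how many bytes would otherwise have leaked.
    synchronized long freeAll(String taskId) {
        Map<Long, Long> owned = live.remove(taskId);
        if (owned == null) {
            return 0L;
        }
        long leaked = 0L;
        for (Map.Entry<Long, Long> block : owned.entrySet()) {
            UNSAFE.freeMemory(block.getKey());
            leaked += block.getValue();
        }
        return leaked;
    }

    // Bytes the task currently holds off-heap.
    synchronized long liveBytes(String taskId) {
        Map<Long, Long> owned = live.get(taskId);
        if (owned == null) {
            return 0L;
        }
        long total = 0L;
        for (long size : owned.values()) {
            total += size;
        }
        return total;
    }
}
```

A scan would call `allocate` for its pages and `freeAll(taskId)` in a `finally` block; a non-zero return value then points at a code path that forgot to free its memory.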

Re: [Improvement] Carbon query gc problem

Liang Chen
In reply to this post by kumarvishal09
Hi, +1. Storing the data off-heap avoids the GC problem, and this solution should help performance considerably.

Re: [Improvement] Carbon query gc problem

ZhuWilliam
In reply to this post by kumarvishal09
+1. The heap should not be used to store data; it should be used for temporary runtime data.

Re: [Improvement] Carbon query gc problem

Anning Luo-2
+1, I have suffered from the GC problem. As I understand it, the BatchResult
is cached and kept in memory for a fairly long time, which causes a lot of
data to be promoted from the young generation to the old generation. It is
better to move it off-heap.

In reply to ZhuWilliam (2016-12-20 11:57 GMT+08:00)

Re: [Improvement] Carbon query gc problem

kumarvishal09
Hi All,

Please review PR #450:
https://github.com/apache/incubator-carbondata/pull/450/

-Regards
Kumar Vishal

In reply to An Lan (Tue, Dec 20, 2016 at 1:13 PM)