Hi,
We are using CarbonData to build our table and run queries through CarbonContext. We have hit some performance problems while tuning the system.

Background:
    cluster: 100 executors, 5 tasks/executor, 10 GB memory per executor
    data: 60+ GB (one replica) in CarbonData format, 600+ MB/file * 100 files, 300+ columns, 300+ million rows
    SQL example:
        select A,
               sum(a), sum(b), sum(c), ... (about 100 more aggregations like sum(column))
        from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
        where A is not null and d > "ab:c-10" and d < "h:0f3s"
              and e != 10 and f = 22 and g = 33 and h = 44
        GROUP BY A
    target query time: < 10s
    current query time: 15s ~ 25s
    scenario: an OLAP system with fewer than 100 queries per day. Concurrency is low and the CPU is mostly idle, so this service shares machines with other programs. The service runs for a long time, so we cannot reserve very large memory for every executor.
    tuning so far: we have built an index and dictionary on d, e, f, g and h, and dictionaries on all the other aggregation columns (i.e. a, b, c, ... 100+ columns), and made sure all data sits in a single segment. Speculation is enabled (quantile=0.5, interval=250, multiplier=1.2).

Time is mainly spent in the first stage, before shuffling. Since 95% of the data is filtered out, the shuffle itself takes little time. In the first stage most tasks complete in under 10s, but there are still close to 50 tasks that take longer, and the slowest task of a query may run 12~16s.

Problems:
1. GC. We see 20%~30% GC time for some first-stage tasks even after a lot of parameter tuning. We use G1 on Java 8; GC time doubles with CMS. Most of the GC time is young-generation GC, and almost half of the young generation is copied to the old generation each cycle. It seems many objects live longer than one GC period and their space is not reused (the concurrent GC releases it later). With a large Eden (>= 1 GB, for example), a single GC takes seconds; with a small Eden (256 MB, for example), each GC takes hundreds of milliseconds but runs more often, and the total is still seconds. Is there any way to reduce the GC time? (The first and second query are excluded from this case.)
2. Performance tuning. The number of rows left after filtering is not uniform, so some nodes are heavier and spend more time than others; task times range from 4s to 16s. Is there a way to even this out?
3. The first and second query take too long. I know dictionaries and some indexes need to be loaded the first time, but even after trying to warm up with the query below, the first queries still take a long time. How can I warm the caches up correctly?
    select Aarray, a, b, c ... from Table1 where Aarray is not null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
4. Any other suggestions for reducing the query time?

Some suggestions:
The log produced by the QueryStatisticsRecorder class is a good way to find the bottleneck, but it is not enough. Some metrics I think would be very useful:
1. Filter ratio, i.e. not only result_size but also the original size, so we can tell how much data was filtered out.
2. IO time. scan_blocks_time is not enough: if it is high we know something is wrong, but not what caused it, because the real IO time for the data is not reported. Since one partition may span several files, knowing whether the slowness comes from the datanode or from the executor itself would give us a starting point for finding the problem.
3. The TableBlockInfo for each task. I log it myself when debugging; it tells me how many blocklets are actually local. The Spark web UI only shows a locality level, yet perhaps only one blocklet is local.
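The executor, speculation, and GC settings described above can be sketched as a single spark-submit invocation. This is a minimal sketch, not a setup confirmed in this thread: the application jar name is a placeholder, and the G1 pause target is an illustrative assumption in place of fixed Eden sizing.

```shell
# Sketch of the cluster/speculation/GC setup described in this thread.
# All values are illustrative; tune them for your own cluster.
spark-submit \
  --num-executors 100 \
  --executor-memory 10g \
  --conf spark.executor.cores=5 \
  --conf spark.speculation=true \
  --conf spark.speculation.quantile=0.5 \
  --conf spark.speculation.interval=250ms \
  --conf spark.speculation.multiplier=1.2 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=200" \
  your-query-app.jar
```

With G1 it is usually better to set a pause-time goal (-XX:MaxGCPauseMillis) than to pin Eden with -Xmn, since a fixed young generation prevents G1 from resizing regions to meet its pause target.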
Hi Anning Luo,

Can you please provide the details below?
1. The create table DDL.
2. The number of nodes in your cluster setup.
3. The number of executors per node.
4. The query statistics.

Please find my comments in bold.

On problem 1 (GC time): *How many nodes are present in your cluster setup? If there are only a few nodes, please reduce the number of executors per node.*

On problem 3 (slow first and second query): *Currently we are working on first-time query improvement. For now you can run select count(*) or count(column), so all the blocks get loaded, and then run the actual query.*

-Regards
Kumar Vishal

On Thu, Nov 10, 2016 at 12:30 PM, Anning Luo <[hidden email]> wrote:
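The warm-up advice above (run a count first so blocks and dictionaries get loaded) can be sketched as the queries below, using the table and column names from the thread. This is a sketch of the suggestion, not a verified recipe:

```sql
-- Warm-up before the real workload:
-- count(*) forces the block/blocklet indexes to load;
-- count(column) additionally touches that column's data.
select count(*) from Table1;
select count(d) from Table1;
```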