
Re: GC problem and performance refine problem

Posted by Anning Luo-2 on Nov 16, 2016; 2:42am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GC-problem-and-performance-refine-problem-tp2844p2923.html

Hi Kumar Vishal,

1. I found that the number of rows left after the inverted-index filter is not uniform across tasks, and the difference is large. Some tasks keep only 3k~4k rows after filtering, while the slower tasks keep 30k~40k. When most of the slow tasks land on the same node, its time cost is much longer than the others'. So, is there any way to balance the data and make the rows distribute uniformly across tasks?
Currently the blocklet size is still 120k rows, with 3k+ blocklets in total and 6 blocklets per task.
2. I uploaded some logs last time. Is there an access problem with them? Are there any suggestions or questions about them?
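For what it's worth, the effect of this skew can be shown with a toy model (the row counts come from the observation above; the per-row cost and the 95/5 task split are made-up numbers, and this is an illustration, not CarbonData code): a stage finishes only when its slowest task does, so a handful of tasks with 30k~40k surviving rows dominates the wall time even when the average task is fast.

```python
# Toy model of the skew described above: a stage finishes only when its
# slowest task does. Row counts come from the thread (3k~4k vs 30k~40k
# surviving rows); the per-row cost and the 95/5 task split are made up.

ROW_COST_MS = 0.3  # assumed processing cost per surviving row, in ms

typical_tasks = [3500] * 95   # most tasks: ~3.5k rows after the filter
skewed_tasks = [35000] * 5    # a few tasks: ~35k rows after the filter
task_times = [rows * ROW_COST_MS for rows in typical_tasks + skewed_tasks]

avg_ms = sum(task_times) / len(task_times)
max_ms = max(task_times)

# The stage wall time tracks the skewed tasks, not the average.
print(f"average task time: {avg_ms:.1f} ms")
print(f"stage wall time:   {max_ms:.1f} ms")
```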

2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:

Hi Kumar Vishal,


Driver and some executor logs are in the accessory. The same query run for five times.

 

Time consumed by each query:

67068ms, 45758ms, 26497ms, 22619ms, 21504ms


The first stage for every query: 

(first query -> starts at stage 0
second query -> stage 4
third query -> stage 8
fourth query -> stage 12
fifth query -> stage 16)

see "first stage of queries.png" in accessory

 

Longest task for every query:

(Task and log relationship:

stage 0, task 249 -> 1.log

stage 4, task 420 -> 2.log

stage 8, task 195 -> 3.log

stage 12, task 350 -> 4.log

stage 16, task 321 -> 5.log)

see "longest task.png" in accessory





2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
What is the time difference between the first and the second run of the
query? The second time it will read from the OS cache, so I think there
won't be any IO bottleneck.

Can you provide the driver log and the executor logs for the tasks that
are taking more time?


-Regards
Kumar Vishal

On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:

> 1. I have set --num-executors=100 in all tests. The image saying 100 nodes
> means one executor per node; when there are only 64 nodes, there may be two
> or three executors on one node. The total executor number is always 100.
>
> 2. I did not find the property enable.blocklet.distribution in 0.1.1; I
> found it in 0.2.0. I am testing on 0.1.1 and will try it later. If you are
> concerned about the locality level, it seems the longest query is not
> caused by an IO problem. The statistics of the long task look like this:
>
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>
> I have suffered from IO before, but after configuring speculation, most
> tasks with long IO get re-executed speculatively.
>
>
> 3. Distinct values per column:
>
> g: 11
>
> h: 3
>
> f: 7
>
> e: 3
>
> d: 281
>
> 4. In the query, only e, f, g, h, d and A are used in the filter; the
> others are not. So I think the columns used only in aggregations do not
> need an inverted index.
>
> 5. I have put e, f, g, h, d leftmost, right after "create table...".
>
> 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
>
> > Hi An Lan,
> >
> > Please confirm below things.
> >
> > 1. Is dynamic executor allocation enabled? If it is, can you disable it
> > and try again (this is to check whether dynamic allocation has any
> > impact)? To disable it, set a fixed executor count in spark-default.conf:
> > --num-executors=100
> >
> > 2. Can you please set the property enable.blocklet.distribution=false
> > and execute the query?
> >
> > 3. Cardinality of each column.
> >
> > 4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?
> >
> > 5. Can you keep all the columns present in the filter on the left side?
> > Fewer blocks will then be identified during block pruning, and it will
> > improve the query performance.
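A sketch of where the two settings above would live (the property names come from this thread; spark.executor.instances is the configuration-file equivalent of --num-executors, and the exact behavior should be verified against your Spark/CarbonData version):

```
# spark-defaults.conf -- fix the executor count / disable dynamic allocation
spark.executor.instances         100
spark.dynamicAllocation.enabled  false

# carbon.properties -- property from point 2 above
enable.blocklet.distribution=false
```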
> >
> >
> > -Regards
> > Kumar Vishal
> >
> > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]> wrote:
> >
> > > Hi Kumar Vishal,
> > >
> > > 1. Create table DDL:
> > >
> > > CREATE TABLE IF NOT EXISTS Table1
> > > (h Int, g Int, d String, f Int, e Int,
> > > a Int, b Int, …(nearly 300 more columns))
> > > STORED BY 'org.apache.carbondata.format'
> > > TBLPROPERTIES(
> > > "NO_INVERTED_INDEX"="a",
> > > "NO_INVERTED_INDEX"="b",
> > > …(nearly 300 more columns)
> > > "DICTIONARY_INCLUDE"="a",
> > > "DICTIONARY_INCLUDE"="b",
> > > …(nearly 300 more columns)
> > > )
> > >
> > > 2./3. There are more than a hundred nodes in the cluster, but the
> > > cluster is shared with other applications. When enough nodes are
> > > available, we get 100 distinct nodes.
> > >
> > > 4. I give a statistic of task times during one query and mark the
> > > distinct nodes below:
> > >
> > > [image: inline image 1]
> > >
> > >
> > >
> > >
> > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:
> > >
> > >> Hi Anning Luo,
> > >>
> > >> Can u please provide below details.
> > >>
> > >> 1. Create table DDL.
> > >> 2. Number of nodes in your cluster setup.
> > >> 3. Number of executors per node.
> > >> 4. Query statistics.
> > >>
> > >> Please find my comments in bold.
> > >>
> > >> Problem:
> > >> 1. GC problem. We suffer 20%~30% GC time for some tasks in the first
> > >> stage, even after a lot of parameter tuning. We now use G1 GC on
> > >> Java 8; GC time doubles if we use CMS. The main GC time is spent on
> > >> young-generation GC: almost half of the young generation is copied to
> > >> the old generation. It seems many objects live longer than one GC
> > >> period and their space is not reused (concurrent GC releases it only
> > >> later). When we use a large Eden (>=1G, for example), a single GC takes
> > >> seconds. If Eden is small (256M, for example), a single GC takes
> > >> hundreds of milliseconds but runs more often, and the total is still
> > >> seconds. Is there any way to reduce the GC time? (We don't consider the
> > >> first and second queries in this case.)
> > >>
> > >> *How many nodes are present in your cluster setup? If nodes are few,
> > >> please reduce the number of executors per node.*
> > >>
> > >> 2. Performance refinement problem. The number of rows left after
> > >> filtering is not uniform; some nodes are heavy and spend more time than
> > >> others. Task times range from 4s to 16s. Is there any method to improve
> > >> this?
> > >>
> > >> 3. Too long a time for the first and second query. I know the
> > >> dictionary and some indexes need to be loaded the first time, but even
> > >> after trying the query below to warm things up, it still spends a lot
> > >> of time. How can I warm up the query correctly?
> > >>     select Aarray, a, b, c… from Table1 where Aarray is not null
> > >>     and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > >>
> > >> *Currently we are working on first-time query improvement. For now you
> > >> can run select count(*) or count(column) so that all the blocks get
> > >> loaded, and then run the actual query.*
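Based on this suggestion, a warm-up pass could look like the following sketch (Table1 is the table name used in this thread, and d is one of its filter columns):

```sql
-- Warm-up: force all blocks and dictionaries to load before the real query.
SELECT COUNT(*) FROM Table1;
SELECT COUNT(d) FROM Table1;
```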
> > >>
> > >>
> > >> 4. Any other suggestions to reduce the query time?
> > >>
> > >>
> > >> Some suggestions:
> > >> The log from class QueryStatisticsRecorder gives me a good means to
> > >> find the bottleneck, but it is not enough. There are still some metrics
> > >> I think would be very useful:
> > >> 1. Filter ratio, i.e. not only result_size but also the original size,
> > >> so we can know how much data was filtered out.
> > >> 2. IO time. scan_blocks_time is not enough: if it is high we know
> > >> something is wrong, but not what caused it. The real IO time for the
> > >> data is not provided. As there may be several files for one partition,
> > >> knowing whether the slowness comes from the datanode or the executor
> > >> itself would give us intuition to find the problem.
> > >> 3. The TableBlockInfo for each task. I log it myself when debugging; it
> > >> tells me how many blocklets are local. The Spark web UI only gives a
> > >> locality level, but maybe only one blocklet is actually local.
> > >>
> > >>
> > >> -Regards
> > >> Kumar Vishal
> > >>
> > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We are using carbondata to build our table and running query in
> > >> > CarbonContext. We have some performance problem during refining the
> > >> system.
> > >> >
> > >> > *Background:*
> > >> >
> > >> > *cluster*: 100 executors, 5 tasks/executor, 10G memory per executor
> > >> >
> > >> > *data*: 60+GB (per replica) in carbon data format, 600+MB/file * 100
> > >> > files, 300+ columns, 300+ million rows
> > >> >
> > >> > *sql example:*
> > >> >
> > >> > select A,
> > >> > sum(a),
> > >> > sum(b),
> > >> > sum(c),
> > >> > …(100 more aggregations like sum(column))
> > >> > from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
> > >> > where A is not null and d > "ab:c-10" and d < "h:0f3s" and e != 10
> > >> > and f = 22 and g = 33 and h = 44 GROUP BY A
> > >> >
> > >> > *target query time*: <10s
> > >> >
> > >> > *current query time*: 15s ~ 25s
> > >> >
> > >> > *scene:* OLAP system, <100 queries per day. Concurrency is not high
> > >> > and the CPU is idle most of the time, so this service runs alongside
> > >> > other programs. The service runs for a long time, so we cannot occupy
> > >> > very large memory for every executor.
> > >> >
> > >> > *refine*: I have built an index and dictionary on d, e, f, g, h, and
> > >> > built dictionaries on all the other aggregation columns (i.e. a, b,
> > >> > c, … 100+ columns), and made sure there is one segment for the total
> > >> > data. I have enabled speculation (quantile=0.5, interval=250,
> > >> > multiplier=1.2).
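The speculation settings mentioned here correspond to standard Spark properties; a sketch with the quoted values (the interval unit is assumed to be milliseconds):

```
# spark-defaults.conf -- speculative execution, values from the message above
spark.speculation             true
spark.speculation.quantile    0.5
spark.speculation.interval    250ms
spark.speculation.multiplier  1.2
```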
> > >> >
> > >> > Time is mainly spent in the first stage, before shuffling. As 95% of
> > >> > the data is filtered out, the shuffle itself takes little time. In
> > >> > the first stage most tasks complete in less than 10s, but there are
> > >> > still nearly 50 tasks that take longer, and the longest task in one
> > >> > query runs 12~16s.
> > >> >
> > >> > *Problem:*
> > >> >
> > >> > 1. GC problem. We suffer 20%~30% GC time for some tasks in the first
> > >> > stage, even after a lot of parameter tuning. We now use G1 GC on
> > >> > Java 8; GC time doubles if we use CMS. The main GC time is spent on
> > >> > young-generation GC: almost half of the young generation is copied to
> > >> > the old generation. It seems many objects live longer than one GC
> > >> > period and their space is not reused (concurrent GC releases it only
> > >> > later). When we use a large Eden (>=1G, for example), a single GC
> > >> > takes seconds. If Eden is small (256M, for example), a single GC
> > >> > takes hundreds of milliseconds but runs more often, and the total is
> > >> > still seconds. Is there any way to reduce the GC time? (We don't
> > >> > consider the first and second queries in this case.)
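For reference, G1 settings of the kind discussed here are usually passed through the executor JVM options. The flags below are standard HotSpot options, but the values are illustrative assumptions, not a recommendation: with G1 it is generally better to steer young-generation size through a pause-time target than to pin Eden directly.

```
# spark-defaults.conf (illustrative values): let G1 size the young generation
# from a pause target instead of fixing Eden, and log GC details for analysis.
spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```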
> > >> >
> > >> > 2. Performance refinement problem. The number of rows left after
> > >> > filtering is not uniform; some nodes are heavy and spend more time
> > >> > than others. Task times range from 4s to 16s. Is there any method to
> > >> > improve this?
> > >> >
> > >> > 3. Too long a time for the first and second query. I know the
> > >> > dictionary and some indexes need to be loaded the first time, but
> > >> > even after trying the query below to warm things up, it still spends
> > >> > a lot of time. How can I warm up the query correctly?
> > >> >
> > >> >     select Aarray, a, b, c… from Table1 where Aarray is not null
> > >> >     and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > >> >
> > >> > 4. Any other suggestions to reduce the query time?
> > >> >
> > >> >
> > >> >
> > >> > Some suggestions:
> > >> >
> > >> > The log from class QueryStatisticsRecorder gives me a good means to
> > >> > find the bottleneck, but it is not enough. There are still some
> > >> > metrics I think would be very useful:
> > >> >
> > >> > 1. Filter ratio, i.e. not only result_size but also the original
> > >> > size, so we can know how much data was filtered out.
> > >> >
> > >> > 2. IO time. scan_blocks_time is not enough: if it is high we know
> > >> > something is wrong, but not what caused it. The real IO time for the
> > >> > data is not provided. As there may be several files for one
> > >> > partition, knowing whether the slowness comes from the datanode or
> > >> > the executor itself would give us intuition to find the problem.
> > >> >
> > >> > 3. The TableBlockInfo for each task. I log it myself when debugging;
> > >> > it tells me how many blocklets are local. The Spark web UI only gives
> > >> > a locality level, but maybe only one blocklet is actually local.
> > >> >
> > >> >
> > >> > ---------------------
> > >> >
> > >> > Anning Luo
> > >> >
> > >> > *HULU*
> > >> >
> > >> > Email: [hidden email]
> > >> >
> > >> >             [hidden email]
> > >> >
> > >>
> > >
> > >
> >
>