Apache CarbonData Dev Mailing List archive

Re: GC problem and performance refine problem

Posted by Anning Luo-2 on Nov 11, 2016; 9:51am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GC-problem-and-performance-refine-problem-tp2844p2867.html

1. I have set --num-executors=100 in all test. The image write 100 nodes
means one executor for one node, and when there is 64 node, there maybe two
or three executor on one node. The total executor number is always 100.

2. I do not find the property enable.blocklet.distribution in 0.1.1. I
found it in 0.2.0. I am testing on 0.1.1, and will try it later. If you
concern locality level, it seems that the most long query is not caused by
IO problem. Statistic of the long task statistic like this:

+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
| task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|4199754456147478_134| 1 | 32 |
3306 | 3 | 42104 | 13029 |
+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+

I have suffer from IO, but after configure speculation, most tasks
with a long IO will be resend.

3. distinct value:

g: 11

h: 3

f: 7

e: 3

d: 281

4. In the query, only e, f ,g, h, d and A are used in filter, others are
not used. So I think others used in the aggregation are no need added for
index.

5. I have write the e, f, g, h, d in most left just after "create table..."

2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:

> Hi An Lan,
>
> Please confirm below things.
>
> 1. Is dynamic executor is enabled?? If it is enabled can u disabled and
> try(this is to check is there any impact with dynamic executor)
> for disabling dynamic executor you need set in spark-default.conf
> --num-executors=100
>
> 2. Can u please set in below property enable.blocklet.distribution=false
> and execute the query.
>
> 3. Cardinality of each column.
>
> 4. Any reason why you are setting "NO_INVERTED_INDEX”=“a” ??
>
> 5. Can u keep all the columns which is present in filter on left side so
> less no of blocks will identified during block pruning and it will improve
> the query performance.
>
>
> -Regards
> Kumar Vishal
>
> On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]> wrote:
>
> > Hi Kumar Vishal,
> >
> > 1. Create table ddl:
> >
> > CREATE TABLE IF NOT EXISTS Table1
> >
> > (* h Int, g Int, d String, f Int, e Int,*
> >
> > a Int, b Int, …(extra near 300 columns)
> >
> > STORED BY 'org.apache.carbondata.format'
> > TBLPROPERTIES(
> >
> > "NO_INVERTED_INDEX”=“a”,
> > "NO_INVERTED_INDEX”=“b”,
> >
> > …(extra near 300 columns)
> >
> > "DICTIONARY_INCLUDE”=“a”,
> >
> > "DICTIONARY_INCLUDE”=“b”,
> >
> > …(extra near 300 columns)
> >
> > )
> >
> > 2. 3. There more than hundreds node in the cluster, but cluster is
> > used mixed with other application. Some time when node is enough, we
> will
> > get 100 distinct node.
> >
> > 4. I give a statistic of task time during once query and mark
> > distinct nodes below:
> >
> > [image: 内嵌图片 1]
> >
> >
> >
> >
> > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:
> >
> >> Hi Anning Luo,
> >>
> >> Can u please provide below details.
> >>
> >> 1.Create table ddl.
> >> 2.Number of node in you cluster setup.
> >> 3. Number of executors per node.
> >> 4. Query statistics.
> >>
> >> Please find my comments in bold.
> >>
> >> Problem:
> >> 1. GC problem. We suffer a 20%~30% GC time for
> >> some task in first stage after a lot of parameter refinement. We now use
> >> G1
> >> GC in java8. GC time will double if use CMS. The main GC time is spent
> on
> >> young generation GC. Almost half memory of young generation will be copy
> >> to
> >> old generation. It seems lots of object has a long life than GC period
> and
> >> the space is not be reuse(as concurrent GC will release it later). When
> we
> >> use a large Eden(>=1G for example), once GC time will be seconds. If set
> >> Eden little(256M for example), once GC time will be hundreds
> milliseconds,
> >> but more frequency and total is still seconds. Is there any way to
> lessen
> >> the GC time? (We don’t consider the first query and second query in this
> >> case.)
> >>
> >> *How many node are present in your cluster setup?? If nodes are less
> >> please
> >> reduce the number of executors per node.*
> >>
> >> 2. Performance refine problem. Row number after
> >> being filtered is not uniform. Some node maybe heavy. It spend more time
> >> than other node. The time of one task is 4s ~ 16s. Is any method to
> refine
> >> it?
> >>
> >> 3. Too long time for first and second query. I
> >> know dictionary and some index need to be loaded for the first time. But
> >> after I trying use query below to preheat it, it still spend a lot of
> >> time.
> >> How could I preheat the query correctly?
> >> select Aarray, a, b, c… from Table1 where Aarray
> >> is
> >> not null and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> >>
> >> *Currently we are working on first time query improvement. For now you
> can
> >> run select count(*) or count(column), so all the blocks get loaded and
> >> then
> >> you can run the actual query.*
> >>
> >>
> >> 4. Any other suggestion to lessen the query time?
> >>
> >>
> >> Some suggestion：
> >> The log by class QueryStatisticsRecorder give me a good
> means
> >> to find the neck bottle, but not enough. There still some metric I think
> >> is
> >> very useful:
> >> 1. filter ratio. i.e.. not only result_size but also the
> >> origin
> >> size so we could know how many data is filtered.
> >> 2. IO time. The scan_blocks_time is not enough. If it is
> high,
> >> we know somethings wrong, but not know what cause that problem. The real
> >> IO
> >> time for data is not be provided. As there may be several file for one
> >> partition, know the program slow is caused by datanode or executor
> itself
> >> give us intuition to find the problem.
> >> 3. The TableBlockInfo for task. I log it by myself when
> >> debugging. It tell me how many blocklets is locality. The spark web
> >> monitor
> >> just give a locality level, but may be only one blocklet is locality.
> >>
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]> wrote:
> >>
> >> > Hi,
> >> >
> >> > We are using carbondata to build our table and running query in
> >> > CarbonContext. We have some performance problem during refining the
> >> system.
> >> >
> >> > *Background*：
> >> >
> >> > *cluster*： 100 executor，5 task/executor, 10G
> >> > memory/executor
> >> >
> >> > *data*: 60+GB(per one replica) as carbon
> >> data
> >> > format, 600+MB/file * 100 file, 300+columns, 300+million rows
> >> >
> >> > *sql example:*
> >> >
> >> > select A,
> >> >
> >> > sum(a),
> >> >
> >> > sum(b),
> >> >
> >> > sum(c),
> >> >
> >> > …( extra 100 aggregation like
> >> > sum(column))
> >> >
> >> > from Table1 LATERAL VIEW
> >> > explode(split(Aarray, ‘*;*’)) ATable AS A
> >> >
> >> > where A is not null and d >
> >> “ab:c-10”
> >> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY A
> >> >
> >> > *target query time*: <10s
> >> >
> >> > *current query time*: 15s ~ 25s
> >> >
> >> > *scene:* OLAP system. <100 queries every
> >> day.
> >> > Concurrency number is not high. Most time cpu is idle, so this service
> >> will
> >> > run with other program. The service will run for long time. We could
> not
> >> > occupy a very large memory for every executor.
> >> >
> >> > *refine*: I have build index and
> >> dictionary on
> >> > d, e, f, g, h and build dictionary on all other aggregation
> >> columns(i.e. a,
> >> > b, c, …100+ columns). And make sure there is one segment for total
> >> data. I
> >> > have open the speculation(quantile=0.5, interval=250, multiplier=1.2).
> >> >
> >> > Time is mainly spent on first stage before shuffling. As 95% data will
> >> be
> >> > filtered out, the shuffle process spend little time. In first stage,
> >> most
> >> > task complete in less than 10s. But there still be near 50 tasks
> longer
> >> > than 10s. Max task time in one query may be 12~16s.
> >> >
> >> > *Problem:*
> >> >
> >> > 1. GC problem. We suffer a 20%~30% GC time for some task in
> >> first
> >> > stage after a lot of parameter refinement. We now use G1 GC in java8.
> GC
> >> > time will double if use CMS. The main GC time is spent on young
> >> generation
> >> > GC. Almost half memory of young generation will be copy to old
> >> generation.
> >> > It seems lots of object has a long life than GC period and the space
> is
> >> not
> >> > be reuse(as concurrent GC will release it later). When we use a large
> >> > Eden(>=1G for example), once GC time will be seconds. If set Eden
> >> > little(256M for example), once GC time will be hundreds milliseconds,
> >> but
> >> > more frequency and total is still seconds. Is there any way to lessen
> >> the
> >> > GC time? (We don’t consider the first query and second query in this
> >> case.)
> >> >
> >> > 2. Performance refine problem. Row number after being
> >> filtered is
> >> > not uniform. Some node maybe heavy. It spend more time than other
> node.
> >> The
> >> > time of one task is 4s ~ 16s. Is any method to refine it?
> >> >
> >> > 3. Too long time for first and second query. I know
> dictionary
> >> > and some index need to be loaded for the first time. But after I
> trying
> >> use
> >> > query below to preheat it, it still spend a lot of time. How could I
> >> > preheat the query correctly?
> >> >
> >> > select Aarray, a, b, c… from Table1 where Aarray is not
> >> null
> >> > and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> >> >
> >> > 4. Any other suggestion to lessen the query time?
> >> >
> >> >
> >> >
> >> > Some suggestion：
> >> >
> >> > The log by class QueryStatisticsRecorder give me a good
> >> means
> >> > to find the neck bottle, but not enough. There still some metric I
> >> think is
> >> > very useful:
> >> >
> >> > 1. filter ratio. i.e.. not only result_size but also the
> >> origin
> >> > size so we could know how many data is filtered.
> >> >
> >> > 2. IO time. The scan_blocks_time is not enough. If it is
> >> high,
> >> > we know somethings wrong, but not know what cause that problem. The
> >> real IO
> >> > time for data is not be provided. As there may be several file for one
> >> > partition, know the program slow is caused by datanode or executor
> >> itself
> >> > give us intuition to find the problem.
> >> >
> >> > 3. The TableBlockInfo for task. I log it by myself when
> >> > debugging. It tell me how many blocklets is locality. The spark web
> >> monitor
> >> > just give a locality level, but may be only one blocklet is locality.
> >> >
> >> >
> >> > ---------------------
> >> >
> >> > Anning Luo
> >> >
> >> > *HULU*
> >> >
> >> > Email: [hidden email]
> >> >
> >> > [hidden email]
> >> >
> >>
> >
> >
>