
Re: GC problem and performance refine problem

Posted by kumarvishal09 on Nov 16, 2016; 7:14am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GC-problem-and-performance-refine-problem-tp2844p2927.html

Hi An Lan,
                Data is already distributed. In this case, maybe one
blocklet is returning more rows and another fewer, which is why some
tasks take more time.

The block distribution log is not present in the driver log, so it is
not clear whether it is going for block distribution or blocklet
distribution. Can you please attach the detailed driver log?

*Some suggestions:*

1. If a column is a String and you are not applying a filter on it, add
it to DICTIONARY_EXCLUDE in the create statement; this avoids the
dictionary lookup. If the number of String columns is small, it is
better to add them all to DICTIONARY_EXCLUDE.

2. If a column is numeric and you are not applying a filter on it, there
is no need to add it to DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE, so it
will be a measure column.

3. Currently in carbon, if you create a measure column with the double
data type it will give more compression, as value compression mode is
currently supported only for the double data type (see the sketch below).
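
A minimal sketch of these three suggestions together, for illustration
only (the table and column names are placeholders, not columns from
this thread):

    CREATE TABLE IF NOT EXISTS sample_table
    ( d String,    -- String used in filters: keep the default dictionary
      s1 String,   -- String, never filtered
      s2 String,   -- String, never filtered
      f Int,       -- numeric, used in filters
      m1 Double,   -- numeric, never filtered: in neither list, so a measure
      m2 Double )  -- double currently compresses best for measures
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES(
    "DICTIONARY_EXCLUDE"="s1,s2",  -- suggestion 1: skip the dictionary lookup
    "DICTIONARY_INCLUDE"="f"       -- filtered numeric column gets a dictionary
    )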

-Regards
Kumar Vishal


On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]> wrote:

> Hi Kumar Vishal,
>
> 1. I found that the number of rows remaining after the inverted-index
> filter is not uniform across tasks, and the difference is large. Some
> tasks may have 3~4k rows after filtering, but the longer tasks may have
> 3~4w (30k~40k). When most of the longer tasks land on the same node, the
> time cost is much longer than the others. So, is there any way to balance
> the data and make the rows distribute uniformly across tasks?
> Now the blocklet size is still 120k rows, with 3k+ blocklets in total.
> Every task has 6 blocklets.
> 2. I uploaded some logs last time. Is there an access problem with them?
> Are there any suggestions or questions about them?
>
> 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
>
>> Hi Kumar Vishal,
>>
>>
>> Driver and some executor logs are attached. The same query was run
>> five times.
>>
>>
>>
>> Time consumed by each query:
>>
>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>
>>
>> The first stage for every query:
>>
>> (first query -> starts from stage 0
>>
>> second query -> 4
>>
>> third query -> 8
>>
>> fourth query -> 12
>>
>> fifth query -> 16)
>>
>> see "first stage of queries.png" in accessory
>>
>>
>>
>> Longest task for every query:
>>
>> (Task and log relationship:
>>
>> stage 0, task 249 -> 1.log
>>
>> stage 4, task 420 -> 2.log
>>
>> stage 8, task 195 -> 3.log
>>
>> stage 12, task 350 -> 4.log
>>
>> stage 16, task 321 -> 5.log)
>> see "longest task.png" in accessory
>>
>>  longest tasks.png
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/view?usp=drive_web>
>>  first stage of queries.png
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/view?usp=drive_web>
>>  dirverlog
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/view?usp=drive_web>
>>  1.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/view?usp=drive_web>
>>  2.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/view?usp=drive_web>
>>  3.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/view?usp=drive_web>
>>  4.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/view?usp=drive_web>
>>  5.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/view?usp=drive_web>
>>
>> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
>>
>>> Hi,
>>>
>>>
>>>
>>> Driver and some executor logs are attached. The same query was run
>>> five times.
>>>
>>>
>>>
>>> Time consumed by each query:
>>>
>>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>>
>>>
>>>
>>> The first stage for every query:
>>>
>>> (first query -> starts from stage 0
>>>
>>> second query -> 4
>>>
>>> third query -> 8
>>>
>>> fourth query -> 12
>>>
>>> fifth query -> 16)
>>>
>>>
>>> Longest task for every query:
>>>
>>> (Task and log relationship:
>>>
>>> stage 0, task 249 -> 1.log
>>>
>>> stage 4, task 420 -> 2.log
>>>
>>> stage 8, task 195 -> 3.log
>>>
>>> stage 12, task 350 -> 4.log
>>>
>>> stage 16, task 321 -> 5.log)
>>>
>>>
>>>
>>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
>>>
>>>> What is the time difference between the first run and the second run
>>>> of the query? The second time it will read from the OS cache, so I
>>>> think there won't be any IO bottleneck.
>>>>
>>>> Can you provide the driver log and the executor logs for the tasks
>>>> which are taking more time?
>>>>
>>>>
>>>> -Regards
>>>> Kumar Vishal
>>>>
>>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:
>>>>
>>>> > 1. I have set --num-executors=100 in all tests. The image says 100
>>>> > nodes, meaning one executor per node; when there are 64 nodes, there
>>>> > may be two or three executors on one node. The total executor count
>>>> > is always 100.
>>>> >
>>>> > 2. I did not find the property enable.blocklet.distribution in 0.1.1;
>>>> > I found it in 0.2.0. I am testing on 0.1.1 and will try it later. If
>>>> > you are concerned about the locality level, it seems the longest
>>>> > query is not caused by an IO problem. The statistics of the long task
>>>> > look like this:
>>>> >
>>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>>> > |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
>>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>>> > |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
>>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>>> >
>>>> > I have suffered from IO before, but after configuring speculation,
>>>> > most tasks with a long IO are re-run.
>>>> >
>>>> >
>>>> > 3. distinct value:
>>>> >
>>>> > g: 11
>>>> >
>>>> > h: 3
>>>> >
>>>> > f: 7
>>>> >
>>>> > e: 3
>>>> >
>>>> > d: 281
>>>> >
>>>> > 4. In the query, only e, f, g, h, d and A are used in filters; the
>>>> > others are not. So I think the columns used only in aggregation do
>>>> > not need to be indexed.
>>>> >
>>>> > 5. I have put e, f, g, h, d left-most, just after "create table..."
>>>> >
>>>> > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
>>>> >
>>>> > > Hi An Lan,
>>>> > >
>>>> > > Please confirm the points below.
>>>> > >
>>>> > > 1. Is dynamic executor allocation enabled? If it is, can you
>>>> > > disable it and try (this is to check whether dynamic allocation
>>>> > > has any impact)? To disable dynamic executors, set
>>>> > > --num-executors=100 in spark-default.conf.
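
A minimal sketch of pinning the executor count; the spark-defaults.conf
key names here are the standard Spark equivalents of the
--num-executors flag, an assumption rather than something stated in
this thread:

    # spark-submit flag form
    spark-submit --num-executors 100 ...

    # spark-defaults.conf equivalent
    spark.dynamicAllocation.enabled  false
    spark.executor.instances         100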
>>>> > >
>>>> > > 2. Can you please set the property
>>>> > > enable.blocklet.distribution=false and execute the query?
>>>> > >
>>>> > > 3. Cardinality of each column.
>>>> > >
>>>> > > 4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?
>>>> > >
>>>> > > 5. Can you keep all the columns present in the filter on the left
>>>> > > side? Fewer blocks will then be identified during block pruning,
>>>> > > which will improve query performance.
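
A hedged sketch of that column ordering, reusing the filter columns
named elsewhere in this thread (the exact DDL is illustrative only):

    -- Filter columns (d, e, f, g, h) left-most, aggregation-only
    -- columns after them, so block pruning can skip more blocks.
    CREATE TABLE IF NOT EXISTS Table1
    ( d String, e Int, f Int, g Int, h Int,
      a Int, b Int /* ...aggregation-only columns... */ )
    STORED BY 'org.apache.carbondata.format'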
>>>> > >
>>>> > >
>>>> > > -Regards
>>>> > > Kumar Vishal
>>>> > >
>>>> > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]>
>>>> wrote:
>>>> > >
>>>> > > > Hi Kumar Vishal,
>>>> > > >
>>>> > > > 1.       Create table DDL:
>>>> > > >
>>>> > > > CREATE TABLE IF NOT EXISTS Table1
>>>> > > >
>>>> > > > ( h Int, g Int, d String, f Int, e Int,
>>>> > > >
>>>> > > > a Int, b Int, …(nearly 300 more columns) )
>>>> > > >
>>>> > > > STORED BY 'org.apache.carbondata.format'
>>>> > > > TBLPROPERTIES(
>>>> > > >
>>>> > > > "NO_INVERTED_INDEX"="a",
>>>> > > > "NO_INVERTED_INDEX"="b",
>>>> > > >
>>>> > > > …(nearly 300 more columns)
>>>> > > >
>>>> > > > "DICTIONARY_INCLUDE"="a",
>>>> > > >
>>>> > > > "DICTIONARY_INCLUDE"="b",
>>>> > > >
>>>> > > > …(nearly 300 more columns)
>>>> > > >
>>>> > > > )
>>>> > > >
>>>> > > > 2 & 3. There are more than a hundred nodes in the cluster, but
>>>> > > > the cluster is shared with other applications. Sometimes, when
>>>> > > > enough nodes are available, we get 100 distinct nodes.
>>>> > > >
>>>> > > > 4. I give a statistic of task time during one query and mark the
>>>> > > > distinct nodes below:
>>>> > > >
>>>> > > > [image: inline image 1]
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <
>>>> [hidden email]>:
>>>> > > >
>>>> > > >> Hi Anning Luo,
>>>> > > >>
>>>> > > >> Can you please provide the below details:
>>>> > > >>
>>>> > > >> 1. Create table DDL.
>>>> > > >> 2. Number of nodes in your cluster setup.
>>>> > > >> 3. Number of executors per node.
>>>> > > >> 4. Query statistics.
>>>> > > >>
>>>> > > >> Please find my comments in bold.
>>>> > > >>
>>>> > > >> Problem:
>>>> > > >> 1. GC problem. We suffer 20%~30% GC time for some tasks in the
>>>> > > >> first stage, after a lot of parameter tuning. We now use G1 GC
>>>> > > >> on Java 8; GC time doubles if we use CMS. The main GC time is
>>>> > > >> spent on young-generation GC. Almost half the memory of the
>>>> > > >> young generation gets copied to the old generation. It seems
>>>> > > >> many objects live longer than a GC period and their space is
>>>> > > >> not reused (as concurrent GC releases it later). When we use a
>>>> > > >> large Eden (>=1G, for example), a single GC takes seconds. If
>>>> > > >> Eden is small (256M, for example), a single GC takes hundreds
>>>> > > >> of milliseconds, but GCs are more frequent and the total is
>>>> > > >> still seconds. Is there any way to lessen the GC time? (We
>>>> > > >> don't consider the first and second queries in this case.)
>>>> > > >>
>>>> > > >> *How many nodes are present in your cluster setup? If nodes are
>>>> > > >> few, please reduce the number of executors per node.*
>>>> > > >>
>>>> > > >> 2. Performance tuning problem. The row count after filtering is
>>>> > > >> not uniform; some nodes may be heavy and spend more time than
>>>> > > >> the others. The time of one task is 4s ~ 16s. Is there any
>>>> > > >> method to refine this?
>>>> > > >>
>>>> > > >> 3. Too long a time for the first and second queries. I know the
>>>> > > >> dictionary and some indexes need to be loaded the first time,
>>>> > > >> but after trying the query below to preheat them, it still
>>>> > > >> spends a lot of time. How can I preheat correctly?
>>>> > > >>                         select Aarray, a, b, c… from Table1
>>>> > > >> where Aarray is not null and d = "sss" and e != 22 and f = 33
>>>> > > >> and g = 44 and h = 55
>>>> > > >>
>>>> > > >> *Currently we are working on first-time query improvement. For
>>>> > > >> now you can run select count(*) or count(column), so that all
>>>> > > >> the blocks get loaded, and then you can run the actual query.*
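
A hedged sketch of that warm-up, reusing the table name from this
thread (whether one count per table is enough to load every block is an
assumption):

    -- Run once after startup, before serving real queries: this loads
    -- the block metadata and dictionaries into memory.
    SELECT COUNT(*) FROM Table1;
    -- Optionally touch a dictionary/filter column as well:
    SELECT COUNT(d) FROM Table1;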
>>>> > > >>
>>>> > > >>
>>>> > > >> 4.             Any other suggestion to lessen the query time?
>>>> > > >>
>>>> > > >>
>>>> > > >> Some suggestions:
>>>> > > >>             The log from the class QueryStatisticsRecorder
>>>> > > >> gives me a good means to find the bottleneck, but it is not
>>>> > > >> enough. There are still some metrics I think would be very
>>>> > > >> useful:
>>>> > > >>             1. Filter ratio, i.e. not only result_size but also
>>>> > > >> the original size, so we know how much data was filtered out.
>>>> > > >>             2. IO time. scan_blocks_time is not enough. If it
>>>> > > >> is high, we know something is wrong, but not what caused it.
>>>> > > >> The real IO time for the data is not provided. As there may be
>>>> > > >> several files for one partition, knowing whether the slowness
>>>> > > >> is caused by the datanode or by the executor itself gives us
>>>> > > >> intuition for finding the problem.
>>>> > > >>             3. The TableBlockInfo for a task. I log it myself
>>>> > > >> when debugging; it tells me how many blocklets are node-local.
>>>> > > >> The Spark web monitor just gives a locality level, but maybe
>>>> > > >> only one blocklet is local.
>>>> > > >>
>>>> > > >>
>>>> > > >> -Regards
>>>> > > >> Kumar Vishal
>>>> > > >>
>>>> > > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]>
>>>> wrote:
>>>> > > >>
>>>> > > >> > Hi,
>>>> > > >> >
>>>> > > >> > We are using carbondata to build our table and running
>>>> > > >> > queries in CarbonContext. We have some performance problems
>>>> > > >> > while tuning the system.
>>>> > > >> >
>>>> > > >> > *Background*:
>>>> > > >> >
>>>> > > >> > *cluster*:                      100 executors, 5
>>>> > > >> > tasks/executor, 10G memory/executor
>>>> > > >> >
>>>> > > >> > *data*:                              60+GB (per replica) in
>>>> > > >> > carbon data format, 600+MB/file * 100 files, 300+ columns,
>>>> > > >> > 300+ million rows
>>>> > > >> >
>>>> > > >> > *sql example:*
>>>> > > >> >
>>>> > > >> >                                       select A,
>>>> > > >> >
>>>> > > >> >                                       sum(a),
>>>> > > >> >
>>>> > > >> >                                       sum(b),
>>>> > > >> >
>>>> > > >> >                                       sum(c),
>>>> > > >> >
>>>> > > >> >                                       …( extra 100 aggregation
>>>> like
>>>> > > >> > sum(column))
>>>> > > >> >
>>>> > > >> >                                       from Table1 LATERAL VIEW
>>>> > > >> > explode(split(Aarray, ';')) ATable AS A
>>>> > > >> >
>>>> > > >> >                                       where A is not null and
>>>> > > >> > d > "ab:c-10" and d < "h:0f3s" and e != 10 and f = 22 and
>>>> > > >> > g = 33 and h = 44 GROUP BY A
>>>> > > >> >
>>>> > > >> > *target query time*:           <10s
>>>> > > >> >
>>>> > > >> > *current query time*:         15s ~ 25s
>>>> > > >> >
>>>> > > >> > *scene:*                             OLAP system. <100
>>>> > > >> > queries per day, so the concurrency is not high. Most of the
>>>> > > >> > time the CPU is idle, so this service runs alongside other
>>>> > > >> > programs. The service runs for a long time, and we cannot
>>>> > > >> > occupy very large memory for every executor.
>>>> > > >> >
>>>> > > >> > *refine*:                              I have built an index
>>>> > > >> > and dictionary on d, e, f, g, h and built dictionaries on all
>>>> > > >> > other aggregation columns (i.e. a, b, c, … 100+ columns), and
>>>> > > >> > made sure there is one segment for the total data. I have
>>>> > > >> > enabled speculation (quantile=0.5, interval=250,
>>>> > > >> > multiplier=1.2).
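
A hedged sketch of the speculation settings described above, in
spark-defaults.conf form (the key names are standard Spark properties;
the interval unit is an assumption):

    spark.speculation             true
    spark.speculation.quantile    0.5    # fraction of tasks finished before speculating
    spark.speculation.interval    250ms  # how often to check for stragglers
    spark.speculation.multiplier  1.2    # straggler = 1.2x the median task time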
>>>> > > >> >
>>>> > > >> > Time is mainly spent on the first stage, before shuffling. As
>>>> > > >> > 95% of the data is filtered out, the shuffle takes little
>>>> > > >> > time. In the first stage, most tasks complete in less than
>>>> > > >> > 10s, but there are still nearly 50 tasks longer than 10s. The
>>>> > > >> > max task time in one query may be 12~16s.
>>>> > > >> >
>>>> > > >> > *Problem:*
>>>> > > >> >
>>>> > > >> > 1.          GC problem. We suffer 20%~30% GC time for some
>>>> > > >> > tasks in the first stage, after a lot of parameter tuning. We
>>>> > > >> > now use G1 GC on Java 8; GC time doubles if we use CMS. The
>>>> > > >> > main GC time is spent on young-generation GC. Almost half the
>>>> > > >> > memory of the young generation gets copied to the old
>>>> > > >> > generation. It seems many objects live longer than a GC
>>>> > > >> > period and their space is not reused (as concurrent GC
>>>> > > >> > releases it later). When we use a large Eden (>=1G, for
>>>> > > >> > example), a single GC takes seconds. If Eden is small (256M,
>>>> > > >> > for example), a single GC takes hundreds of milliseconds, but
>>>> > > >> > GCs are more frequent and the total is still seconds. Is
>>>> > > >> > there any way to lessen the GC time? (We don't consider the
>>>> > > >> > first and second queries in this case.)
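
For reference, a hedged sketch of the kind of executor JVM settings
being tuned here (the specific flag values are assumptions, not taken
from this thread):

    spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m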
>>>> > > >> >
>>>> > > >> > 2.           Performance tuning problem. The row count after
>>>> > > >> > filtering is not uniform; some nodes may be heavy and spend
>>>> > > >> > more time than the others. The time of one task is 4s ~ 16s.
>>>> > > >> > Is there any method to refine this?
>>>> > > >> >
>>>> > > >> > 3.           Too long a time for the first and second
>>>> > > >> > queries. I know the dictionary and some indexes need to be
>>>> > > >> > loaded the first time, but after trying the query below to
>>>> > > >> > preheat them, it still spends a lot of time. How can I
>>>> > > >> > preheat correctly?
>>>> > > >> >
>>>> > > >> >               select Aarray, a, b, c… from Table1 where
>>>> > > >> > Aarray is not null and d = "sss" and e != 22 and f = 33 and
>>>> > > >> > g = 44 and h = 55
>>>> > > >> >
>>>> > > >> > 4.           Any other suggestion to lessen the query time?
>>>> > > >> >
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Some suggestions:
>>>> > > >> >
>>>> > > >> >             The log from the class QueryStatisticsRecorder
>>>> > > >> > gives me a good means to find the bottleneck, but it is not
>>>> > > >> > enough. There are still some metrics I think would be very
>>>> > > >> > useful:
>>>> > > >> >
>>>> > > >> >             1. Filter ratio, i.e. not only result_size but
>>>> > > >> > also the original size, so we know how much data was
>>>> > > >> > filtered out.
>>>> > > >> >
>>>> > > >> >             2. IO time. scan_blocks_time is not enough. If it
>>>> > > >> > is high, we know something is wrong, but not what caused it.
>>>> > > >> > The real IO time for the data is not provided. As there may
>>>> > > >> > be several files for one partition, knowing whether the
>>>> > > >> > slowness is caused by the datanode or by the executor itself
>>>> > > >> > gives us intuition for finding the problem.
>>>> > > >> >
>>>> > > >> >             3. The TableBlockInfo for a task. I log it myself
>>>> > > >> > when debugging; it tells me how many blocklets are
>>>> > > >> > node-local. The Spark web monitor just gives a locality
>>>> > > >> > level, but maybe only one blocklet is local.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > ---------------------
>>>> > > >> >
>>>> > > >> > Anning Luo
>>>> > > >> >
>>>> > > >> > *HULU*
>>>> > > >> >
>>>> > > >> > Email: [hidden email]
>>>> > > >> >
>>>> > > >> >             [hidden email]
>>>> > > >> >
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>