
Re: GC problem and performance refine problem

Posted by Anning Luo-2 on Nov 16, 2016; 8:36am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GC-problem-and-performance-refine-problem-tp2844p2931.html

Hi Kumar Vishal,

Thanks for your suggestion.
The driver log does not contain the block distribution log by default. How
can I enable it?
And how is the order of the dimensions decided?

2016-11-16 15:14 GMT+08:00 Kumar Vishal <[hidden email]>:

> Hi An Lan,
>                 The data is already distributed. In this case one blocklet
> may be returning more rows and another fewer, so some tasks will take more
> time.
>
> The block distribution log is not present in the driver log, so it is not
> clear whether it is going for block distribution or blocklet distribution.
> Can you please attach the detailed driver log?
>
> *Some suggestions* (see the DDL sketch after this list):
>
> 1. If a column is a String and you are not applying a filter on it, add it
> to DICTIONARY_EXCLUDE in the create statement; this avoids the dictionary
> lookup. If the number of String columns is small, it is better to exclude
> them.
>
> 2. If a column is numeric and you are not applying a filter on it, there is
> no need to add it to DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE; it will then
> be a measure column.
>
> 3. Currently in carbon, if you create a measure column with the double data
> type it will compress better, as value compression mode is currently
> supported only for the double data type.
>
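> A minimal DDL sketch of suggestions 1-3 (my illustration, not from this
> thread; the table and column names are hypothetical):
>
>     CREATE TABLE IF NOT EXISTS t1 (
>       d String,   -- filtered on: keep the default dictionary encoding
>       s String,   -- never filtered: exclude to skip the dictionary lookup (1)
>       f Int,      -- filtered on: include so it becomes a dictionary dimension
>       m Double    -- never filtered: left out of both lists, so it becomes a
>                   -- measure, and double gets value compression (2 and 3)
>     )
>     STORED BY 'org.apache.carbondata.format'
>     TBLPROPERTIES (
>       "DICTIONARY_EXCLUDE"="s",
>       "DICTIONARY_INCLUDE"="f"
>     )
>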
> -Regards
> Kumar Vishal
>
>
> On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]> wrote:
>
> > Hi Kumar Vishal,
> >
> > 1. I found the number of rows filtered out by the inverted index is not
> > uniform between tasks, and the difference is large. Some tasks may have
> > 3~4k rows after filtering, but the longer tasks may have 30~40k. When most
> > of the longer tasks land on the same node, the time cost is much higher
> > than on the others. So, is there any way to balance the data and make the
> > rows distribute uniformly between tasks?
> > Now the blocklet size is still 120k rows, 3k+ blocklets in total. Every
> > task has 6 blocklets.
> > 2. I uploaded some logs last time. Is there an access problem with them?
> > Are there any suggestions or questions about them?
> >
> > 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
> >
> >> Hi Kumar Vishal,
> >>
> >>
> >> Driver and some executor logs are in the attachment. The same query was
> >> run five times.
> >>
> >>
> >>
> >> Time consumed for each query:
> >>
> >> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>
> >>
> >> The first stage for each query:
> >>
> >> (first query -> starts from stage 0
> >>
> >> second query -> 4
> >>
> >> third query -> 8
> >>
> >> fourth query -> 12
> >>
> >> fifth query -> 16)
> >>
> >> see "first stage of queries.png" in the attachment
> >>
> >>
> >>
> >> Longest task for each query:
> >>
> >> (Task and log relationship:
> >>
> >> stage 0, task 249 -> 1.log
> >>
> >> stage 4, task 420 -> 2.log
> >>
> >> stage 8, task 195 -> 3.log
> >>
> >> stage 12, task 350 -> 4.log
> >>
> >> stage 16, task 321 -> 5.log)
> >> see "longest task.png" in accessory
> >>
> >>  longest tasks.png
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/view?usp=drive_web>
> >>  first stage of queries.png
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/view?usp=drive_web>
> >>  dirverlog
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/view?usp=drive_web>
> >>  1.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/view?usp=drive_web>
> >>  2.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/view?usp=drive_web>
> >>  3.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/view?usp=drive_web>
> >>  4.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/view?usp=drive_web>
> >>  5.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/view?usp=drive_web>
> >>
> >> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
> >>
> >>> Hi,
> >>>
> >>>
> >>>
> >>> Driver and some executor logs are in the attachment. The same query was
> >>> run five times.
> >>>
> >>>
> >>>
> >>> Time consumed for each query:
> >>>
> >>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>>
> >>>
> >>>
> >>> The first stage for each query:
> >>>
> >>> (first query -> starts from stage 0
> >>>
> >>> second query -> 4
> >>>
> >>> third query -> 8
> >>>
> >>> fourth query -> 12
> >>>
> >>> fifth query -> 16)
> >>>
> >>>
> >>> Longest task for each query:
> >>>
> >>> (Task and log relationship:
> >>>
> >>> stage 0, task 249 -> 1.log
> >>>
> >>> stage 4, task 420 -> 2.log
> >>>
> >>> stage 8, task 195 -> 3.log
> >>>
> >>> stage 12, task 350 -> 4.log
> >>>
> >>> stage 16, task 321 -> 5.log)
> >>>
> >>>
> >>>
> >>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
> >>>
> >>>> What is the time difference between the first and the second run of the
> >>>> query? The second time it will read from the OS cache, so I think there
> >>>> won't be any IO bottleneck.
> >>>>
> >>>> Can you provide the driver log and the executor logs for the tasks
> >>>> which are taking more time?
> >>>>
> >>>>
> >>>> -Regards
> >>>> Kumar Vishal
> >>>>
> >>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:
> >>>>
> >>>> > 1. I have set --num-executors=100 in all tests. Where the image says
> >>>> > 100 nodes it means one executor per node; when there are 64 nodes,
> >>>> > there may be two or three executors on one node. The total executor
> >>>> > number is always 100.
> >>>> >
> >>>> > 2. I did not find the property enable.blocklet.distribution in 0.1.1;
> >>>> > I found it in 0.2.0. I am testing on 0.1.1 and will try it later. If
> >>>> > your concern is the locality level, it seems the longest query is not
> >>>> > caused by an IO problem. The statistics of the long task look like
> >>>> > this:
> >>>> >
> >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>>> > |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>>> > |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
> >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>>> >
> >>>> > I have suffered from IO before, but after configuring speculation,
> >>>> > most tasks with a long IO are re-run.
> >>>> >
> >>>> >
> >>>> > 3. Distinct values:
> >>>> >
> >>>> > g: 11
> >>>> >
> >>>> > h: 3
> >>>> >
> >>>> > f: 7
> >>>> >
> >>>> > e: 3
> >>>> >
> >>>> > d: 281
> >>>> >
> >>>> > 4. In the query, only e, f, g, h, d and A are used in filters; the
> >>>> > others are not. So I think the other columns, used only in the
> >>>> > aggregations, do not need to be indexed.
> >>>> >
> >>>> > 5. I have written e, f, g, h, d leftmost, just after "create
> >>>> > table...".
> >>>> >
> >>>> > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
> >>>> >
> >>>> > > Hi An Lan,
> >>>> > >
> >>>> > > Please confirm the things below.
> >>>> > >
> >>>> > > 1. Is dynamic executor allocation enabled? If it is, can you disable
> >>>> > > it and try (this is to check whether dynamic allocation has any
> >>>> > > impact)? To disable it, set --num-executors=100 in spark-default.conf
> >>>> > > (see the sketch after this list).
> >>>> > >
> >>>> > > 2. Can you please set the property enable.blocklet.distribution=false
> >>>> > > and execute the query?
> >>>> > >
> >>>> > > 3. Cardinality of each column.
> >>>> > >
> >>>> > > 4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?
> >>>> > >
> >>>> > > 5. Can you keep all the columns present in filters on the left side?
> >>>> > > Fewer blocks will then be identified during block pruning, which
> >>>> > > will improve the query performance.
> >>>> > >
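> >>>> > > A minimal sketch of point 1 as spark-defaults.conf entries (my
> >>>> > > illustration; spark.dynamicAllocation.enabled is the standard Spark
> >>>> > > switch, and spark.executor.instances is the conf-file equivalent of
> >>>> > > the --num-executors flag):
> >>>> > >
> >>>> > >     spark.dynamicAllocation.enabled  false
> >>>> > >     spark.executor.instances         100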
> >>>> > >
> >>>> > > -Regards
> >>>> > > Kumar Vishal
> >>>> > >
> >>>> > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]>
> >>>> wrote:
> >>>> > >
> >>>> > > > Hi Kumar Vishal,
> >>>> > > >
> >>>> > > > 1.       Create table ddl:
> >>>> > > >
> >>>> > > > CREATE TABLE IF NOT EXISTS Table1
> >>>> > > >
> >>>> > > > (* h Int, g Int, d String, f Int, e Int,*
> >>>> > > >
> >>>> > > > a Int, b Int, …(extra near 300 columns)
> >>>> > > > )
> >>>> > > >
> >>>> > > > STORED BY 'org.apache.carbondata.format'
> >>>> > > > TBLPROPERTIES(
> >>>> > > >
> >>>> > > > "NO_INVERTED_INDEX"="a",
> >>>> > > > "NO_INVERTED_INDEX"="b",
> >>>> > > >
> >>>> > > > …(extra near 300 columns)
> >>>> > > >
> >>>> > > > "DICTIONARY_INCLUDE"="a",
> >>>> > > >
> >>>> > > > "DICTIONARY_INCLUDE"="b",
> >>>> > > >
> >>>> > > > …(extra near 300 columns)
> >>>> > > >
> >>>> > > > )
> >>>> > > >
> >>>> > > > 2. & 3. There are more than a hundred nodes in the cluster, but
> >>>> > > > the cluster is shared with other applications. When enough nodes
> >>>> > > > are free, we get 100 distinct nodes.
> >>>> > > >
> >>>> > > > 4. I give statistics of the task times during one query and mark
> >>>> > > > the distinct nodes below:
> >>>> > > >
> >>>> > > > [image: inline image 1]
> >>>> > > >
> >>>> > > >
> >>>> > > >
> >>>> > > >
> >>>> > > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:
> >>>> > > >
> >>>> > > >> Hi Anning Luo,
> >>>> > > >>
> >>>> > > >> Can you please provide the details below.
> >>>> > > >>
> >>>> > > >> 1. Create table ddl.
> >>>> > > >> 2. Number of nodes in your cluster setup.
> >>>> > > >> 3. Number of executors per node.
> >>>> > > >> 4. Query statistics.
> >>>> > > >>
> >>>> > > >> Please find my comments in bold.
> >>>> > > >>
> >>>> > > >> Problem:
> >>>> > > >> 1. GC problem. We suffer 20%~30% GC time for some tasks in the
> >>>> > > >> first stage after a lot of parameter refinement. We now use G1 GC
> >>>> > > >> on Java 8. GC time doubles if we use CMS. The main GC time is
> >>>> > > >> spent on young-generation GC. Almost half the memory of the young
> >>>> > > >> generation is copied to the old generation. It seems lots of
> >>>> > > >> objects live longer than the GC period and their space is not
> >>>> > > >> reused (as concurrent GC will release it later). When we use a
> >>>> > > >> large Eden (>=1G, for example), one GC takes seconds. If Eden is
> >>>> > > >> small (256M, for example), one GC takes hundreds of milliseconds,
> >>>> > > >> but GCs are more frequent and the total is still seconds. Is
> >>>> > > >> there any way to lessen the GC time? (We don't consider the first
> >>>> > > >> and second query in this case.)
> >>>> > > >>
> >>>> > > >> *How many nodes are present in your cluster setup? If nodes are
> >>>> > > >> few, please reduce the number of executors per node.*
> >>>> > > >>
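> >>>> > > >> A minimal sketch of the kind of executor GC settings being
> >>>> > > >> discussed (my illustration, not a recommendation from this
> >>>> > > >> thread; spark.executor.extraJavaOptions is the standard Spark
> >>>> > > >> property and the flags are standard Java 8 G1 options):
> >>>> > > >>
> >>>> > > >>     spark.executor.extraJavaOptions=-XX:+UseG1GC \
> >>>> > > >>       -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+PrintGCDetails
> >>>> > > >>
> >>>> > > >> With G1 it is usually better to state a pause goal via
> >>>> > > >> -XX:MaxGCPauseMillis than to pin the Eden size by hand.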
> >>>> > > >> 2. Performance refinement problem. The row count after filtering
> >>>> > > >> is not uniform. Some nodes may be heavy and spend more time than
> >>>> > > >> the other nodes. The time of one task is 4s ~ 16s. Is there any
> >>>> > > >> method to refine it?
> >>>> > > >>
> >>>> > > >> 3. Too long a time for the first and second query. I know the
> >>>> > > >> dictionary and some indexes need to be loaded the first time. But
> >>>> > > >> after trying the query below to preheat it, it still spends a lot
> >>>> > > >> of time. How can I preheat the query correctly?
> >>>> > > >>                         select Aarray, a, b, c… from Table1 where
> >>>> > > >> Aarray is not null and d = "sss" and e !=22 and f = 33 and g = 44
> >>>> > > >> and h = 55
> >>>> > > >>
> >>>> > > >> *Currently we are working on first-time query improvement. For
> >>>> > > >> now you can run select count(*) or count(column), so all the
> >>>> > > >> blocks get loaded, and then you can run the actual query.*
> >>>> > > >>
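> >>>> > > >> A minimal warm-up sketch of that suggestion (Table1 is the table
> >>>> > > >> from this thread; the choice of column a is mine):
> >>>> > > >>
> >>>> > > >>     select count(*) from Table1;  -- loads the block indexes
> >>>> > > >>     select count(a) from Table1;  -- loads column a's blocks and
> >>>> > > >>                                   -- dictionary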
> >>>> > > >>
> >>>> > > >> 4.             Any other suggestion to lessen the query time?
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> Some suggestions:
> >>>> > > >>             The log from class QueryStatisticsRecorder gives me a
> >>>> > > >> good means to find the bottleneck, but it is not enough. There
> >>>> > > >> are still some metrics I think would be very useful:
> >>>> > > >>             1. Filter ratio, i.e. not only result_size but also
> >>>> > > >> the original size, so we know how much data was filtered out.
> >>>> > > >>             2. IO time. scan_blocks_time is not enough. If it is
> >>>> > > >> high, we know something is wrong, but not what caused the
> >>>> > > >> problem. The real IO time for the data is not provided. As there
> >>>> > > >> may be several files for one partition, knowing whether the
> >>>> > > >> slowness is caused by the datanode or by the executor itself
> >>>> > > >> would give us the intuition to find the problem.
> >>>> > > >>             3. The TableBlockInfo for each task. I log it myself
> >>>> > > >> when debugging. It tells me how many blocklets are local. The
> >>>> > > >> Spark web monitor just gives a locality level, but maybe only one
> >>>> > > >> blocklet is local.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> -Regards
> >>>> > > >> Kumar Vishal
> >>>> > > >>
> >>>> > > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]>
> >>>> wrote:
> >>>> > > >>
> >>>> > > >> > Hi,
> >>>> > > >> >
> >>>> > > >> > We are using carbondata to build our table and running queries
> >>>> > > >> > in CarbonContext. We have some performance problems while
> >>>> > > >> > refining the system.
> >>>> > > >> >
> >>>> > > >> > *Background*:
> >>>> > > >> >
> >>>> > > >> > *cluster*:                      100 executors, 5 tasks/executor,
> >>>> > > >> > 10G memory/executor
> >>>> > > >> >
> >>>> > > >> > *data*:                              60+GB (per replica) in
> >>>> > > >> > carbon data format, 600+MB/file * 100 files, 300+ columns,
> >>>> > > >> > 300+ million rows
> >>>> > > >> >
> >>>> > > >> > *sql example:*
> >>>> > > >> >
> >>>> > > >> >                                       select A,
> >>>> > > >> >
> >>>> > > >> >                                       sum(a),
> >>>> > > >> >
> >>>> > > >> >                                       sum(b),
> >>>> > > >> >
> >>>> > > >> >                                       sum(c),
> >>>> > > >> >
> >>>> > > >> >                                       …(extra 100 aggregations
> >>>> > > >> > like sum(column))
> >>>> > > >> >
> >>>> > > >> >                                       from Table1 LATERAL VIEW
> >>>> > > >> > explode(split(Aarray, ';')) ATable AS A
> >>>> > > >> >
> >>>> > > >> >                                       where A is not null and
> >>>> > > >> > d > "ab:c-10" and d < "h:0f3s" and e!=10 and f=22 and g=33 and
> >>>> > > >> > h=44 GROUP BY A
> >>>> > > >> >
> >>>> > > >> > *target query time*:           <10s
> >>>> > > >> >
> >>>> > > >> > *current query time*:         15s ~ 25s
> >>>> > > >> >
> >>>> > > >> > *scene:*                             OLAP system. <100 queries
> >>>> > > >> > every day. The concurrency is not high. Most of the time the
> >>>> > > >> > CPU is idle, so this service will run alongside other programs.
> >>>> > > >> > The service will run for a long time. We cannot occupy a very
> >>>> > > >> > large amount of memory for every executor.
> >>>> > > >> >
> >>>> > > >> > *refine*:                              I have built index and
> >>>> > > >> > dictionary on d, e, f, g, h and built dictionary on all other
> >>>> > > >> > aggregation columns (i.e. a, b, c, … 100+ columns), and made
> >>>> > > >> > sure there is one segment for the total data. I have enabled
> >>>> > > >> > speculation (quantile=0.5, interval=250, multiplier=1.2).
> >>>> > > >> >
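> >>>> > > >> > A minimal sketch of those speculation settings as
> >>>> > > >> > spark-defaults.conf entries (my illustration; the property
> >>>> > > >> > names are standard Spark, the values are the ones quoted above,
> >>>> > > >> > with the interval in milliseconds):
> >>>> > > >> >
> >>>> > > >> >     spark.speculation            true
> >>>> > > >> >     spark.speculation.quantile   0.5
> >>>> > > >> >     spark.speculation.interval   250ms
> >>>> > > >> >     spark.speculation.multiplier 1.2
> >>>> > > >> >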
> >>>> > > >> > Time is mainly spent on the first stage, before shuffling. As
> >>>> > > >> > 95% of the data is filtered out, the shuffle takes little time.
> >>>> > > >> > In the first stage most tasks complete in less than 10s, but
> >>>> > > >> > there are still nearly 50 tasks longer than 10s. The max task
> >>>> > > >> > time in one query may be 12~16s.
> >>>> > > >> >
> >>>> > > >> > *Problem:*
> >>>> > > >> >
> >>>> > > >> > 1.          GC problem. We suffer 20%~30% GC time for some
> >>>> > > >> > tasks in the first stage after a lot of parameter refinement.
> >>>> > > >> > We now use G1 GC on Java 8. GC time doubles if we use CMS. The
> >>>> > > >> > main GC time is spent on young-generation GC. Almost half the
> >>>> > > >> > memory of the young generation is copied to the old generation.
> >>>> > > >> > It seems lots of objects live longer than the GC period and
> >>>> > > >> > their space is not reused (as concurrent GC will release it
> >>>> > > >> > later). When we use a large Eden (>=1G, for example), one GC
> >>>> > > >> > takes seconds. If Eden is small (256M, for example), one GC
> >>>> > > >> > takes hundreds of milliseconds, but GCs are more frequent and
> >>>> > > >> > the total is still seconds. Is there any way to lessen the GC
> >>>> > > >> > time? (We don't consider the first and second query in this
> >>>> > > >> > case.)
> >>>> > > >> >
> >>>> > > >> > 2.           Performance refinement problem. The row count
> >>>> > > >> > after filtering is not uniform. Some nodes may be heavy and
> >>>> > > >> > spend more time than the other nodes. The time of one task is
> >>>> > > >> > 4s ~ 16s. Is there any method to refine it?
> >>>> > > >> >
> >>>> > > >> > 3.           Too long a time for the first and second query. I
> >>>> > > >> > know the dictionary and some indexes need to be loaded the
> >>>> > > >> > first time. But after trying the query below to preheat it, it
> >>>> > > >> > still spends a lot of time. How can I preheat the query
> >>>> > > >> > correctly?
> >>>> > > >> >
> >>>> > > >> >               select Aarray, a, b, c… from Table1 where Aarray
> >>>> > > >> > is not null and d = "sss" and e !=22 and f = 33 and g = 44 and
> >>>> > > >> > h = 55
> >>>> > > >> >
> >>>> > > >> > 4.           Any other suggestion to lessen the query time?
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Some suggestions:
> >>>> > > >> >
> >>>> > > >> >             The log from class QueryStatisticsRecorder gives me
> >>>> > > >> > a good means to find the bottleneck, but it is not enough.
> >>>> > > >> > There are still some metrics I think would be very useful:
> >>>> > > >> >
> >>>> > > >> >             1. Filter ratio, i.e. not only result_size but also
> >>>> > > >> > the original size, so we know how much data was filtered out.
> >>>> > > >> >
> >>>> > > >> >             2. IO time. scan_blocks_time is not enough. If it
> >>>> > > >> > is high, we know something is wrong, but not what caused the
> >>>> > > >> > problem. The real IO time for the data is not provided. As
> >>>> > > >> > there may be several files for one partition, knowing whether
> >>>> > > >> > the slowness is caused by the datanode or by the executor
> >>>> > > >> > itself would give us the intuition to find the problem.
> >>>> > > >> >
> >>>> > > >> >             3. The TableBlockInfo for each task. I log it
> >>>> > > >> > myself when debugging. It tells me how many blocklets are
> >>>> > > >> > local. The Spark web monitor just gives a locality level, but
> >>>> > > >> > maybe only one blocklet is local.
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > ---------------------
> >>>> > > >> >
> >>>> > > >> > Anning Luo
> >>>> > > >> >
> >>>> > > >> > *HULU*
> >>>> > > >> >
> >>>> > > >> > Email: [hidden email]
> >>>> > > >> >
> >>>> > > >> >             [hidden email]
> >>>> > > >> >
> >>>> > > >>
> >>>> > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>