Hi Kumar Vishal,
Driver and some executor logs are in the attachment. The same query was run five times.
Time consumed by each query:
67068ms, 45758ms, 26497ms, 22619ms, 21504ms
The first stage of each query:
(first query -> starts from stage 0
second query -> 4
third query -> 8
fourth query -> 12
fifth query -> 16)
see "first stage of queries.png" in the attachment
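For what it's worth, the run times above already quantify the warm-up effect: the first (cold) run carries roughly 45s of one-time cost relative to the steady state. A quick check of the arithmetic:

```python
# The five run times (ms) reported above for the same query.
times_ms = [67068, 45758, 26497, 22619, 21504]

warmup_overhead_ms = times_ms[0] - times_ms[-1]  # extra cost of the cold run
speedup = times_ms[0] / times_ms[-1]             # cold vs. warm ratio

print(warmup_overhead_ms)  # 45564
print(round(speedup, 2))   # 3.12
```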
Longest task in each query:
(Task-to-log mapping:
stage 0, task 249 -> 1.log
stage 4, task 420 -> 2.log
stage 8, task 195 -> 3.log
stage 12, task 350 -> 4.log
stage 16, task 321 -> 5.log)
see "longest task.png" in the attachment

2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
What is the time difference between the first-time and second-time query? The
second time it will read from the OS cache, so I think there won't be any IO
bottleneck.
Can you provide the driver log and executor log for the tasks that are taking
more time.
-Regards
Kumar Vishal
On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:
> 1. I have set --num-executors=100 in all tests. The image says 100 nodes,
> meaning one executor per node; when there are 64 nodes, there may be two or
> three executors on one node. The total executor number is always 100.
>
> 2. I did not find the property enable.blocklet.distribution in 0.1.1; I
> found it in 0.2.0. I am testing on 0.1.1 and will try it later. If you are
> concerned about locality level, it seems the longest query is not caused by
> an IO problem. Statistics for the long task look like this:
>
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>
> I have suffered from IO problems, but after configuring speculation, most
> tasks with a long IO are re-launched.
>
>
> 3. Distinct values:
>
> g: 11
>
> h: 3
>
> f: 7
>
> e: 3
>
> d: 281
>
> 4. In the query, only e, f, g, h, d and A are used in the filter; the others
> are not. So I think the columns used only in aggregation do not need to be
> added to the index.
>
> 5. I have written e, f, g, h, d leftmost, right after "create table..."
>
> 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
>
> > Hi An Lan,
> >
> > Please confirm below things.
> >
> > 1. Is dynamic executor allocation enabled? If it is enabled, can you
> > disable it and try (this is to check whether dynamic executor allocation
> > has any impact)?
> > To disable dynamic executor allocation, set --num-executors=100 in
> > spark-default.conf.
> >
> > 2. Can you please set the property enable.blocklet.distribution=false
> > and execute the query.
> >
> > 3. Cardinality of each column.
> >
> > 4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?
> >
> > 5. Can you keep all the columns present in the filter on the left side,
> > so fewer blocks will be identified during block pruning and query
> > performance will improve.
> >
> >
> > -Regards
> > Kumar Vishal
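For item 1, a minimal sketch of the corresponding spark-defaults.conf entries (these are standard Spark property names; the value 100 mirrors the --num-executors setting above):

```properties
# spark-defaults.conf -- disable dynamic allocation, pin a fixed executor count
spark.dynamicAllocation.enabled  false
spark.executor.instances         100
```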
> >
> > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]> wrote:
> >
> > > Hi Kumar Vishal,
> > >
> > > 1. Create table ddl:
> > >
> > > CREATE TABLE IF NOT EXISTS Table1
> > > (h Int, g Int, d String, f Int, e Int,
> > > a Int, b Int, ...(nearly 300 more columns))
> > > STORED BY 'org.apache.carbondata.format'
> > > TBLPROPERTIES(
> > > "NO_INVERTED_INDEX"="a",
> > > "NO_INVERTED_INDEX"="b",
> > > ...(nearly 300 more columns)
> > > "DICTIONARY_INCLUDE"="a",
> > > "DICTIONARY_INCLUDE"="b",
> > > ...(nearly 300 more columns)
> > > )
> > >
> > > 2. 3. There are more than a hundred nodes in the cluster, but the
> > > cluster is shared with other applications. When enough nodes are
> > > available, we get 100 distinct nodes.
> > >
> > > 4. I give statistics of task times during one query and mark the
> > > distinct nodes below:
> > >
> > > [image: inline image 1]
> > >
> > >
> > >
> > >
> > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:
> > >
> > >> Hi Anning Luo,
> > >>
> > >> Can you please provide the below details.
> > >>
> > >> 1. Create table DDL.
> > >> 2. Number of nodes in your cluster setup.
> > >> 3. Number of executors per node.
> > >> 4. Query statistics.
> > >>
> > >> Please find my comments in bold.
> > >>
> > >> Problem:
> > >> 1. GC problem. We suffer 20%~30% GC time for some tasks in the
> > >> first stage after a lot of parameter tuning. We now use G1 GC on
> > >> Java 8. GC time doubles if we use CMS. The main GC time is spent on
> > >> young generation GC. Almost half the memory of the young generation
> > >> is copied to the old generation. It seems many objects live longer
> > >> than a GC period and the space is not reused (as concurrent GC will
> > >> release it later). When we use a large Eden (>=1G, for example), a
> > >> single GC takes seconds. If we set Eden small (256M, for example),
> > >> a single GC takes hundreds of milliseconds, but GCs are more
> > >> frequent and the total is still seconds. Is there any way to lessen
> > >> the GC time? (We don't consider the first query and second query in
> > >> this case.)
> > >>
> > >> *How many nodes are present in your cluster setup? If nodes are
> > >> few, please reduce the number of executors per node.*
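For context on the G1 discussion above, a sketch of where such executor GC flags would go; spark.executor.extraJavaOptions is a standard Spark property and the HotSpot flags are real Java 8 options, but the values here are illustrative assumptions, not tuned recommendations:

```properties
# spark-defaults.conf -- illustrative G1 options for executors (Java 8)
spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=35 -XX:+PrintGCDetails
```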
> > >>
> > >> 2. Performance tuning problem. The number of rows after filtering
> > >> is not uniform. Some nodes may be heavy and spend more time than
> > >> other nodes. The time of one task is 4s ~ 16s. Is there any method
> > >> to even it out?
> > >>
> > >> 3. Too long a time for the first and second query. I know the
> > >> dictionary and some indexes need to be loaded the first time. But
> > >> after trying the query below to warm things up, it still spends a
> > >> lot of time. How can I warm up the query correctly?
> > >> select Aarray, a, b, c... from Table1 where Aarray is not null
> > >> and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > >>
> > >> *Currently we are working on first-time query improvement. For now
> > >> you can run select count(*) or count(column), so all the blocks get
> > >> loaded, and then you can run the actual query.*
> > >>
> > >>
> > >> 4. Any other suggestions to lessen the query time?
> > >>
> > >>
> > >> Some suggestions:
> > >> The log from class QueryStatisticsRecorder gives me a good means to
> > >> find the bottleneck, but it is not enough. There are still some
> > >> metrics I think would be very useful:
> > >> 1. Filter ratio, i.e. not only result_size but also the original
> > >> size, so we can know how much data is filtered.
> > >> 2. IO time. The scan_blocks_time is not enough. If it is high, we
> > >> know something is wrong, but we do not know what caused the problem.
> > >> The real IO time for data is not provided. As there may be several
> > >> files for one partition, knowing whether the program is slow because
> > >> of the datanode or the executor itself gives us intuition to find
> > >> the problem.
> > >> 3. The TableBlockInfo for the task. I log it myself when debugging.
> > >> It tells me how many blocklets are local. The Spark web monitor just
> > >> gives a locality level, but maybe only one blocklet is local.
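The filter-ratio metric in item 1 is simple to derive once both sizes are exposed; a minimal sketch, where origin_size and result_size are assumed names (not actual CarbonData statistics fields):

```python
def filter_ratio(origin_size: int, result_size: int) -> float:
    """Fraction of scanned rows removed by the filter (0.0 = nothing filtered)."""
    if origin_size == 0:
        return 0.0
    return 1.0 - result_size / origin_size

# Example: 300 million scanned rows reduced to 15 million surviving rows,
# matching the ~95% filtering mentioned in this thread.
print(filter_ratio(300_000_000, 15_000_000))  # ~0.95
```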
> > >>
> > >>
> > >> -Regards
> > >> Kumar Vishal
> > >>
> > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We are using CarbonData to build our table and running queries in
> > >> > CarbonContext. We have some performance problems while tuning the
> > >> > system.
> > >> >
> > >> > *Background*:
> > >> >
> > >> > *cluster*: 100 executors, 5 tasks/executor, 10G memory/executor
> > >> >
> > >> > *data*: 60+GB (per replica) in carbon data format, 600+MB/file *
> > >> > 100 files, 300+ columns, 300+ million rows
> > >> >
> > >> > *sql example:*
> > >> >
> > >> > select A,
> > >> >        sum(a),
> > >> >        sum(b),
> > >> >        sum(c),
> > >> >        ...(extra 100 aggregations like sum(column))
> > >> > from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
> > >> > where A is not null and d > "ab:c-10" and d < "h:0f3s" and
> > >> > e != 10 and f = 22 and g = 33 and h = 44 GROUP BY A
> > >> >
> > >> > *target query time*: <10s
> > >> >
> > >> > *current query time*: 15s ~ 25s
> > >> >
> > >> > *scene:* OLAP system, <100 queries per day. Concurrency is not
> > >> > high. Most of the time the CPU is idle, so this service will run
> > >> > alongside other programs. The service will run for a long time,
> > >> > so we cannot occupy very large memory for every executor.
> > >> >
> > >> > *refine*: I have built an index and dictionary on d, e, f, g, h,
> > >> > and built a dictionary on all the other aggregation columns (i.e.
> > >> > a, b, c, ...100+ columns), and made sure there is one segment for
> > >> > the total data. I have enabled speculation (quantile=0.5,
> > >> > interval=250, multiplier=1.2).
> > >> >
> > >> > Time is mainly spent in the first stage, before shuffling. As 95%
> > >> > of the data is filtered out, the shuffle spends little time. In
> > >> > the first stage most tasks complete in less than 10s, but there
> > >> > are still nearly 50 tasks longer than 10s. The max task time in
> > >> > one query may be 12~16s.
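The speculation settings mentioned above correspond to standard Spark properties; a sketch using the values from this message:

```properties
# spark-defaults.conf -- re-launch straggler tasks (values from this message)
spark.speculation             true
spark.speculation.interval    250ms
spark.speculation.quantile    0.5
spark.speculation.multiplier  1.2
```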
> > >> >
> > >> > *Problem:*
> > >> >
> > >> > 1. GC problem. We suffer 20%~30% GC time for some tasks in the
> > >> > first stage after a lot of parameter tuning. We now use G1 GC on
> > >> > Java 8. GC time doubles if we use CMS. The main GC time is spent
> > >> > on young generation GC. Almost half the memory of the young
> > >> > generation is copied to the old generation. It seems many objects
> > >> > live longer than a GC period and the space is not reused (as
> > >> > concurrent GC will release it later). When we use a large Eden
> > >> > (>=1G, for example), a single GC takes seconds. If we set Eden
> > >> > small (256M, for example), a single GC takes hundreds of
> > >> > milliseconds, but GCs are more frequent and the total is still
> > >> > seconds. Is there any way to lessen the GC time? (We don't
> > >> > consider the first query and second query in this case.)
> > >> >
> > >> > 2. Performance tuning problem. The number of rows after filtering
> > >> > is not uniform. Some nodes may be heavy and spend more time than
> > >> > other nodes. The time of one task is 4s ~ 16s. Is there any
> > >> > method to even it out?
> > >> >
> > >> > 3. Too long a time for the first and second query. I know the
> > >> > dictionary and some indexes need to be loaded the first time. But
> > >> > after trying the query below to warm things up, it still spends a
> > >> > lot of time. How can I warm up the query correctly?
> > >> >
> > >> > select Aarray, a, b, c... from Table1 where Aarray is not null
> > >> > and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
> > >> >
> > >> > 4. Any other suggestions to lessen the query time?
> > >> >
> > >> >
> > >> > Some suggestions:
> > >> >
> > >> > The log from class QueryStatisticsRecorder gives me a good means
> > >> > to find the bottleneck, but it is not enough. There are still
> > >> > some metrics I think would be very useful:
> > >> >
> > >> > 1. Filter ratio, i.e. not only result_size but also the original
> > >> > size, so we can know how much data is filtered.
> > >> >
> > >> > 2. IO time. The scan_blocks_time is not enough. If it is high, we
> > >> > know something is wrong, but we do not know what caused the
> > >> > problem. The real IO time for data is not provided. As there may
> > >> > be several files for one partition, knowing whether the program
> > >> > is slow because of the datanode or the executor itself gives us
> > >> > intuition to find the problem.
> > >> >
> > >> > 3. The TableBlockInfo for the task. I log it myself when
> > >> > debugging. It tells me how many blocklets are local. The Spark
> > >> > web monitor just gives a locality level, but maybe only one
> > >> > blocklet is local.
> > >> >
> > >> >
> > >> > ---------------------
> > >> >
> > >> > Anning Luo
> > >> >
> > >> > *HULU*
> > >> >
> > >> > Email: [hidden email]
> > >> >
> > >> > [hidden email]
> > >> >
> > >>
> > >
> > >
> >
>