GC problem and performance refine problem

GC problem and performance refine problem

Anning Luo-2
Hi,

We are using CarbonData to build our table and run queries through CarbonContext. We have some performance problems while tuning the system.

*Background*:

*cluster*:                      100 executors, 5 tasks/executor, 10 GB memory/executor

*data*:                         60+ GB (single replica) in CarbonData format, 600+ MB/file * 100 files, 300+ columns, 300+ million rows

*sql example:*

        select A,
               sum(a),
               sum(b),
               sum(c),
               …(extra 100 aggregations like sum(column))
        from Table1
        LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
        where A is not null
          and d > 'ab:c-10' and d < 'h:0f3s'
          and e != 10 and f = 22 and g = 33 and h = 44
        GROUP BY A

*target query time*:           <10s

*current query time*:         15s ~ 25s

*scene:*                             OLAP system, <100 queries per day. Concurrency is low and the CPU is mostly idle, so this service will share machines with other programs. The service runs for a long time, so we cannot reserve a very large amount of memory for every executor.

*refine*:                              I have built an index and dictionary on d, e, f, g, h and built dictionaries on all the other aggregation columns (i.e. a, b, c, … 100+ columns), and made sure there is a single segment for all the data. I have also enabled speculation (quantile=0.5, interval=250, multiplier=1.2); the corresponding Spark properties are sketched below.
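
For reference, a sketch of those speculation settings as standard Spark properties (spark-default.conf style; the interval value is in milliseconds):

      spark.speculation=true
      spark.speculation.quantile=0.5
      spark.speculation.interval=250ms
      spark.speculation.multiplier=1.2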

Time is mainly spent in the first stage, before shuffling. As 95% of the data is filtered out, the shuffle itself takes little time. In the first stage most tasks complete in less than 10s, but there are still nearly 50 tasks that take longer than 10s. The maximum task time in one query may be 12~16s.

*Problem:*

1. GC problem. Even after a lot of parameter tuning, some tasks in the first stage still spend 20%~30% of their time in GC. We now use G1 GC on Java 8; GC time doubles if we use CMS. Most of the GC time is spent on young-generation GC, and almost half of the young generation is copied to the old generation. It seems many objects live longer than one GC cycle, so their space is not reused immediately (the concurrent GC releases it later). With a large Eden (>=1G, for example) a single GC takes seconds; with a small Eden (256M, for example) a single GC takes hundreds of milliseconds but happens more often, so the total is still seconds. Is there any way to reduce the GC time? (We are not considering the first and second query here. A sketch of how GC options like these can be passed to the executors follows after this list of problems.)

2. Performance tuning problem. The number of rows left after filtering is not uniform, so some nodes are heavier and spend more time than others. Task times range from 4s to 16s. Is there any way to even this out?

3. The first and second query take too long. I know the dictionary and some indexes need to be loaded the first time, but even after trying the query below to warm things up, it still takes a lot of time. How can I preheat correctly?

              select Aarray, a, b, c… from Table1 where Aarray is not null and d = 'sss' and e != 22 and f = 33 and g = 44 and h = 55

4. Any other suggestions to reduce the query time?
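
Relating to problem 1: a minimal sketch of how GC options like these can be passed to the Spark executors. spark.executor.extraJavaOptions is the standard Spark property for this; the flags and the 256M young-generation size are only illustrative of the experiments above, not a recommendation:

      # spark-default.conf (illustrative values)
      spark.executor.extraJavaOptions=-XX:+UseG1GC -Xmn256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
      # note: pinning the young generation with -Xmn disables G1's adaptive young-generation sizing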



Some suggestions:

            The log from the QueryStatisticsRecorder class gives me a good way to find the bottleneck, but it is not enough. There are still some metrics I think would be very useful:

            1. Filter ratio, i.e. not only result_size but also the original size, so we can see how much data was filtered out (see the small formula after this list).

            2. IO time. The scan_blocks_time is not enough: if it is high we know something is wrong, but not what caused it, because the real IO time for the data is not reported. Since there may be several files for one partition, knowing whether the slowness comes from the datanode or from the executor itself would give us better intuition for finding the problem.

            3. The TableBlockInfo for each task. I log it myself when debugging; it tells me how many blocklets are local. The Spark web UI only gives a locality level, but it may be that only one blocklet is actually local.
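
To make suggestion 1 concrete, the metric I have in mind is simply (origin_size being the row count before filtering, which is not reported today):

            filter_ratio = 1 - result_size / origin_size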


---------------------

Anning Luo

*HULU*

Email: [hidden email]

            [hidden email]

Re: GC problem and performance refine problem

kumarvishal09
Hi Anning Luo,

Can you please provide the details below.

1. Create table DDL.
2. Number of nodes in your cluster setup.
3. Number of executors per node.
4. Query statistics.

Please find my comments in bold.

Problem:
1. GC problem. We suffer 20%~30% GC time for some tasks in the first stage after a lot of parameter tuning. We now use G1 GC on Java 8; GC time doubles with CMS. Most GC time is spent on young-generation GC, and almost half of the young generation is copied to the old generation. It seems many objects live longer than one GC cycle and their space is not reused immediately (the concurrent GC releases it later). With a large Eden (>=1G) one GC takes seconds; with a small Eden (256M) one GC takes hundreds of milliseconds but happens more often, so the total is still seconds. Is there any way to reduce the GC time? (We are not considering the first and second query in this case.)

*How many nodes are present in your cluster setup? If there are few nodes, please reduce the number of executors per node.*

2. Performance tuning problem. The number of rows left after filtering is not uniform, so some nodes are heavier and spend more time than others. Task times range from 4s to 16s. Is there any way to even this out?

3. The first and second query take too long. I know the dictionary and some indexes need to be loaded the first time, but even after trying the query below to warm things up, it still takes a lot of time. How can I preheat correctly?
                        select Aarray, a, b, c… from Table1 where Aarray is not null and d = 'sss' and e != 22 and f = 33 and g = 44 and h = 55

*Currently we are working on first-time query improvement. For now you can run select count(*) or count(column), so all the blocks get loaded, and then you can run the actual query.*
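
For example, a minimal warm-up of that kind against the table discussed in this thread (Table1 and column d are taken from the DDL given later in the thread) could be:

    select count(*) from Table1
    select count(d) from Table1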

4. Any other suggestions to reduce the query time?




-Regards
Kumar Vishal


Re: GC problem and performance refine problem

Anning Luo-2

Hi Kumar Vishal,

1.       Create table DDL:

CREATE TABLE IF NOT EXISTS Table1
( h Int, g Int, d String, f Int, e Int,
  a Int, b Int, …(extra near 300 columns)
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES(
"NO_INVERTED_INDEX"="a",
"NO_INVERTED_INDEX"="b",
…(extra near 300 columns)
"DICTIONARY_INCLUDE"="a",
"DICTIONARY_INCLUDE"="b",
…(extra near 300 columns)
)

2. & 3. There are more than a hundred nodes in the cluster, but the cluster is shared with other applications. When enough nodes are available, we get 100 distinct nodes.

4. I give statistics of task times during one query and mark the distinct nodes below:

[image: inline image 1 - task time statistics per node]



Re: GC problem and performance refine problem

kumarvishal09
Hi An Lan,

Please confirm the things below.

1. Is dynamic executor allocation enabled? If it is, can you disable it and try again (this is to check whether dynamic allocation has any impact)? To disable it, set a fixed executor count, e.g. --num-executors=100 in spark-default.conf (see the sketch below).

2. Can you please set the property enable.blocklet.distribution=false and execute the query?

3. What is the cardinality of each column?

4. Is there any reason why you are setting "NO_INVERTED_INDEX"="a"?

5. Can you keep all the columns that are present in the filter on the left side (first in the schema)? Fewer blocks will then be identified during block pruning, which will improve query performance.
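
A sketch of what points 1 and 2 could look like as configuration. spark.dynamicAllocation.enabled and spark.executor.instances are the standard Spark properties behind --num-executors; placing enable.blocklet.distribution in carbon.properties is an assumption here:

    # spark-default.conf
    spark.dynamicAllocation.enabled=false
    spark.executor.instances=100

    # carbon.properties (assumed location for this CarbonData setting)
    enable.blocklet.distribution=false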


-Regards
Kumar Vishal


Re: GC problem and performance refine problem

Anning Luo-2
1. I have set --num-executors=100 in all tests. Where the image says 100 nodes it means one executor per node; when there are only 64 nodes, there may be two or three executors on one node. The total executor count is always 100.

2. I do not find the property enable.blocklet.distribution in 0.1.1; I found it in 0.2.0. I am testing on 0.1.1 and will try it later. If you are concerned about locality level, the longest queries do not seem to be caused by an IO problem. Statistics for one long task look like this:

+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
+--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+

I have suffered from IO before, but after configuring speculation, most tasks with long IO are re-launched.


3. Distinct values:

g: 11

h: 3

f: 7

e: 3

d: 281

4. In the query only e, f, g, h, d and A are used in the filter; the others are not. So I think the other columns, which are only used in aggregations, do not need to be added to the index.

5. I have written e, f, g, h, d leftmost, right after "create table ...".


Re: GC problem and performance refine problem

kumarvishal09
What is the time difference between the first and second run of the query? The second time it will read from the OS cache, so I think there won't be any IO bottleneck.

Can you provide the driver log and the executor logs for the tasks that are taking more time?


-Regards
Kumar Vishal


Re: GC problem and performance refine problem

Anning Luo-2

Hi Kumar Vishal,


The driver log and some executor logs are attached. The same query was run five times.

 

Time consumed by each query:

67068ms, 45758ms, 26497ms, 22619ms, 21504ms


The first stage of each query:

(first query  -> stage 0
 second query -> stage 4
 third query  -> stage 8
 fourth query -> stage 12
 fifth query  -> stage 16)

See "first stage of queries.png" in the attachment.

 

Longest task of each query (task-to-log mapping):

stage 0,  task 249 -> 1.log
stage 4,  task 420 -> 2.log
stage 8,  task 195 -> 3.log
stage 12, task 350 -> 4.log
stage 16, task 321 -> 5.log

See "longest task.png" in the attachment.


> after
> > >> being filtered is not uniform. Some node maybe heavy. It spend more
> time
> > >> than other node. The time of one task is 4s ~ 16s. Is any method to
> > refine
> > >> it?
> > >>
> > >> 3.                          Too long time for first and second query.
> I
> > >> know dictionary and some index need to be loaded for the first time.
> But
> > >> after I trying use query below to preheat it, it still spend a lot of
> > >> time.
> > >> How could I preheat the query correctly?
> > >>                         select Aarray, a, b, c… from Table1 where
> Aarray
> > >> is
> > >> not null and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> > >>
> > >> *Currently we are working on first time query improvement. For now you
> > can
> > >> run select count(*) or count(column), so all the blocks get loaded and
> > >> then
> > >> you can run the actual query.*
> > >>
> > >>
> > >> 4.             Any other suggestion to lessen the query time?
> > >>
> > >>
> > >> Some suggestion:
> > >>             The log by class QueryStatisticsRecorder give me a good
> > means
> > >> to find the neck bottle, but not enough. There still some metric I
> think
> > >> is
> > >> very useful:
> > >>             1. filter ratio. i.e.. not only result_size but also the
> > >> origin
> > >> size so we could know how many data is filtered.
> > >>             2. IO time. The scan_blocks_time is not enough. If it is
> > high,
> > >> we know somethings wrong, but not know what cause that problem. The
> real
> > >> IO
> > >> time for data is not be provided. As there may be several file for one
> > >> partition, know the program slow is caused by datanode or executor
> > itself
> > >> give us intuition to find the problem.
> > >>             3. The TableBlockInfo for task. I log it by myself when
> > >> debugging. It tell me how many blocklets is locality. The spark web
> > >> monitor
> > >> just give a locality level, but may be only one blocklet is locality.
> > >>
> > >>
> > >> -Regards
> > >> Kumar Vishal
> > >>
> > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We are using carbondata to build our table and running query in
> > >> > CarbonContext. We have some performance problem during refining the
> > >> system.
> > >> >
> > >> > *Background*:
> > >> >
> > >> > *cluster*:                      100 executor,5 task/executor, 10G
> > >> > memory/executor
> > >> >
> > >> > *data*:                              60+GB(per one replica) as
> carbon
> > >> data
> > >> > format, 600+MB/file * 100 file, 300+columns, 300+million rows
> > >> >
> > >> > *sql example:*
> > >> >
> > >> >                                       select A,
> > >> >
> > >> >                                       sum(a),
> > >> >
> > >> >                                       sum(b),
> > >> >
> > >> >                                       sum(c),
> > >> >
> > >> >                                       …( extra 100 aggregation like
> > >> > sum(column))
> > >> >
> > >> >                                       from Table1 LATERAL VIEW
> > >> > explode(split(Aarray, ‘*;*’)) ATable AS A
> > >> >
> > >> >                                       where A is not null and d >
> > >> “ab:c-10”
> > >> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY A
> > >> >
> > >> > *target query time*:           <10s
> > >> >
> > >> > *current query time*:         15s ~ 25s
> > >> >
> > >> > *scene:*                             OLAP system. <100 queries every
> > >> day.
> > >> > Concurrency number is not high. Most time cpu is idle, so this
> service
> > >> will
> > >> > run with other program. The service will run for long time. We could
> > not
> > >> > occupy a very large memory for every executor.
> > >> >
> > >> > *refine*:                              I have build index and
> > >> dictionary on
> > >> > d, e, f, g, h and build dictionary on all other aggregation
> > >> columns(i.e. a,
> > >> > b, c, …100+ columns). And make sure there is one segment for total
> > >> data. I
> > >> > have open the speculation(quantile=0.5, interval=250,
> multiplier=1.2).
> > >> >
> > >> > Time is mainly spent on first stage before shuffling. As 95% data
> will
> > >> be
> > >> > filtered out, the shuffle process spend little time. In first stage,
> > >> most
> > >> > task complete in less than 10s. But there still be near 50 tasks
> > longer
> > >> > than 10s. Max task time in one query may be 12~16s.
> > >> >
> > >> > *Problem:*
> > >> >
> > >> > 1.          GC problem. We suffer a 20%~30% GC time for some task in
> > >> first
> > >> > stage after a lot of parameter refinement. We now use G1 GC in
> java8.
> > GC
> > >> > time will double if use CMS. The main GC time is spent on young
> > >> generation
> > >> > GC. Almost half memory of young generation will be copy to old
> > >> generation.
> > >> > It seems lots of object has a long life than GC period and the space
> > is
> > >> not
> > >> > be reuse(as concurrent GC will release it later). When we use a
> large
> > >> > Eden(>=1G for example), once GC time will be seconds. If set Eden
> > >> > little(256M for example), once GC time will be hundreds
> milliseconds,
> > >> but
> > >> > more frequency and total is still seconds. Is there any way to
> lessen
> > >> the
> > >> > GC time? (We don’t consider the first query and second query in this
> > >> case.)
> > >> >
> > >> > 2.           Performance refine problem. Row number after being
> > >> filtered is
> > >> > not uniform. Some node maybe heavy. It spend more time than other
> > node.
> > >> The
> > >> > time of one task is 4s ~ 16s. Is any method to refine it?
> > >> >
> > >> > 3.           Too long time for first and second query. I know
> > dictionary
> > >> > and some index need to be loaded for the first time. But after I
> > trying
> > >> use
> > >> > query below to preheat it, it still spend a lot of time. How could I
> > >> > preheat the query correctly?
> > >> >
> > >> >               select Aarray, a, b, c… from Table1 where Aarray is
> not
> > >> null
> > >> > and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> > >> >
> > >> > 4.           Any other suggestion to lessen the query time?
> > >> >
> > >> >
> > >> >
> > >> > Some suggestion:
> > >> >
> > >> >             The log by class QueryStatisticsRecorder give me a good
> > >> means
> > >> > to find the neck bottle, but not enough. There still some metric I
> > >> think is
> > >> > very useful:
> > >> >
> > >> >             1. filter ratio. i.e.. not only result_size but also the
> > >> origin
> > >> > size so we could know how many data is filtered.
> > >> >
> > >> >             2. IO time. The scan_blocks_time is not enough. If it is
> > >> high,
> > >> > we know somethings wrong, but not know what cause that problem. The
> > >> real IO
> > >> > time for data is not be provided. As there may be several file for
> one
> > >> > partition, know the program slow is caused by datanode or executor
> > >> itself
> > >> > give us intuition to find the problem.
> > >> >
> > >> >             3. The TableBlockInfo for task. I log it by myself when
> > >> > debugging. It tell me how many blocklets is locality. The spark web
> > >> monitor
> > >> > just give a locality level, but may be only one blocklet is
> locality.
> > >> >
> > >> >
> > >> > ---------------------
> > >> >
> > >> > Anning Luo
> > >> >
> > >> > *HULU*
> > >> >
> > >> > Email: [hidden email]
> > >> >
> > >> >             [hidden email]
> > >> >
> > >>
> > >
> > >
> >
>


Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

Anning Luo-2
Hi Kumar Vishal,

1. I found that the number of rows remaining after the inverted-index filter is not uniform across tasks, and the difference is large. Some tasks keep only 3~4k rows after filtering, while the longer tasks keep 30~40k. When most of the longer tasks land on the same node, that node takes much longer than the others. So, is there any way to balance the data so that rows are distributed uniformly between tasks? (See the carbon.properties sketch at the end of this message.)
Right now the blocklet size is still 120k rows, with 3k+ blocklets in total, and every task handles 6 blocklets.
2. I uploaded some logs last time. Were there any problems accessing them? Do you have any suggestions or questions about them?
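
A note for reference: the 120k-rows-per-blocklet figure above corresponds to a carbon.properties entry. The snippet below is only a sketch and assumes carbon.blocklet.size is the property that controls rows per blocklet in this CarbonData version:

    # carbon.properties -- sketch; assumes carbon.blocklet.size controls rows per blocklet
    carbon.blocklet.size=120000

A smaller value would produce more, finer-grained blocklets, which may let skewed rows spread over more tasks, at the cost of extra per-blocklet overhead.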

2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:

Hi Kumar Vishal,


Driver and some executor logs are in the accessory. The same query run for five times.

 

Time consume for every query:

67068ms, 45758ms, 26497ms, 22619ms, 21504ms


The first stage for every query: 

(first query -> start from 0 stage

second query -> 4

third query -> 8

fourth query -> 12

five query -> 16)

see "first stage of queries.png" in accessory

 

Longest task for every query:

(Task and log relationship:

stage 0, task 249 -> 1.log

stage 4, task 420 -> 2.log

stage 8, task 195 -> 3.log

stage 12, task 350 -> 4.log

stage 16, task 321 -> 5.log)

see "longest task.png" in accessory




2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:

Hi,

 

Driver and some executor logs are in the accessory. The same query run for five times.

 

Time consume for every query:

67068ms, 45758ms, 26497ms, 22619ms, 21504ms

 

The first stage for every query:

(first query -> start from 0 stage

​second query -> 4

third query -> 8

fourth query -> 12

five query -> 16)


Longest task for every query:

(Task and log relationship:

stage 0, task 249 -> 1.log

stage 4, task 420 -> 2.log

stage 8, task 195 -> 3.log

stage 12, task 350 -> 4.log

stage 16, task 321 -> 5.log)



2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
What is the time difference between first time and second time query as
second time it will read from os cache,so i think there wont be any IO
bottleneck.

Can u provide driver log and executor log for task which are taking more
time.


-Regards
Kumar Vishal

On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:

> 1. I have set --num-executors=100 in all test. The image write 100 nodes
> means one executor for one node, and when there is 64 node, there maybe two
> or three executor on one node. The total executor number is always 100.
>
> 2. I do not find the property enable.blocklet.distribution in 0.1.1. I
> found it in 0.2.0. I am testing on 0.1.1, and will try it later. If you
> concern locality level, it seems that the most long query is not caused by
> IO problem. Statistic of the long task statistic like this:
>
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> |4199754456147478_134|               1|                  32|            3306|              3|      42104|              13029|
> +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>
> I have suffer from IO, but after configure speculation, most tasks
> with a long IO will be resend.
>
>
> 3. distinct value:
>
> g: 11
>
> h: 3
>
> f: 7
>
> e: 3
>
> d: 281
>
> 4. In the query, only e, f ,g, h, d and A are used in filter, others are
> not used. So I think others used in the aggregation are no need added for
> index.
>
> 5. I have write the e, f, g, h, d in most left just after "create table..."
>
> 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
>
> > Hi An Lan,
> >
> > Please confirm below things.
> >
> > 1. Is dynamic executor is enabled?? If it is enabled can u disabled and
> > try(this is to check is there any impact with dynamic executor)
> >     for disabling dynamic executor you need set in spark-default.conf
> > --num-executors=100
> >
> > 2. Can u please set in below property enable.blocklet.distribution=false
> > and execute the query.
> >
> > 3. Cardinality of each column.
> >
> > 4. Any reason why you are setting "NO_INVERTED_INDEX”=“a” ??
> >
> > 5. Can u keep all the columns which is present in filter on left side so
> > less no of blocks will identified during block pruning and it will
> improve
> > the query performance.
> >
> >
> > -Regards
> > Kumar Vishal
> >
> > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]> wrote:
> >
> > > Hi Kumar Vishal,
> > >
> > > 1.       Create table ddl:
> > >
> > > CREATE TABLE IF NOT EXISTS Table1
> > >
> > > (* h Int, g Int, d String, f Int, e Int,*
> > >
> > > a Int, b Int, …(extra near 300 columns)
> > >
> > > STORED BY 'org.apache.carbondata.format'
> > > TBLPROPERTIES(
> > >
> > > "NO_INVERTED_INDEX”=“a”,
> > > "NO_INVERTED_INDEX”=“b”,
> > >
> > > …(extra near 300 columns)
> > >
> > > "DICTIONARY_INCLUDE”=“a”,
> > >
> > > "DICTIONARY_INCLUDE”=“b”,
> > >
> > > …(extra near 300 columns)
> > >
> > > )
> > >
> > > 2.       3. There more than hundreds node in the cluster, but cluster
> is
> > > used mixed with other application.  Some time when node is enough, we
> > will
> > > get 100 distinct node.
> > >
> > > 4.      I give a statistic of task time during once query and mark
> > > distinct nodes below:
> > >
> > > [image: 内嵌图片 1]
> > >
> > >
> > >
> > >
> > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:
> > >
> > >> Hi Anning Luo,
> > >>
> > >> Can u please provide below details.
> > >>
> > >> 1.Create table ddl.
> > >> 2.Number of node in you cluster setup.
> > >> 3. Number of executors per node.
> > >> 4. Query statistics.
> > >>
> > >> Please find my comments in bold.
> > >>
> > >> Problem:
> > >> 1.                          GC problem. We suffer a 20%~30% GC time
> for
> > >> some task in first stage after a lot of parameter refinement. We now
> use
> > >> G1
> > >> GC in java8. GC time will double if use CMS. The main GC time is spent
> > on
> > >> young generation GC. Almost half memory of young generation will be
> copy
> > >> to
> > >> old generation. It seems lots of object has a long life than GC period
> > and
> > >> the space is not be reuse(as concurrent GC will release it later).
> When
> > we
> > >> use a large Eden(>=1G for example), once GC time will be seconds. If
> set
> > >> Eden little(256M for example), once GC time will be hundreds
> > milliseconds,
> > >> but more frequency and total is still seconds. Is there any way to
> > lessen
> > >> the GC time? (We don’t consider the first query and second query in
> this
> > >> case.)
> > >>
> > >> *How many node are present in your cluster setup?? If nodes are less
> > >> please
> > >> reduce the number of executors per node.*
> > >>
> > >> 2.                          Performance refine problem. Row number
> after
> > >> being filtered is not uniform. Some node maybe heavy. It spend more
> time
> > >> than other node. The time of one task is 4s ~ 16s. Is any method to
> > refine
> > >> it?
> > >>
> > >> 3.                          Too long time for first and second query.
> I
> > >> know dictionary and some index need to be loaded for the first time.
> But
> > >> after I trying use query below to preheat it, it still spend a lot of
> > >> time.
> > >> How could I preheat the query correctly?
> > >>                         select Aarray, a, b, c… from Table1 where
> Aarray
> > >> is
> > >> not null and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> > >>
> > >> *Currently we are working on first time query improvement. For now you
> > can
> > >> run select count(*) or count(column), so all the blocks get loaded and
> > >> then
> > >> you can run the actual query.*
> > >>
> > >>
> > >> 4.             Any other suggestion to lessen the query time?
> > >>
> > >>
> > >> Some suggestion:
> > >>             The log by class QueryStatisticsRecorder give me a good
> > means
> > >> to find the neck bottle, but not enough. There still some metric I
> think
> > >> is
> > >> very useful:
> > >>             1. filter ratio. i.e.. not only result_size but also the
> > >> origin
> > >> size so we could know how many data is filtered.
> > >>             2. IO time. The scan_blocks_time is not enough. If it is
> > high,
> > >> we know somethings wrong, but not know what cause that problem. The
> real
> > >> IO
> > >> time for data is not be provided. As there may be several file for one
> > >> partition, know the program slow is caused by datanode or executor
> > itself
> > >> give us intuition to find the problem.
> > >>             3. The TableBlockInfo for task. I log it by myself when
> > >> debugging. It tell me how many blocklets is locality. The spark web
> > >> monitor
> > >> just give a locality level, but may be only one blocklet is locality.
> > >>
> > >>
> > >> -Regards
> > >> Kumar Vishal
> > >>
> > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We are using carbondata to build our table and running query in
> > >> > CarbonContext. We have some performance problem during refining the
> > >> system.
> > >> >
> > >> > *Background*:
> > >> >
> > >> > *cluster*:                      100 executor,5 task/executor, 10G
> > >> > memory/executor
> > >> >
> > >> > *data*:                              60+GB(per one replica) as
> carbon
> > >> data
> > >> > format, 600+MB/file * 100 file, 300+columns, 300+million rows
> > >> >
> > >> > *sql example:*
> > >> >
> > >> >                                       select A,
> > >> >
> > >> >                                       sum(a),
> > >> >
> > >> >                                       sum(b),
> > >> >
> > >> >                                       sum(c),
> > >> >
> > >> >                                       …( extra 100 aggregation like
> > >> > sum(column))
> > >> >
> > >> >                                       from Table1 LATERAL VIEW
> > >> > explode(split(Aarray, ‘*;*’)) ATable AS A
> > >> >
> > >> >                                       where A is not null and d >
> > >> “ab:c-10”
> > >> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY A
> > >> >
> > >> > *target query time*:           <10s
> > >> >
> > >> > *current query time*:         15s ~ 25s
> > >> >
> > >> > *scene:*                             OLAP system. <100 queries every
> > >> day.
> > >> > Concurrency number is not high. Most time cpu is idle, so this
> service
> > >> will
> > >> > run with other program. The service will run for long time. We could
> > not
> > >> > occupy a very large memory for every executor.
> > >> >
> > >> > *refine*:                              I have build index and
> > >> dictionary on
> > >> > d, e, f, g, h and build dictionary on all other aggregation
> > >> columns(i.e. a,
> > >> > b, c, …100+ columns). And make sure there is one segment for total
> > >> data. I
> > >> > have open the speculation(quantile=0.5, interval=250,
> multiplier=1.2).
> > >> >
> > >> > Time is mainly spent on first stage before shuffling. As 95% data
> will
> > >> be
> > >> > filtered out, the shuffle process spend little time. In first stage,
> > >> most
> > >> > task complete in less than 10s. But there still be near 50 tasks
> > longer
> > >> > than 10s. Max task time in one query may be 12~16s.
> > >> >
> > >> > *Problem:*
> > >> >
> > >> > 1.          GC problem. We suffer a 20%~30% GC time for some task in
> > >> first
> > >> > stage after a lot of parameter refinement. We now use G1 GC in
> java8.
> > GC
> > >> > time will double if use CMS. The main GC time is spent on young
> > >> generation
> > >> > GC. Almost half memory of young generation will be copy to old
> > >> generation.
> > >> > It seems lots of object has a long life than GC period and the space
> > is
> > >> not
> > >> > be reuse(as concurrent GC will release it later). When we use a
> large
> > >> > Eden(>=1G for example), once GC time will be seconds. If set Eden
> > >> > little(256M for example), once GC time will be hundreds
> milliseconds,
> > >> but
> > >> > more frequency and total is still seconds. Is there any way to
> lessen
> > >> the
> > >> > GC time? (We don’t consider the first query and second query in this
> > >> case.)
> > >> >
> > >> > 2.           Performance refine problem. Row number after being
> > >> filtered is
> > >> > not uniform. Some node maybe heavy. It spend more time than other
> > node.
> > >> The
> > >> > time of one task is 4s ~ 16s. Is any method to refine it?
> > >> >
> > >> > 3.           Too long time for first and second query. I know
> > dictionary
> > >> > and some index need to be loaded for the first time. But after I
> > trying
> > >> use
> > >> > query below to preheat it, it still spend a lot of time. How could I
> > >> > preheat the query correctly?
> > >> >
> > >> >               select Aarray, a, b, c… from Table1 where Aarray is
> not
> > >> null
> > >> > and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
> > >> >
> > >> > 4.           Any other suggestion to lessen the query time?
> > >> >
> > >> >
> > >> >
> > >> > Some suggestion:
> > >> >
> > >> >             The log by class QueryStatisticsRecorder give me a good
> > >> means
> > >> > to find the neck bottle, but not enough. There still some metric I
> > >> think is
> > >> > very useful:
> > >> >
> > >> >             1. filter ratio. i.e.. not only result_size but also the
> > >> origin
> > >> > size so we could know how many data is filtered.
> > >> >
> > >> >             2. IO time. The scan_blocks_time is not enough. If it is
> > >> high,
> > >> > we know somethings wrong, but not know what cause that problem. The
> > >> real IO
> > >> > time for data is not be provided. As there may be several file for
> one
> > >> > partition, know the program slow is caused by datanode or executor
> > >> itself
> > >> > give us intuition to find the problem.
> > >> >
> > >> >             3. The TableBlockInfo for task. I log it by myself when
> > >> > debugging. It tell me how many blocklets is locality. The spark web
> > >> monitor
> > >> > just give a locality level, but may be only one blocklet is
> locality.
> > >> >
> > >> >
> > >> > ---------------------
> > >> >
> > >> > Anning Luo
> > >> >
> > >> > *HULU*
> > >> >
> > >> > Email: [hidden email]
> > >> >
> > >> >             [hidden email]
> > >> >
> > >>
> > >
> > >
> >
>



Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

kumarvishal09
Hi An Lan,
                Data is already distributed; in this case one blocklet may be returning more rows and another fewer, and because of this some tasks take more time.

The block distribution log is not present in the driver log, so it is not clear whether it is going for block distribution or blocklet distribution. Can you please attach the detailed driver log?

*Some suggestions:*

1. If a column is a String and you are not applying a filter on it, add it to DICTIONARY_EXCLUDE in the create statement; this avoids the dictionary lookup. If the number of such String columns is small, it is better to add them all to DICTIONARY_EXCLUDE.

2. If a column is numeric and you are not applying a filter on it, there is no need to add it to DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE; it will then be a measure column.

3. Currently in carbon, if you create a measure column with the double data type it will compress better, since value compression is only supported for the double data type at present. (A minimal DDL sketch illustrating these suggestions follows below.)
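
As an illustration only, here is a minimal DDL sketch of the above suggestions, reusing the filter column names from earlier in this thread; Table1_sketch and s1 are made-up placeholders, not columns from the real table:

    CREATE TABLE IF NOT EXISTS Table1_sketch (
      h Int, g Int, d String, f Int, e Int,   -- filter columns kept leftmost
      s1 String,                              -- a String column that is never filtered on
      a Double,                               -- aggregation-only columns left out of the
      b Double                                -- dictionary, so they remain measure columns
    )
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES (
      "DICTIONARY_INCLUDE"="e,f,g,h",         -- dictionary kept on the numeric filter columns
      "DICTIONARY_EXCLUDE"="s1"               -- skip the dictionary lookup for the non-filter String
    )

Here d, being a String used in filters, is left as a dictionary dimension by default, while a and b are declared as double so the value compression described in point 3 can apply.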

-Regards
Kumar Vishal


On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]> wrote:

> Hi Kumar Vishal,
>
> 1. I found the quantity of rows filtered out by invert index is not
> uniform between different tasks and the difference is large. Some task may
> be 3~4k row after filtered, but the longer tasks may be 3~4w. When most
> longer task on same node, time cost will be more longer than others. So, is
> there any way to balance the data and make rows distribute uniform between
> task?
> Now, the blocklet size is still 120k rows. 3k+ blocket in total. Every
> task has 6 blocket.
> 2. I upload some logs last time. Is there some access problem about it? Is
> there any suggest or question about it?
>
> 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
>
>> Hi Kumar Vishal,
>>
>>
>> Driver and some executor logs are in the accessory. The same query run
>> for five times.
>>
>>
>>
>> Time consume for every query:
>>
>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>
>>
>> The first stage for every query:
>>
>> (first query -> start from 0 stage
>>
>> second query -> 4
>>
>> third query -> 8
>>
>> fourth query -> 12
>>
>> five query -> 16)
>>
>> see "first stage of queries.png" in accessory
>>
>>
>>
>> Longest task for every query:
>>
>> (Task and log relationship:
>>
>> stage 0, task 249 -> 1.log
>>
>> stage 4, task 420 -> 2.log
>>
>> stage 8, task 195 -> 3.log
>>
>> stage 12, task 350 -> 4.log
>>
>> stage 16, task 321 -> 5.log)
>> see "longest task.png" in accessory
>>
>> ​
>>  longest tasks.png
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/view?usp=drive_web>
>> ​​
>>  first stage of queries.png
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/view?usp=drive_web>
>> ​​
>>  dirverlog
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/view?usp=drive_web>
>> ​​
>>  1.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/view?usp=drive_web>
>> ​​
>>  2.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/view?usp=drive_web>
>> ​​
>>  3.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/view?usp=drive_web>
>> ​​
>>  4.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/view?usp=drive_web>
>> ​​
>>  5.log
>> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/view?usp=drive_web>
>> ​
>>
>> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
>>
>>> Hi,
>>>
>>>
>>>
>>> Driver and some executor logs are in the accessory. The same query run
>>> for five times.
>>>
>>>
>>>
>>> Time consume for every query:
>>>
>>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>>
>>>
>>>
>>> The first stage for every query:
>>>
>>> (first query -> start from 0 stage
>>>
>>> ​second query -> 4
>>>
>>> third query -> 8
>>>
>>> fourth query -> 12
>>>
>>> five query -> 16)
>>>
>>>
>>> Longest task for every query:
>>>
>>> (Task and log relationship:
>>>
>>> stage 0, task 249 -> 1.log
>>>
>>> stage 4, task 420 -> 2.log
>>>
>>> stage 8, task 195 -> 3.log
>>>
>>> stage 12, task 350 -> 4.log
>>>
>>> stage 16, task 321 -> 5.log)
>>>
>>>
>>>
>>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
>>>
>>>> What is the time difference between first time and second time query as
>>>> second time it will read from os cache,so i think there wont be any IO
>>>> bottleneck.
>>>>
>>>> Can u provide driver log and executor log for task which are taking more
>>>> time.
>>>>
>>>>
>>>> -Regards
>>>> Kumar Vishal
>>>>
>>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:
>>>>
>>>> > 1. I have set --num-executors=100 in all test. The image write 100
>>>> nodes
>>>> > means one executor for one node, and when there is 64 node, there
>>>> maybe two
>>>> > or three executor on one node. The total executor number is always
>>>> 100.
>>>> >
>>>> > 2. I do not find the property enable.blocklet.distribution in 0.1.1. I
>>>> > found it in 0.2.0. I am testing on 0.1.1, and will try it later. If
>>>> you
>>>> > concern locality level, it seems that the most long query is not
>>>> caused by
>>>> > IO problem. Statistic of the long task statistic like this:
>>>> >
>>>> > +--------------------+----------------+--------------------+
>>>> > ----------------+---------------+-----------+-------------------+
>>>> > |             task_id|load_blocks_time|load
>>>> _dictionary_time|scan_blocks_
>>>> > time|scan_blocks_num|result_size|total_executor_time|
>>>> > +--------------------+----------------+--------------------+
>>>> > ----------------+---------------+-----------+-------------------+
>>>> > |4199754456147478_134|              1 |                 32 |
>>>> > 3306 |             3 |     42104 |             13029 |
>>>> > +--------------------+----------------+--------------------+
>>>> > ----------------+---------------+-----------+-------------------+
>>>> >
>>>> > I have suffer from IO, but after configure speculation, most tasks
>>>> > with a long IO will be resend.
>>>> >
>>>> >
>>>> > 3. distinct value:
>>>> >
>>>> > g: 11
>>>> >
>>>> > h: 3
>>>> >
>>>> > f: 7
>>>> >
>>>> > e: 3
>>>> >
>>>> > d: 281
>>>> >
>>>> > 4. In the query, only e, f ,g, h, d and A are used in filter, others
>>>> are
>>>> > not used. So I think others used in the aggregation are no need added
>>>> for
>>>> > index.
>>>> >
>>>> > 5. I have write the e, f, g, h, d in most left just after "create
>>>> table..."
>>>> >
>>>> > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:
>>>> >
>>>> > > Hi An Lan,
>>>> > >
>>>> > > Please confirm below things.
>>>> > >
>>>> > > 1. Is dynamic executor is enabled?? If it is enabled can u disabled
>>>> and
>>>> > > try(this is to check is there any impact with dynamic executor)
>>>> > >     for disabling dynamic executor you need set in
>>>> spark-default.conf
>>>> > > --num-executors=100
>>>> > >
>>>> > > 2. Can u please set in below property enable.blocklet.distribution=f
>>>> alse
>>>> > > and execute the query.
>>>> > >
>>>> > > 3. Cardinality of each column.
>>>> > >
>>>> > > 4. Any reason why you are setting "NO_INVERTED_INDEX”=“a” ??
>>>> > >
>>>> > > 5. Can u keep all the columns which is present in filter on left
>>>> side so
>>>> > > less no of blocks will identified during block pruning and it will
>>>> > improve
>>>> > > the query performance.
>>>> > >
>>>> > >
>>>> > > -Regards
>>>> > > Kumar Vishal
>>>> > >
>>>> > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]>
>>>> wrote:
>>>> > >
>>>> > > > Hi Kumar Vishal,
>>>> > > >
>>>> > > > 1.       Create table ddl:
>>>> > > >
>>>> > > > CREATE TABLE IF NOT EXISTS Table1
>>>> > > >
>>>> > > > (* h Int, g Int, d String, f Int, e Int,*
>>>> > > >
>>>> > > > a Int, b Int, …(extra near 300 columns)
>>>> > > >
>>>> > > > STORED BY 'org.apache.carbondata.format'
>>>> > > > TBLPROPERTIES(
>>>> > > >
>>>> > > > "NO_INVERTED_INDEX”=“a”,
>>>> > > > "NO_INVERTED_INDEX”=“b”,
>>>> > > >
>>>> > > > …(extra near 300 columns)
>>>> > > >
>>>> > > > "DICTIONARY_INCLUDE”=“a”,
>>>> > > >
>>>> > > > "DICTIONARY_INCLUDE”=“b”,
>>>> > > >
>>>> > > > …(extra near 300 columns)
>>>> > > >
>>>> > > > )
>>>> > > >
>>>> > > > 2.       3. There more than hundreds node in the cluster, but
>>>> cluster
>>>> > is
>>>> > > > used mixed with other application.  Some time when node is
>>>> enough, we
>>>> > > will
>>>> > > > get 100 distinct node.
>>>> > > >
>>>> > > > 4.      I give a statistic of task time during once query and mark
>>>> > > > distinct nodes below:
>>>> > > >
>>>> > > > [image: 内嵌图片 1]
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <
>>>> [hidden email]>:
>>>> > > >
>>>> > > >> Hi Anning Luo,
>>>> > > >>
>>>> > > >> Can u please provide below details.
>>>> > > >>
>>>> > > >> 1.Create table ddl.
>>>> > > >> 2.Number of node in you cluster setup.
>>>> > > >> 3. Number of executors per node.
>>>> > > >> 4. Query statistics.
>>>> > > >>
>>>> > > >> Please find my comments in bold.
>>>> > > >>
>>>> > > >> Problem:
>>>> > > >> 1.                          GC problem. We suffer a 20%~30% GC
>>>> time
>>>> > for
>>>> > > >> some task in first stage after a lot of parameter refinement. We
>>>> now
>>>> > use
>>>> > > >> G1
>>>> > > >> GC in java8. GC time will double if use CMS. The main GC time is
>>>> spent
>>>> > > on
>>>> > > >> young generation GC. Almost half memory of young generation will
>>>> be
>>>> > copy
>>>> > > >> to
>>>> > > >> old generation. It seems lots of object has a long life than GC
>>>> period
>>>> > > and
>>>> > > >> the space is not be reuse(as concurrent GC will release it
>>>> later).
>>>> > When
>>>> > > we
>>>> > > >> use a large Eden(>=1G for example), once GC time will be
>>>> seconds. If
>>>> > set
>>>> > > >> Eden little(256M for example), once GC time will be hundreds
>>>> > > milliseconds,
>>>> > > >> but more frequency and total is still seconds. Is there any way
>>>> to
>>>> > > lessen
>>>> > > >> the GC time? (We don’t consider the first query and second query
>>>> in
>>>> > this
>>>> > > >> case.)
>>>> > > >>
>>>> > > >> *How many node are present in your cluster setup?? If nodes are
>>>> less
>>>> > > >> please
>>>> > > >> reduce the number of executors per node.*
>>>> > > >>
>>>> > > >> 2.                          Performance refine problem. Row
>>>> number
>>>> > after
>>>> > > >> being filtered is not uniform. Some node maybe heavy. It spend
>>>> more
>>>> > time
>>>> > > >> than other node. The time of one task is 4s ~ 16s. Is any method
>>>> to
>>>> > > refine
>>>> > > >> it?
>>>> > > >>
>>>> > > >> 3.                          Too long time for first and second
>>>> query.
>>>> > I
>>>> > > >> know dictionary and some index need to be loaded for the first
>>>> time.
>>>> > But
>>>> > > >> after I trying use query below to preheat it, it still spend a
>>>> lot of
>>>> > > >> time.
>>>> > > >> How could I preheat the query correctly?
>>>> > > >>                         select Aarray, a, b, c… from Table1 where
>>>> > Aarray
>>>> > > >> is
>>>> > > >> not null and d = “sss” and e !=22 and f = 33 and g = 44 and h =
>>>> 55
>>>> > > >>
>>>> > > >> *Currently we are working on first time query improvement. For
>>>> now you
>>>> > > can
>>>> > > >> run select count(*) or count(column), so all the blocks get
>>>> loaded and
>>>> > > >> then
>>>> > > >> you can run the actual query.*
>>>> > > >>
>>>> > > >>
>>>> > > >> 4.             Any other suggestion to lessen the query time?
>>>> > > >>
>>>> > > >>
>>>> > > >> Some suggestion:
>>>> > > >>             The log by class QueryStatisticsRecorder give me a
>>>> good
>>>> > > means
>>>> > > >> to find the neck bottle, but not enough. There still some metric
>>>> I
>>>> > think
>>>> > > >> is
>>>> > > >> very useful:
>>>> > > >>             1. filter ratio. i.e.. not only result_size but also
>>>> the
>>>> > > >> origin
>>>> > > >> size so we could know how many data is filtered.
>>>> > > >>             2. IO time. The scan_blocks_time is not enough. If
>>>> it is
>>>> > > high,
>>>> > > >> we know somethings wrong, but not know what cause that problem.
>>>> The
>>>> > real
>>>> > > >> IO
>>>> > > >> time for data is not be provided. As there may be several file
>>>> for one
>>>> > > >> partition, know the program slow is caused by datanode or
>>>> executor
>>>> > > itself
>>>> > > >> give us intuition to find the problem.
>>>> > > >>             3. The TableBlockInfo for task. I log it by myself
>>>> when
>>>> > > >> debugging. It tell me how many blocklets is locality. The spark
>>>> web
>>>> > > >> monitor
>>>> > > >> just give a locality level, but may be only one blocklet is
>>>> locality.
>>>> > > >>
>>>> > > >>
>>>> > > >> -Regards
>>>> > > >> Kumar Vishal
>>>> > > >>
>>>> > > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <[hidden email]>
>>>> wrote:
>>>> > > >>
>>>> > > >> > Hi,
>>>> > > >> >
>>>> > > >> > We are using carbondata to build our table and running query in
>>>> > > >> > CarbonContext. We have some performance problem during
>>>> refining the
>>>> > > >> system.
>>>> > > >> >
>>>> > > >> > *Background*:
>>>> > > >> >
>>>> > > >> > *cluster*:                      100 executor,5 task/executor,
>>>> 10G
>>>> > > >> > memory/executor
>>>> > > >> >
>>>> > > >> > *data*:                              60+GB(per one replica) as
>>>> > carbon
>>>> > > >> data
>>>> > > >> > format, 600+MB/file * 100 file, 300+columns, 300+million rows
>>>> > > >> >
>>>> > > >> > *sql example:*
>>>> > > >> >
>>>> > > >> >                                       select A,
>>>> > > >> >
>>>> > > >> >                                       sum(a),
>>>> > > >> >
>>>> > > >> >                                       sum(b),
>>>> > > >> >
>>>> > > >> >                                       sum(c),
>>>> > > >> >
>>>> > > >> >                                       …( extra 100 aggregation
>>>> like
>>>> > > >> > sum(column))
>>>> > > >> >
>>>> > > >> >                                       from Table1 LATERAL VIEW
>>>> > > >> > explode(split(Aarray, ‘*;*’)) ATable AS A
>>>> > > >> >
>>>> > > >> >                                       where A is not null and
>>>> d >
>>>> > > >> “ab:c-10”
>>>> > > >> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY
>>>> A
>>>> > > >> >
>>>> > > >> > *target query time*:           <10s
>>>> > > >> >
>>>> > > >> > *current query time*:         15s ~ 25s
>>>> > > >> >
>>>> > > >> > *scene:*                             OLAP system. <100 queries
>>>> every
>>>> > > >> day.
>>>> > > >> > Concurrency number is not high. Most time cpu is idle, so this
>>>> > service
>>>> > > >> will
>>>> > > >> > run with other program. The service will run for long time. We
>>>> could
>>>> > > not
>>>> > > >> > occupy a very large memory for every executor.
>>>> > > >> >
>>>> > > >> > *refine*:                              I have build index and
>>>> > > >> dictionary on
>>>> > > >> > d, e, f, g, h and build dictionary on all other aggregation
>>>> > > >> columns(i.e. a,
>>>> > > >> > b, c, …100+ columns). And make sure there is one segment for
>>>> total
>>>> > > >> data. I
>>>> > > >> > have open the speculation(quantile=0.5, interval=250,
>>>> > multiplier=1.2).
>>>> > > >> >
>>>> > > >> > Time is mainly spent on first stage before shuffling. As 95%
>>>> data
>>>> > will
>>>> > > >> be
>>>> > > >> > filtered out, the shuffle process spend little time. In first
>>>> stage,
>>>> > > >> most
>>>> > > >> > task complete in less than 10s. But there still be near 50
>>>> tasks
>>>> > > longer
>>>> > > >> > than 10s. Max task time in one query may be 12~16s.
>>>> > > >> >
>>>> > > >> > *Problem:*
>>>> > > >> >
>>>> > > >> > 1.          GC problem. We suffer a 20%~30% GC time for some
>>>> task in
>>>> > > >> first
>>>> > > >> > stage after a lot of parameter refinement. We now use G1 GC in
>>>> > java8.
>>>> > > GC
>>>> > > >> > time will double if use CMS. The main GC time is spent on young
>>>> > > >> generation
>>>> > > >> > GC. Almost half memory of young generation will be copy to old
>>>> > > >> generation.
>>>> > > >> > It seems lots of object has a long life than GC period and the
>>>> space
>>>> > > is
>>>> > > >> not
>>>> > > >> > be reuse(as concurrent GC will release it later). When we use a
>>>> > large
>>>> > > >> > Eden(>=1G for example), once GC time will be seconds. If set
>>>> Eden
>>>> > > >> > little(256M for example), once GC time will be hundreds
>>>> > milliseconds,
>>>> > > >> but
>>>> > > >> > more frequency and total is still seconds. Is there any way to
>>>> > lessen
>>>> > > >> the
>>>> > > >> > GC time? (We don’t consider the first query and second query
>>>> in this
>>>> > > >> case.)
>>>> > > >> >
>>>> > > >> > 2.           Performance refine problem. Row number after being
>>>> > > >> filtered is
>>>> > > >> > not uniform. Some node maybe heavy. It spend more time than
>>>> other
>>>> > > node.
>>>> > > >> The
>>>> > > >> > time of one task is 4s ~ 16s. Is any method to refine it?
>>>> > > >> >
>>>> > > >> > 3.           Too long time for first and second query. I know
>>>> > > dictionary
>>>> > > >> > and some index need to be loaded for the first time. But after
>>>> I
>>>> > > trying
>>>> > > >> use
>>>> > > >> > query below to preheat it, it still spend a lot of time. How
>>>> could I
>>>> > > >> > preheat the query correctly?
>>>> > > >> >
>>>> > > >> >               select Aarray, a, b, c… from Table1 where Aarray
>>>> is
>>>> > not
>>>> > > >> null
>>>> > > >> > and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
>>>> > > >> >
>>>> > > >> > 4.           Any other suggestion to lessen the query time?
>>>> > > >> >
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Some suggestion:
>>>> > > >> >
>>>> > > >> >             The log by class QueryStatisticsRecorder give me a
>>>> good
>>>> > > >> means
>>>> > > >> > to find the neck bottle, but not enough. There still some
>>>> metric I
>>>> > > >> think is
>>>> > > >> > very useful:
>>>> > > >> >
>>>> > > >> >             1. filter ratio. i.e.. not only result_size but
>>>> also the
>>>> > > >> origin
>>>> > > >> > size so we could know how many data is filtered.
>>>> > > >> >
>>>> > > >> >             2. IO time. The scan_blocks_time is not enough. If
>>>> it is
>>>> > > >> high,
>>>> > > >> > we know somethings wrong, but not know what cause that
>>>> problem. The
>>>> > > >> real IO
>>>> > > >> > time for data is not be provided. As there may be several file
>>>> for
>>>> > one
>>>> > > >> > partition, know the program slow is caused by datanode or
>>>> executor
>>>> > > >> itself
>>>> > > >> > give us intuition to find the problem.
>>>> > > >> >
>>>> > > >> >             3. The TableBlockInfo for task. I log it by myself
>>>> when
>>>> > > >> > debugging. It tell me how many blocklets is locality. The
>>>> spark web
>>>> > > >> monitor
>>>> > > >> > just give a locality level, but may be only one blocklet is
>>>> > locality.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > ---------------------
>>>> > > >> >
>>>> > > >> > Anning Luo
>>>> > > >> >
>>>> > > >> > *HULU*
>>>> > > >> >
>>>> > > >> > Email: [hidden email]
>>>> > > >> >
>>>> > > >> >             [hidden email]
>>>> > > >> >
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
kumar vishal
Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

Anning Luo-2
Hi Kumar Vishal,

Thanks for your suggestions.
The driver log does not contain the block distribution log by default. How can I
enable it?
And how is the order of the dimensions decided?

2016-11-16 15:14 GMT+08:00 Kumar Vishal <[hidden email]>:

> Hi An Lan,
>                 Data is already distributed, in this case may be one
> blocklet is returning more number of rows and other returning less because
> of this some task will take more time.
>
> In driver log block distribution log is not present, so it is not clear
> whether it is going for block distribution or blocklet distribution.
> distribution, can you please add the detail driver log.
>
> *Some suggestion:*
>
> 1. If column is String and you are not applying filter then in create
> statement add it in dictionary exclude this will avoid lookup in
> dictionary. If number of String column are less then better add it in
> dictionary exclude.
>
> 2. If column is a numeric column and you are not applying filter then no
> need to add in dictionary include or exclude, so it will be a measure
> column.
>
> 3. Currently in carbon for measure column if you will create as a double
> data type it will give more compression as currently value compression mode
> is support for double data type.
>
> -Regards
> Kumar Vishal
>
>
> On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]> wrote:
>
> > Hi Kumar Vishal,
> >
> > 1. I found the quantity of rows filtered out by invert index is not
> > uniform between different tasks and the difference is large. Some task
> may
> > be 3~4k row after filtered, but the longer tasks may be 3~4w. When most
> > longer task on same node, time cost will be more longer than others. So,
> is
> > there any way to balance the data and make rows distribute uniform
> between
> > task?
> > Now, the blocklet size is still 120k rows. 3k+ blocket in total. Every
> > task has 6 blocket.
> > 2. I upload some logs last time. Is there some access problem about it?
> Is
> > there any suggest or question about it?
> >
> > 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
> >
> >> Hi Kumar Vishal,
> >>
> >>
> >> Driver and some executor logs are in the accessory. The same query run
> >> for five times.
> >>
> >>
> >>
> >> Time consume for every query:
> >>
> >> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>
> >>
> >> The first stage for every query:
> >>
> >> (first query -> start from 0 stage
> >>
> >> second query -> 4
> >>
> >> third query -> 8
> >>
> >> fourth query -> 12
> >>
> >> five query -> 16)
> >>
> >> see "first stage of queries.png" in accessory
> >>
> >>
> >>
> >> Longest task for every query:
> >>
> >> (Task and log relationship:
> >>
> >> stage 0, task 249 -> 1.log
> >>
> >> stage 4, task 420 -> 2.log
> >>
> >> stage 8, task 195 -> 3.log
> >>
> >> stage 12, task 350 -> 4.log
> >>
> >> stage 16, task 321 -> 5.log)
> >> see "longest task.png" in accessory
> >>
> >> ​
> >>  longest tasks.png
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/
> view?usp=drive_web>
> >> ​​
> >>  first stage of queries.png
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/
> view?usp=drive_web>
> >> ​​
> >>  dirverlog
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/
> view?usp=drive_web>
> >> ​​
> >>  1.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/
> view?usp=drive_web>
> >> ​​
> >>  2.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/
> view?usp=drive_web>
> >> ​​
> >>  3.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/
> view?usp=drive_web>
> >> ​​
> >>  4.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/
> view?usp=drive_web>
> >> ​​
> >>  5.log
> >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/
> view?usp=drive_web>
> >> ​
> >>
> >> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
> >>
> >>> Hi,
> >>>
> >>>
> >>>
> >>> Driver and some executor logs are in the accessory. The same query run
> >>> for five times.
> >>>
> >>>
> >>>
> >>> Time consume for every query:
> >>>
> >>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>>
> >>>
> >>>
> >>> The first stage for every query:
> >>>
> >>> (first query -> start from 0 stage
> >>>
> >>> ​second query -> 4
> >>>
> >>> third query -> 8
> >>>
> >>> fourth query -> 12
> >>>
> >>> five query -> 16)
> >>>
> >>>
> >>> Longest task for every query:
> >>>
> >>> (Task and log relationship:
> >>>
> >>> stage 0, task 249 -> 1.log
> >>>
> >>> stage 4, task 420 -> 2.log
> >>>
> >>> stage 8, task 195 -> 3.log
> >>>
> >>> stage 12, task 350 -> 4.log
> >>>
> >>> stage 16, task 321 -> 5.log)
> >>>
> >>>
> >>>
> >>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <[hidden email]>:
> >>>
> >>>> What is the time difference between first time and second time query
> as
> >>>> second time it will read from os cache,so i think there wont be any IO
> >>>> bottleneck.
> >>>>
> >>>> Can u provide driver log and executor log for task which are taking
> more
> >>>> time.
> >>>>
> >>>>
> >>>> -Regards
> >>>> Kumar Vishal
> >>>>
> >>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]> wrote:
> >>>>
> >>>> > 1. I have set --num-executors=100 in all test. The image write 100
> >>>> nodes
> >>>> > means one executor for one node, and when there is 64 node, there
> >>>> maybe two
> >>>> > or three executor on one node. The total executor number is always
> >>>> 100.
> >>>> >
> >>>> > 2. I do not find the property enable.blocklet.distribution in
> 0.1.1. I
> >>>> > found it in 0.2.0. I am testing on 0.1.1, and will try it later. If
> >>>> you
> >>>> > concern locality level, it seems that the most long query is not
> >>>> caused by
> >>>> > IO problem. Statistic of the long task statistic like this:
> >>>> >
> >>>> > +--------------------+----------------+--------------------+
> >>>> > ----------------+---------------+-----------+-------------------+
> >>>> > |             task_id|load_blocks_time|load
> >>>> _dictionary_time|scan_blocks_
> >>>> > time|scan_blocks_num|result_size|total_executor_time|
> >>>> > +--------------------+----------------+--------------------+
> >>>> > ----------------+---------------+-----------+-------------------+
> >>>> > |4199754456147478_134|              1 |                 32 |
> >>>> > 3306 |             3 |     42104 |             13029 |
> >>>> > +--------------------+----------------+--------------------+
> >>>> > ----------------+---------------+-----------+-------------------+
> >>>> >
> >>>> > I have suffer from IO, but after configure speculation, most tasks
> >>>> > with a long IO will be resend.
> >>>> >
> >>>> >
> >>>> > 3. distinct value:
> >>>> >
> >>>> > g: 11
> >>>> >
> >>>> > h: 3
> >>>> >
> >>>> > f: 7
> >>>> >
> >>>> > e: 3
> >>>> >
> >>>> > d: 281
> >>>> >
> >>>> > 4. In the query, only e, f ,g, h, d and A are used in filter, others
> >>>> are
> >>>> > not used. So I think others used in the aggregation are no need
> added
> >>>> for
> >>>> > index.
> >>>> >
> >>>> > 5. I have write the e, f, g, h, d in most left just after "create
> >>>> table..."
> >>>> >
> >>>> > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]
> >:
> >>>> >
> >>>> > > Hi An Lan,
> >>>> > >
> >>>> > > Please confirm below things.
> >>>> > >
> >>>> > > 1. Is dynamic executor is enabled?? If it is enabled can u
> disabled
> >>>> and
> >>>> > > try(this is to check is there any impact with dynamic executor)
> >>>> > >     for disabling dynamic executor you need set in
> >>>> spark-default.conf
> >>>> > > --num-executors=100
> >>>> > >
> >>>> > > 2. Can u please set in below property
> enable.blocklet.distribution=f
> >>>> alse
> >>>> > > and execute the query.
> >>>> > >
> >>>> > > 3. Cardinality of each column.
> >>>> > >
> >>>> > > 4. Any reason why you are setting "NO_INVERTED_INDEX”=“a” ??
> >>>> > >
> >>>> > > 5. Can u keep all the columns which is present in filter on left
> >>>> side so
> >>>> > > less no of blocks will identified during block pruning and it will
> >>>> > improve
> >>>> > > the query performance.
> >>>> > >
> >>>> > >
> >>>> > > -Regards
> >>>> > > Kumar Vishal
> >>>> > >
> >>>> > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]>
> >>>> wrote:
> >>>> > >

Re: GC problem and performance refine problem

kumarvishal09
Hi An Lan,
For detailed logging you need to point log4j at a configuration file in
spark-defaults.conf, for both the driver and the executors:

 spark.driver.extraJavaOptions = -Dlog4j.configuration=file:/<filepath>
 spark.executor.extraJavaOptions = -Dlog4j.configuration=file:/<filepath>

Please also make sure that in log4j.properties the log4j.rootCategory level is INFO.
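As an illustration only (the file path below is a placeholder, not a value from
this thread), the two files could look like the following; the log4j.properties
part is essentially Spark's default log4j template with the root level set to
INFO:

 # spark-defaults.conf
 spark.driver.extraJavaOptions    -Dlog4j.configuration=file:/opt/spark/conf/log4j.properties
 spark.executor.extraJavaOptions  -Dlog4j.configuration=file:/opt/spark/conf/log4j.properties

 # log4j.properties
 log4j.rootCategory=INFO, console
 log4j.appender.console=org.apache.log4j.ConsoleAppender
 log4j.appender.console.target=System.err
 log4j.appender.console.layout=org.apache.log4j.PatternLayout
 log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n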

-Regards
Kumar Vishal







kumar vishal

Re: GC problem and performance refine problem

kumarvishal09
Hi An Lan,
Please go through the discussion *how to create a carbon table for better
query performance* shared by Bill. I am copying the same in this mail for
your reference.


Discussion: how to create the CarbonData table with good performance

Suggestions for creating a Carbon table

Recently we used CarbonData for performance testing in the telecommunication
field and summarized some suggestions for creating CarbonData tables. We have
tables which range from 10 thousand rows to 10 billion rows and have from 100
columns to 300 columns. Following are some of the columns used in the tables:

Column name    Data type      Cardinality    Attribution
msisdn         String         30 million     dimension
BEGIN_TIME     bigint         10 thousand    dimension
HOST           String         1 million      dimension
Dime_1         String         1 thousand     dimension
counter_1      numeric(20,0)  NA             measure
...            ...            NA             ...
counter_100    numeric(20,0)  NA             measure

We have more than 50 test cases; from them we summarized the following
suggestions for creating tables with good query performance.

1. Put the frequently-used filter column at the beginning. For example, if
the MSISDN filter is used in most of the queries, put MSISDN in the first
column. With the create table command below, queries which filter on MSISDN
will perform well (but because MSISDN is high cardinality, creating the table
like this will decrease the compression ratio):

    create table carbondata_table(
      msisdn String,
      ...)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,..', 'DICTIONARY_INCLUDE'='...');

2. If multiple columns are frequently used in filters, put them at the front,
ordered from low cardinality to high cardinality. For example, if msisdn,
host and dime_1 are frequently-used filter columns, the column order can be
dime_1 -> host -> msisdn, because dime_1 has the lowest cardinality. This
will increase the compression ratio and give good performance for filters on
dime_1, host and msisdn:

    create table carbondata_table(
      Dime_1 String,
      HOST String,
      MSISDN String,
      ...)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST..', 'DICTIONARY_INCLUDE'='Dime_1..');

3. If no column is frequently used in filters, put all the dimension columns
in order from low cardinality to high cardinality:

    create table carbondata_table(
      Dime_1 String,
      BEGIN_TIME bigint,
      HOST String,
      MSISDN String,
      ...)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..',
                   'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..');

4. For measures that need no high accuracy, there is no need to use the
numeric(20,0) data type; using double instead will improve query performance.
In one test case, replacing numeric(20,0) with double improved the query by
5 times, from 15 seconds to 3 seconds:

    create table carbondata_table(
      Dime_1 String,
      BEGIN_TIME bigint,
      HOST String,
      MSISDN String,
      counter_1 double,
      counter_2 double,
      ...
      counter_100 double)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
                   'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');

5. For a column whose value is always incremental, like start_time (for
example, data is loaded into carbon every day and start_time is incremental
for each load), put it at the back of the dimensions, because an
always-incremental value can make good use of the min/max index:

    create table carbondata_table(
      Dime_1 String,
      HOST String,
      MSISDN String,
      counter_1 double,
      counter_2 double,
      BEGIN_TIME bigint,
      ...
      counter_100 double)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
                   'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');

One more point, on whether a dimension needs a dictionary or not: we suggest
that if the cardinality is higher than 50 thousand, do not put it as a
dictionary column. Putting a high-cardinality column in the dictionary will
impact the load performance.
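Applied to the Table1 discussed in this thread, a hypothetical sketch of what
this advice would give looks like the DDL below. It uses the cardinalities
reported earlier in the thread (e: 3, h: 3, f: 7, g: 11, d: 281), assumes
Aarray is a high-cardinality String, leaves the ~300 aggregation columns
elided, and turns them into double measures per suggestion 4; it is only an
illustration of the rules above, not a statement that was tested here:

    CREATE TABLE IF NOT EXISTS Table1 (
      e Int, h Int, f Int, g Int, d String,   -- filter columns, low to high cardinality
      Aarray String,                          -- assumed high cardinality, so no dictionary
      a Double, b Double,                     -- aggregation columns as plain double measures
      ...(remaining aggregation columns)
    )
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_INCLUDE'='e,h,f,g,d',
                   'DICTIONARY_EXCLUDE'='Aarray');

With this layout the numeric aggregation columns stay out of the dictionary
entirely, so no NO_INVERTED_INDEX entries are needed for them.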

-Regards
Kumar Vishal

kumar vishal
Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

Anning Luo-2
Hi Kumar Vishal,



I have redone the experiments with detailed driver logging. The logs are
attached, and each log covers the same query run 10 times.



1. “driverlog1”: produced under the same conditions as my previous run.

2. “driverlog2”: in this experiment, before loading the data I shuffled the
CSV files so that rows with the same attribute values are distributed
uniformly across all CSV files, i.e. if the query filter were applied to the
original CSV files, the number of matching rows would be almost the same for
every file. Every CSV file is one block, so every partition seen by the
CarbonData loader holds uniform data. This method works well:

Time cost of the first experiment (ms): 103543, 49363, 125430, 22138, 21626,
27905, 22086, 21332, 23501, 16715

Time cost of the second experiment (ms): 62466, 38709, 24700, 18767, 18325,
15887, 16014, 15452, 17440, 15563

3. “driverlog3”: first, use the same shuffled CSV data as in the second
experiment; second, modify the CREATE TABLE SQL:



*OLD CREATE TABLE:*

CREATE TABLE IF NOT EXISTS Table1 (
  h Int, g Int, d String, f Int, e Int,
  a Int, b Int, …(extra near 300 columns)
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES(
  "NO_INVERTED_INDEX"="a",
  "NO_INVERTED_INDEX"="b",
  …(extra near 300 columns)
  "DICTIONARY_INCLUDE"="a",
  "DICTIONARY_INCLUDE"="b",
  …(extra near 300 columns)
)



*NEW CREATE TABLE:*

CREATE TABLE IF NOT EXISTS Table1 (
  h Int, g Int, f Int, e Int, d String,
  a Int, b Int, …(extra near 300 columns)
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES(
  "DICTIONARY_INCLUDE"="h,g,f,e,d"
)

The columns h, g, f, e and d are the ones used in the filter, and they are
ordered by increasing distinct-value count (cardinality).

Time cost of the third experiment (ms): 57088, 37105, 41588, 43540, 38143,
35581, 34040, 35886, 34610, 32589

The third experiment takes longer than the others. The reason is that I use
“h=66 and g=67 and f=12 and e!=5 and d>’aaaa’ and d<’bbbb’” as the filter
condition, so given the MDK (multi-dimensional key) sort order the matching
rows are contiguous in the file. As a result some blocklets match the filter
completely while others match zero rows, but the scheduler on the driver does
not filter out the zero-match blocklets. It seems the min/max filter does not
work correctly on the driver side when getSplits is called on the
InputFormat. Here are some cases from the executors (blocklet size is 120k
rows):

+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
| 603288369228186_62|             426|                   2|              28|              3|          0|                470|
+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+

16/11/16 21:36:25 INFO impl.FilterScanner: pool-30-thread-1 [INDEX] index filter : 113609/120000

+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
|603288369228186_428|             380|                   4|             661|              3|     113609|              13634|
+-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+

In the first case result_size is zero, yet the blocklets were still assigned
to a task. That makes some tasks do nothing while others carry a heavy load.

In the second case, only one blocklet out of the 4 to 6 blocklets per task
actually has matching data. I added a log line for the inverted-index filter
ratio in the form “[INDEX] index filter : <result count>/<total block count>”.

So, how can I make the min/max filter work correctly on the driver side?



Another question: repeating a key across multiple entries in TBLPROPERTIES,
as in “"DICTIONARY_INCLUDE"="a", "DICTIONARY_INCLUDE"="b"” in the old CREATE
TABLE SQL, does not work correctly. Only the column named in the last
DICTIONARY_INCLUDE entry is added as a dimension. The new form works
correctly, while the old form silently drops the earlier entries without
throwing any exception (see the sketch below).
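
For anyone hitting the same thing, here is a minimal sketch of the two forms
(table and column names are placeholders, and cc is assumed to be the
CarbonContext):

// Old form: the same TBLPROPERTIES key appears twice. Only the last value
// survives, so only "b" ends up as a dictionary dimension, and no error or
// warning is raised.
cc.sql("""
  CREATE TABLE IF NOT EXISTS t_old (a Int, b Int)
  STORED BY 'org.apache.carbondata.format'
  TBLPROPERTIES("DICTIONARY_INCLUDE"="a", "DICTIONARY_INCLUDE"="b")
""")

// New form: one key with a comma-separated column list; both a and b become
// dictionary dimensions as expected.
cc.sql("""
  CREATE TABLE IF NOT EXISTS t_new (a Int, b Int)
  STORED BY 'org.apache.carbondata.format'
  TBLPROPERTIES("DICTIONARY_INCLUDE"="a,b")
""")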



Further, the second experiment suggests that balancing the data is important.
So I will change the blocklet size to 2k rows if the min/max filter can be
made to work on the driver side (a rough sketch of the rebalance-and-reload
flow is below). I have not yet changed the int measures to double; I will do
that later.
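
For reference, a rough sketch of the flow I have in mind. Paths, the
partition count and the table name are illustrative; sc is the SparkContext
and cc the CarbonContext; and carbon.blocklet.size is assumed to be the
rows-per-blocklet property that currently defaults to 120000 in our version:

import org.apache.carbondata.core.util.CarbonProperties

// Shuffle the CSV rows so every output file holds a near-uniform mix of the
// filter-column values (roughly what the "driverlog2" run did). Header
// lines, if any, would need to be handled separately.
sc.textFile("hdfs:///data/table1_csv")
  .repartition(100)
  .saveAsTextFile("hdfs:///data/table1_csv_shuffled")

// Smaller blocklets (2k rows instead of 120k) should give the scheduler
// finer-grained units to balance across tasks, assuming this property is
// honored at load time.
CarbonProperties.getInstance()
  .addProperty("carbon.blocklet.size", "2000")

// Reload into the table created with the NEW CREATE TABLE statement above.
cc.sql("LOAD DATA INPATH 'hdfs:///data/table1_csv_shuffled' INTO TABLE Table1")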

driverlog3.txt
<https://drive.google.com/file/d/0B1XM6KeI1nB7QmZXd2d3ZlR2bnM/view?usp=drive_web>
driverlog2.txt
<https://drive.google.com/file/d/0B1XM6KeI1nB7UF92RWVOWUdwZGM/view?usp=drive_web>
driverlog1.txt
<https://drive.google.com/file/d/0B1XM6KeI1nB7WjhuNjZHZWZJZGM/view?usp=drive_web>

Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

Anning Luo-2
Hi Kumar Vishal,

The attachments for the last email are at the bottom, pasted as URLs. The
logs are too big to send directly, so I put them on Google Drive and pasted
the links.

2016-11-17 16:59 GMT+08:00 An Lan <[hidden email]>:

>> 2016-11-16 21:55 GMT+08:00 Kumar Vishal <[hidden email]>:
>>
>>> Hi An Lan,
>>> Please go through discussion *how to create carbon table for better query
>>> performance* shared by Bill. I am coping the same in this mail for you
>>> reference.
>>>
>>>
>>> Discussion: how to create the CarbonData table with good performance
>>> Suggestions for creating a Carbon table
>>>
>>> Recently we used CarbonData for performance testing in the
>>> telecommunication field and summarized some suggestions for creating the
>>> CarbonData table. We have tables which range from 10 thousand rows to 10
>>> billion rows and have from 100 to 300 columns. Following are some of the
>>> columns used in the table:
>>>
>>> Column name     Data type       Cardinality     Attribution
>>> msisdn          String          30 million      dimension
>>> BEGIN_TIME      bigint          10 thousand     dimension
>>> HOST            String          1 million       dimension
>>> Dime_1          String          1 thousand      dimension
>>> counter_1       numeric(20,0)   NA              measure
>>> ...             ...             NA              ...
>>> counter_100     numeric(20,0)   NA              measure
>>>
>>> We have more than 50 test cases; based on them we summarize the following
>>> suggestions for creating a table with good query performance.
>>>
>>> 1. Put the frequently used filter column at the beginning. For example, if
>>> MSISDN is the filter used in most queries, put MSISDN in the first column.
>>> Queries that filter on MSISDN will then perform well (but because MSISDN
>>> has high cardinality, creating the table like this decreases the
>>> compression ratio):
>>> create table carbondata_table(msisdn String, ...) STORED BY
>>> 'org.apache.carbondata.format' TBLPROPERTIES
>>> ('DICTIONARY_EXCLUDE'='MSISDN,..', 'DICTIONARY_INCLUDE'='...');
>>>
>>> 2. If multiple columns are frequently used in filters, put them at the
>>> front, ordered from low cardinality to high cardinality. For example, if
>>> msisdn, host and dime_1 are frequently used columns, the column order can
>>> be dime_1 -> host -> msisdn, because dime_1 has the lowest cardinality.
>>> This increases the compression ratio and gives good filter performance on
>>> dime_1, host and msisdn:
>>> create table carbondata_table(Dime_1 String, HOST String, MSISDN String,
>>> ...) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES
>>> ('DICTIONARY_EXCLUDE'='MSISDN,HOST..', 'DICTIONARY_INCLUDE'='Dime_1..');
>>>
>>> 3. If no column is frequently used in filters, order all dimension columns
>>> from low cardinality to high cardinality:
>>> create table carbondata_table(Dime_1 String, BEGIN_TIME bigint, HOST
>>> String, MSISDN String, ...) STORED BY 'org.apache.carbondata.format'
>>> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..',
>>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..');
>>>
>>> 4. For measures that do not need high accuracy there is no need to use the
>>> numeric(20,0) data type; using double instead will increase query
>>> performance. In one test case, replacing numeric(20,0) with double
>>> improved the query 5 times, from 15 seconds to 3 seconds:
>>> create table carbondata_table(Dime_1 String, BEGIN_TIME bigint, HOST
>>> String, MSISDN String, counter_1 double, counter_2 double, ...,
>>> counter_100 double) STORED BY 'org.apache.carbondata.format'
>>> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
>>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
>>>
>>> 5. If a column is always incremental, like start_time (for example, we
>>> load data into carbon every day and start_time is incremental for each
>>> load), put that column at the back of the dimensions, because an
>>> always-incremental value can use the min/max index well:
>>> create table carbondata_table(Dime_1 String, HOST String, MSISDN String,
>>> counter_1 double, counter_2 double, BEGIN_TIME bigint, ..., counter_100
>>> double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES
>>> ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
>>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
>>>
>>> One more point, on whether a dimension needs a dictionary or not: if the
>>> cardinality is higher than 50 thousand, we suggest not making it a
>>> dictionary column, because a high-cardinality dictionary column will
>>> impact the load performance.
>>>
>>> -Regards
>>> Kumar Vishal
>>>
>>> On Wed, Nov 16, 2016 at 7:19 PM, Kumar Vishal <[hidden email]
>>> >
>>> wrote:
>>>
>>> > Hi An Lan,
>>> > For detail logging you need to add log4j configuration in
>>> > spark-default.conf for both driver and executor.
>>> > * spark.driver.extraJavaOption = -Dlog4j.configuration=file:/<f
>>> ilepath>*
>>> > * spark.executor.extraJavaOption = -Dlog4j.configuration=file:/<f
>>> ilepath>*
>>> >
>>> > Please make sure in* log4j.properties* *log4j.rootCategory* is *INFO*
>>> >
>>> > -Regards
>>> > Kumar Vishal
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Nov 16, 2016 at 2:06 PM, An Lan <[hidden email]> wrote:
>>> >
>>> >> Hi Kumar Vishal,
>>> >>
>>> >> Thanks for your suggestion.
>>> >> The driver log does not contain the block distribution log by default.
>>> >> How could I enable it?
>>> >> And how is the order of the dimensions decided?
>>> >>
>>> >> 2016-11-16 15:14 GMT+08:00 Kumar Vishal <[hidden email]>:
>>> >>
>>> >> > Hi An Lan,
>>> >> >                 Data is already distributed, in this case may be one
>>> >> > blocklet is returning more number of rows and other returning less
>>> >> because
>>> >> > of this some task will take more time.
>>> >> >
>>> >> > In the driver log the block distribution log is not present, so it is
>>> >> > not clear whether it is going for block distribution or blocklet
>>> >> > distribution. Can you please add the detailed driver log.
>>> >> >
>>> >> > *Some suggestion:*
>>> >> >
>>> >> > 1. If column is String and you are not applying filter then in
>>> create
>>> >> > statement add it in dictionary exclude this will avoid lookup in
>>> >> > dictionary. If number of String column are less then better add it
>>> in
>>> >> > dictionary exclude.
>>> >> >
>>> >> > 2. If column is a numeric column and you are not applying filter
>>> then no
>>> >> > need to add in dictionary include or exclude, so it will be a
>>> measure
>>> >> > column.
>>> >> >
>>> >> > 3. Currently in carbon for measure column if you will create as a
>>> double
>>> >> > data type it will give more compression as currently value
>>> compression
>>> >> mode
>>> >> > is support for double data type.
>>> >> >
>>> >> > -Regards
>>> >> > Kumar Vishal
>>> >> >
>>> >> >
>>> >> > On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]> wrote:
>>> >> >
>>> >> > > Hi Kumar Vishal,
>>> >> > >
>>> >> > > 1. I found the quantity of rows filtered out by invert index is
>>> not
>>> >> > > uniform between different tasks and the difference is large. Some
>>> task
>>> >> > may
>>> >> > > be 3~4k row after filtered, but the longer tasks may be 3~4w. When
>>> >> most
>>> >> > > longer task on same node, time cost will be more longer than
>>> others.
>>> >> So,
>>> >> > is
>>> >> > > there any way to balance the data and make rows distribute uniform
>>> >> > between
>>> >> > > task?
>>> >> > > Now, the blocklet size is still 120k rows. 3k+ blocket in total.
>>> Every
>>> >> > > task has 6 blocket.
>>> >> > > 2. I upload some logs last time. Is there some access problem
>>> about
>>> >> it?
>>> >> > Is
>>> >> > > there any suggest or question about it?
>>> >> > >
>>> >> > > 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
>>> >> > >
>>> >> > >> Hi Kumar Vishal,
>>> >> > >>
>>> >> > >>
>>> >> > >> Driver and some executor logs are in the accessory. The same
>>> query
>>> >> run
>>> >> > >> for five times.
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> Time consume for every query:
>>> >> > >>
>>> >> > >> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>> >> > >>
>>> >> > >>
>>> >> > >> The first stage for every query:
>>> >> > >>
>>> >> > >> (first query -> start from 0 stage
>>> >> > >>
>>> >> > >> second query -> 4
>>> >> > >>
>>> >> > >> third query -> 8
>>> >> > >>
>>> >> > >> fourth query -> 12
>>> >> > >>
>>> >> > >> five query -> 16)
>>> >> > >>
>>> >> > >> see "first stage of queries.png" in accessory
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> Longest task for every query:
>>> >> > >>
>>> >> > >> (Task and log relationship:
>>> >> > >>
>>> >> > >> stage 0, task 249 -> 1.log
>>> >> > >>
>>> >> > >> stage 4, task 420 -> 2.log
>>> >> > >>
>>> >> > >> stage 8, task 195 -> 3.log
>>> >> > >>
>>> >> > >> stage 12, task 350 -> 4.log
>>> >> > >>
>>> >> > >> stage 16, task 321 -> 5.log)
>>> >> > >> see "longest task.png" in accessory
>>> >> > >>
>>> >> > >> ​
>>> >> > >>  longest tasks.png
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  first stage of queries.png
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  dirverlog
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  1.log
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  2.log
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  3.log
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  4.log
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/
>>> >> > view?usp=drive_web>
>>> >> > >> ​​
>>> >> > >>  5.log
>>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/
>>> >> > view?usp=drive_web>
>>> >> > >> ​
>>> >> > >>
>>> >> > >> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
>>> >> > >>
>>> >> > >>> Hi,
>>> >> > >>>
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> Driver and some executor logs are in the accessory. The same
>>> query
>>> >> run
>>> >> > >>> for five times.
>>> >> > >>>
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> Time consume for every query:
>>> >> > >>>
>>> >> > >>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
>>> >> > >>>
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> The first stage for every query:
>>> >> > >>>
>>> >> > >>> (first query -> start from 0 stage
>>> >> > >>>
>>> >> > >>> ​second query -> 4
>>> >> > >>>
>>> >> > >>> third query -> 8
>>> >> > >>>
>>> >> > >>> fourth query -> 12
>>> >> > >>>
>>> >> > >>> five query -> 16)
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> Longest task for every query:
>>> >> > >>>
>>> >> > >>> (Task and log relationship:
>>> >> > >>>
>>> >> > >>> stage 0, task 249 -> 1.log
>>> >> > >>>
>>> >> > >>> stage 4, task 420 -> 2.log
>>> >> > >>>
>>> >> > >>> stage 8, task 195 -> 3.log
>>> >> > >>>
>>> >> > >>> stage 12, task 350 -> 4.log
>>> >> > >>>
>>> >> > >>> stage 16, task 321 -> 5.log)
>>> >> > >>>
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <
>>> [hidden email]
>>> >> >:
>>> >> > >>>
>>> >> > >>>> What is the time difference between first time and second time
>>> >> query
>>> >> > as
>>> >> > >>>> second time it will read from os cache,so i think there wont be
>>> >> any IO
>>> >> > >>>> bottleneck.
>>> >> > >>>>
>>> >> > >>>> Can u provide driver log and executor log for task which are
>>> taking
>>> >> > more
>>> >> > >>>> time.
>>> >> > >>>>
>>> >> > >>>>
>>> >> > >>>> -Regards
>>> >> > >>>> Kumar Vishal
>>> >> > >>>>
>>> >> > >>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]>
>>> >> wrote:
>>> >> > >>>>
>>> >> > >>>> > 1. I have set --num-executors=100 in all test. The image
>>> write
>>> >> 100
>>> >> > >>>> nodes
>>> >> > >>>> > means one executor for one node, and when there is 64 node,
>>> there
>>> >> > >>>> maybe two
>>> >> > >>>> > or three executor on one node. The total executor number is
>>> >> always
>>> >> > >>>> 100.
>>> >> > >>>> >
>>> >> > >>>> > 2. I do not find the property enable.blocklet.distribution in
>>> >> > 0.1.1. I
>>> >> > >>>> > found it in 0.2.0. I am testing on 0.1.1, and will try it
>>> later.
>>> >> If
>>> >> > >>>> you
>>> >> > >>>> > concern locality level, it seems that the most long query is
>>> not
>>> >> > >>>> caused by
>>> >> > >>>> > IO problem. Statistic of the long task statistic like this:
>>> >> > >>>> >
>>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>> >> > >>>> > |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
>>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>> >> > >>>> > |4199754456147478_134|              1 |                 32 |           3306 |             3 |     42104 |             13029 |
>>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
>>> >> > >>>> >
>>> >> > >>>> > I have suffer from IO, but after configure speculation, most
>>> >> tasks
>>> >> > >>>> > with a long IO will be resend.
>>> >> > >>>> >
>>> >> > >>>> >
>>> >> > >>>> > 3. distinct value:
>>> >> > >>>> >
>>> >> > >>>> > g: 11
>>> >> > >>>> >
>>> >> > >>>> > h: 3
>>> >> > >>>> >
>>> >> > >>>> > f: 7
>>> >> > >>>> >
>>> >> > >>>> > e: 3
>>> >> > >>>> >
>>> >> > >>>> > d: 281
>>> >> > >>>> >
>>> >> > >>>> > 4. In the query, only e, f ,g, h, d and A are used in filter,
>>> >> others
>>> >> > >>>> are
>>> >> > >>>> > not used. So I think others used in the aggregation are no
>>> need
>>> >> > added
>>> >> > >>>> for
>>> >> > >>>> > index.
>>> >> > >>>> >
>>> >> > >>>> > 5. I have write the e, f, g, h, d in most left just after
>>> "create
>>> >> > >>>> table..."
>>> >> > >>>> >
>>> >> > >>>> > 2016-11-11 15:08 GMT+08:00 Kumar Vishal <
>>> >> [hidden email]
>>> >> > >:
>>> >> > >>>> >
>>> >> > >>>> > > Hi An Lan,
>>> >> > >>>> > >
>>> >> > >>>> > > Please confirm below things.
>>> >> > >>>> > >
>>> >> > >>>> > > 1. Is dynamic executor is enabled?? If it is enabled can u
>>> >> > disabled
>>> >> > >>>> and
>>> >> > >>>> > > try(this is to check is there any impact with dynamic
>>> executor)
>>> >> > >>>> > >     for disabling dynamic executor you need set in
>>> >> > >>>> spark-default.conf
>>> >> > >>>> > > --num-executors=100
>>> >> > >>>> > >
>>> >> > >>>> > > 2. Can u please set in below property
>>> >> > enable.blocklet.distribution=f
>>> >> > >>>> alse
>>> >> > >>>> > > and execute the query.
>>> >> > >>>> > >
>>> >> > >>>> > > 3. Cardinality of each column.
>>> >> > >>>> > >
>>> >> > >>>> > > 4. Any reason why you are setting "NO_INVERTED_INDEX”=“a”
>>> ??
>>> >> > >>>> > >
>>> >> > >>>> > > 5. Can u keep all the columns which is present in filter on
>>> >> left
>>> >> > >>>> side so
>>> >> > >>>> > > less no of blocks will identified during block pruning and
>>> it
>>> >> will
>>> >> > >>>> > improve
>>> >> > >>>> > > the query performance.
>>> >> > >>>> > >
>>> >> > >>>> > >
>>> >> > >>>> > > -Regards
>>> >> > >>>> > > Kumar Vishal
>>> >> > >>>> > >
>>> >> > >>>> > > On Fri, Nov 11, 2016 at 11:59 AM, An Lan <
>>> [hidden email]>
>>> >> > >>>> wrote:
>>> >> > >>>> > >
>>> >> > >>>> > > > Hi Kumar Vishal,
>>> >> > >>>> > > >
>>> >> > >>>> > > > 1.       Create table ddl:
>>> >> > >>>> > > >
>>> >> > >>>> > > > CREATE TABLE IF NOT EXISTS Table1
>>> >> > >>>> > > >
>>> >> > >>>> > > > (* h Int, g Int, d String, f Int, e Int,*
>>> >> > >>>> > > >
>>> >> > >>>> > > > a Int, b Int, …(extra near 300 columns)
>>> >> > >>>> > > >
>>> >> > >>>> > > > STORED BY 'org.apache.carbondata.format'
>>> >> > >>>> > > > TBLPROPERTIES(
>>> >> > >>>> > > >
>>> >> > >>>> > > > "NO_INVERTED_INDEX”=“a”,
>>> >> > >>>> > > > "NO_INVERTED_INDEX”=“b”,
>>> >> > >>>> > > >
>>> >> > >>>> > > > …(extra near 300 columns)
>>> >> > >>>> > > >
>>> >> > >>>> > > > "DICTIONARY_INCLUDE”=“a”,
>>> >> > >>>> > > >
>>> >> > >>>> > > > "DICTIONARY_INCLUDE”=“b”,
>>> >> > >>>> > > >
>>> >> > >>>> > > > …(extra near 300 columns)
>>> >> > >>>> > > >
>>> >> > >>>> > > > )
>>> >> > >>>> > > >
>>> >> > >>>> > > > 2.       3. There more than hundreds node in the
>>> cluster, but
>>> >> > >>>> cluster
>>> >> > >>>> > is
>>> >> > >>>> > > > used mixed with other application.  Some time when node
>>> is
>>> >> > >>>> enough, we
>>> >> > >>>> > > will
>>> >> > >>>> > > > get 100 distinct node.
>>> >> > >>>> > > >
>>> >> > >>>> > > > 4.      I give a statistic of task time during once
>>> query and
>>> >> > mark
>>> >> > >>>> > > > distinct nodes below:
>>> >> > >>>> > > >
>>> >> > >>>> > > > [image: 内嵌图片 1]
>>> >> > >>>> > > >
>>> >> > >>>> > > >
>>> >> > >>>> > > >
>>> >> > >>>> > > >
>>> >> > >>>> > > > 2016-11-10 23:52 GMT+08:00 Kumar Vishal <
>>> >> > >>>> [hidden email]>:
>>> >> > >>>> > > >
>>> >> > >>>> > > >> Hi Anning Luo,
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> Can u please provide below details.
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> 1.Create table ddl.
>>> >> > >>>> > > >> 2.Number of node in you cluster setup.
>>> >> > >>>> > > >> 3. Number of executors per node.
>>> >> > >>>> > > >> 4. Query statistics.
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> Please find my comments in bold.
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> Problem:
>>> >> > >>>> > > >> 1.                          GC problem. We suffer a
>>> 20%~30%
>>> >> GC
>>> >> > >>>> time
>>> >> > >>>> > for
>>> >> > >>>> > > >> some task in first stage after a lot of parameter
>>> >> refinement.
>>> >> > We
>>> >> > >>>> now
>>> >> > >>>> > use
>>> >> > >>>> > > >> G1
>>> >> > >>>> > > >> GC in java8. GC time will double if use CMS. The main GC
>>> >> time
>>> >> > is
>>> >> > >>>> spent
>>> >> > >>>> > > on
>>> >> > >>>> > > >> young generation GC. Almost half memory of young
>>> generation
>>> >> > will
>>> >> > >>>> be
>>> >> > >>>> > copy
>>> >> > >>>> > > >> to
>>> >> > >>>> > > >> old generation. It seems lots of object has a long life
>>> >> than GC
>>> >> > >>>> period
>>> >> > >>>> > > and
>>> >> > >>>> > > >> the space is not be reuse(as concurrent GC will release
>>> it
>>> >> > >>>> later).
>>> >> > >>>> > When
>>> >> > >>>> > > we
>>> >> > >>>> > > >> use a large Eden(>=1G for example), once GC time will be
>>> >> > >>>> seconds. If
>>> >> > >>>> > set
>>> >> > >>>> > > >> Eden little(256M for example), once GC time will be
>>> hundreds
>>> >> > >>>> > > milliseconds,
>>> >> > >>>> > > >> but more frequency and total is still seconds. Is there
>>> any
>>> >> way
>>> >> > >>>> to
>>> >> > >>>> > > lessen
>>> >> > >>>> > > >> the GC time? (We don’t consider the first query and
>>> second
>>> >> > query
>>> >> > >>>> in
>>> >> > >>>> > this
>>> >> > >>>> > > >> case.)
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> *How many node are present in your cluster setup?? If
>>> nodes
>>> >> are
>>> >> > >>>> less
>>> >> > >>>> > > >> please
>>> >> > >>>> > > >> reduce the number of executors per node.*
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> 2.                          Performance refine problem.
>>> Row
>>> >> > >>>> number
>>> >> > >>>> > after
>>> >> > >>>> > > >> being filtered is not uniform. Some node maybe heavy. It
>>> >> spend
>>> >> > >>>> more
>>> >> > >>>> > time
>>> >> > >>>> > > >> than other node. The time of one task is 4s ~ 16s. Is
>>> any
>>> >> > method
>>> >> > >>>> to
>>> >> > >>>> > > refine
>>> >> > >>>> > > >> it?
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> 3.                          Too long time for first and
>>> >> second
>>> >> > >>>> query.
>>> >> > >>>> > I
>>> >> > >>>> > > >> know dictionary and some index need to be loaded for the
>>> >> first
>>> >> > >>>> time.
>>> >> > >>>> > But
>>> >> > >>>> > > >> after I trying use query below to preheat it, it still
>>> >> spend a
>>> >> > >>>> lot of
>>> >> > >>>> > > >> time.
>>> >> > >>>> > > >> How could I preheat the query correctly?
>>> >> > >>>> > > >>                         select Aarray, a, b, c… from
>>> Table1
>>> >> > where
>>> >> > >>>> > Aarray
>>> >> > >>>> > > >> is
>>> >> > >>>> > > >> not null and d = “sss” and e !=22 and f = 33 and g = 44
>>> and
>>> >> h =
>>> >> > >>>> 55
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> *Currently we are working on first time query
>>> improvement.
>>> >> For
>>> >> > >>>> now you
>>> >> > >>>> > > can
>>> >> > >>>> > > >> run select count(*) or count(column), so all the blocks
>>> get
>>> >> > >>>> loaded and
>>> >> > >>>> > > >> then
>>> >> > >>>> > > >> you can run the actual query.*
>>> >> > >>>> > > >>
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> 4.             Any other suggestion to lessen the query
>>> >> time?
>>> >> > >>>> > > >>
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> Some suggestion:
>>> >> > >>>> > > >>             The log by class QueryStatisticsRecorder
>>> give
>>> >> me a
>>> >> > >>>> good
>>> >> > >>>> > > means
>>> >> > >>>> > > >> to find the neck bottle, but not enough. There still
>>> some
>>> >> > metric
>>> >> > >>>> I
>>> >> > >>>> > think
>>> >> > >>>> > > >> is
>>> >> > >>>> > > >> very useful:
>>> >> > >>>> > > >>             1. filter ratio. i.e.. not only result_size
>>> but
>>> >> > also
>>> >> > >>>> the
>>> >> > >>>> > > >> origin
>>> >> > >>>> > > >> size so we could know how many data is filtered.
>>> >> > >>>> > > >>             2. IO time. The scan_blocks_time is not
>>> enough.
>>> >> If
>>> >> > >>>> it is
>>> >> > >>>> > > high,
>>> >> > >>>> > > >> we know somethings wrong, but not know what cause that
>>> >> problem.
>>> >> > >>>> The
>>> >> > >>>> > real
>>> >> > >>>> > > >> IO
>>> >> > >>>> > > >> time for data is not be provided. As there may be
>>> several
>>> >> file
>>> >> > >>>> for one
>>> >> > >>>> > > >> partition, know the program slow is caused by datanode
>>> or
>>> >> > >>>> executor
>>> >> > >>>> > > itself
>>> >> > >>>> > > >> give us intuition to find the problem.
>>> >> > >>>> > > >>             3. The TableBlockInfo for task. I log it by
>>> >> myself
>>> >> > >>>> when
>>> >> > >>>> > > >> debugging. It tell me how many blocklets is locality.
>>> The
>>> >> spark
>>> >> > >>>> web
>>> >> > >>>> > > >> monitor
>>> >> > >>>> > > >> just give a locality level, but may be only one
>>> blocklet is
>>> >> > >>>> locality.
>>> >> > >>>> > > >>
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> -Regards
>>> >> > >>>> > > >> Kumar Vishal
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> On Thu, Nov 10, 2016 at 8:55 PM, An Lan <
>>> [hidden email]
>>> >> >
>>> >> > >>>> wrote:
>>> >> > >>>> > > >>
>>> >> > >>>> > > >> > Hi,
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > We are using carbondata to build our table and running
>>> >> query
>>> >> > in
>>> >> > >>>> > > >> > CarbonContext. We have some performance problem during
>>> >> > >>>> refining the
>>> >> > >>>> > > >> system.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *Background*:
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *cluster*:                      100 executor,5
>>> >> task/executor,
>>> >> > >>>> 10G
>>> >> > >>>> > > >> > memory/executor
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *data*:                              60+GB(per one
>>> >> replica)
>>> >> > as
>>> >> > >>>> > carbon
>>> >> > >>>> > > >> data
>>> >> > >>>> > > >> > format, 600+MB/file * 100 file, 300+columns,
>>> 300+million
>>> >> rows
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *sql example:*
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       select A,
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       sum(a),
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       sum(b),
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       sum(c),
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       …( extra 100
>>> >> > aggregation
>>> >> > >>>> like
>>> >> > >>>> > > >> > sum(column))
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       from Table1
>>> LATERAL
>>> >> > VIEW
>>> >> > >>>> > > >> > explode(split(Aarray, ‘*;*’)) ATable AS A
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >                                       where A is not
>>> null
>>> >> and
>>> >> > >>>> d >
>>> >> > >>>> > > >> “ab:c-10”
>>> >> > >>>> > > >> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44
>>> >> GROUP
>>> >> > BY
>>> >> > >>>> A
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *target query time*:           <10s
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *current query time*:         15s ~ 25s
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *scene:*                             OLAP system. <100
>>> >> > queries
>>> >> > >>>> every
>>> >> > >>>> > > >> day.
>>> >> > >>>> > > >> > Concurrency number is not high. Most time cpu is
>>> idle, so
>>> >> > this
>>> >> > >>>> > service
>>> >> > >>>> > > >> will
>>> >> > >>>> > > >> > run with other program. The service will run for long
>>> >> time.
>>> >> > We
>>> >> > >>>> could
>>> >> > >>>> > > not
>>> >> > >>>> > > >> > occupy a very large memory for every executor.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *refine*:                              I have build
>>> index
>>> >> and
>>> >> > >>>> > > >> dictionary on
>>> >> > >>>> > > >> > d, e, f, g, h and build dictionary on all other
>>> >> aggregation
>>> >> > >>>> > > >> columns(i.e. a,
>>> >> > >>>> > > >> > b, c, …100+ columns). And make sure there is one
>>> segment
>>> >> for
>>> >> > >>>> total
>>> >> > >>>> > > >> data. I
>>> >> > >>>> > > >> > have open the speculation(quantile=0.5, interval=250,
>>> >> > >>>> > multiplier=1.2).
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > Time is mainly spent on first stage before shuffling.
>>> As
>>> >> 95%
>>> >> > >>>> data
>>> >> > >>>> > will
>>> >> > >>>> > > >> be
>>> >> > >>>> > > >> > filtered out, the shuffle process spend little time.
>>> In
>>> >> first
>>> >> > >>>> stage,
>>> >> > >>>> > > >> most
>>> >> > >>>> > > >> > task complete in less than 10s. But there still be
>>> near 50
>>> >> > >>>> tasks
>>> >> > >>>> > > longer
>>> >> > >>>> > > >> > than 10s. Max task time in one query may be 12~16s.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *Problem:*
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > 1.          GC problem. We suffer a 20%~30% GC time
>>> for
>>> >> some
>>> >> > >>>> task in
>>> >> > >>>> > > >> first
>>> >> > >>>> > > >> > stage after a lot of parameter refinement. We now use
>>> G1
>>> >> GC
>>> >> > in
>>> >> > >>>> > java8.
>>> >> > >>>> > > GC
>>> >> > >>>> > > >> > time will double if use CMS. The main GC time is
>>> spent on
>>> >> > young
>>> >> > >>>> > > >> generation
>>> >> > >>>> > > >> > GC. Almost half memory of young generation will be
>>> copy to
>>> >> > old
>>> >> > >>>> > > >> generation.
>>> >> > >>>> > > >> > It seems lots of object has a long life than GC
>>> period and
>>> >> > the
>>> >> > >>>> space
>>> >> > >>>> > > is
>>> >> > >>>> > > >> not
>>> >> > >>>> > > >> > be reuse(as concurrent GC will release it later).
>>> When we
>>> >> > use a
>>> >> > >>>> > large
>>> >> > >>>> > > >> > Eden(>=1G for example), once GC time will be seconds.
>>> If
>>> >> set
>>> >> > >>>> Eden
>>> >> > >>>> > > >> > little(256M for example), once GC time will be
>>> hundreds
>>> >> > >>>> > milliseconds,
>>> >> > >>>> > > >> but
>>> >> > >>>> > > >> > more frequency and total is still seconds. Is there
>>> any
>>> >> way
>>> >> > to
>>> >> > >>>> > lessen
>>> >> > >>>> > > >> the
>>> >> > >>>> > > >> > GC time? (We don’t consider the first query and second
>>> >> query
>>> >> > >>>> in this
>>> >> > >>>> > > >> case.)
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > 2.           Performance refine problem. Row number
>>> after
>>> >> > being
>>> >> > >>>> > > >> filtered is
>>> >> > >>>> > > >> > not uniform. Some node maybe heavy. It spend more time
>>> >> than
>>> >> > >>>> other
>>> >> > >>>> > > node.
>>> >> > >>>> > > >> The
>>> >> > >>>> > > >> > time of one task is 4s ~ 16s. Is any method to refine
>>> it?
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > 3.           Too long time for first and second
>>> query. I
>>> >> know
>>> >> > >>>> > > dictionary
>>> >> > >>>> > > >> > and some index need to be loaded for the first time.
>>> But
>>> >> > after
>>> >> > >>>> I
>>> >> > >>>> > > trying
>>> >> > >>>> > > >> use
>>> >> > >>>> > > >> > query below to preheat it, it still spend a lot of
>>> time.
>>> >> How
>>> >> > >>>> could I
>>> >> > >>>> > > >> > preheat the query correctly?
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >               select Aarray, a, b, c… from Table1
>>> where
>>> >> > Aarray
>>> >> > >>>> is
>>> >> > >>>> > not
>>> >> > >>>> > > >> null
>>> >> > >>>> > > >> > and d = “sss” and e !=22 and f = 33 and g = 44 and h
>>> = 55
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > 4.           Any other suggestion to lessen the query
>>> >> time?
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > Some suggestion:
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >             The log by class QueryStatisticsRecorder
>>> give
>>> >> me
>>> >> > a
>>> >> > >>>> good
>>> >> > >>>> > > >> means
>>> >> > >>>> > > >> > to find the neck bottle, but not enough. There still
>>> some
>>> >> > >>>> metric I
>>> >> > >>>> > > >> think is
>>> >> > >>>> > > >> > very useful:
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >             1. filter ratio. i.e.. not only
>>> result_size
>>> >> but
>>> >> > >>>> also the
>>> >> > >>>> > > >> origin
>>> >> > >>>> > > >> > size so we could know how many data is filtered.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >             2. IO time. The scan_blocks_time is not
>>> >> enough.
>>> >> > If
>>> >> > >>>> it is
>>> >> > >>>> > > >> high,
>>> >> > >>>> > > >> > we know somethings wrong, but not know what cause that
>>> >> > >>>> problem. The
>>> >> > >>>> > > >> real IO
>>> >> > >>>> > > >> > time for data is not be provided. As there may be
>>> several
>>> >> > file
>>> >> > >>>> for
>>> >> > >>>> > one
>>> >> > >>>> > > >> > partition, know the program slow is caused by
>>> datanode or
>>> >> > >>>> executor
>>> >> > >>>> > > >> itself
>>> >> > >>>> > > >> > give us intuition to find the problem.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >             3. The TableBlockInfo for task. I log it
>>> by
>>> >> > myself
>>> >> > >>>> when
>>> >> > >>>> > > >> > debugging. It tell me how many blocklets is locality.
>>> The
>>> >> > >>>> spark web
>>> >> > >>>> > > >> monitor
>>> >> > >>>> > > >> > just give a locality level, but may be only one
>>> blocklet
>>> >> is
>>> >> > >>>> > locality.
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > ---------------------
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > Anning Luo
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > *HULU*
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> > Email: [hidden email]
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >> >             [hidden email]
>>> >> > >>>> > > >> >
>>> >> > >>>> > > >>
>>> >> > >>>> > > >
>>> >> > >>>> > > >
>>> >> > >>>> > >
>>> >> > >>>> >
>>> >> > >>>>
>>> >> > >>>
>>> >> > >>>
>>> >> > >>
>>> >> > >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>  driverlog3.txt
> <https://drive.google.com/file/d/0B1XM6KeI1nB7QmZXd2d3ZlR2bnM/view?usp=drive_web>
>  driverlog2.txt
> <https://drive.google.com/file/d/0B1XM6KeI1nB7UF92RWVOWUdwZGM/view?usp=drive_web>
>  driverlog1.txt
> <https://drive.google.com/file/d/0B1XM6KeI1nB7WjhuNjZHZWZJZGM/view?usp=drive_web>
>
Reply | Threaded
Open this post in threaded view
|

Re: GC problem and performance refine problem

kumarvishal09
Hi An Lan,
                Please find my comments in blue.

1. Another question: specifying DICTIONARY_INCLUDE across multiple entries in
TBLPROPERTIES, like “"DICTIONARY_INCLUDE”=“a”, "DICTIONARY_INCLUDE”=“b”” in the
old create table SQL, does not work correctly. Only the column in the last
DICTIONARY_INCLUDE declaration is added to the dimensions. The new statement
works correctly, but the old way did not throw any exception.

This is the correct behaviour: table properties are stored in a map, so only
the last entry is kept. You need to list all the columns in a single property,
e.g. DICTIONARY_INCLUDE='a,b,c,d'.
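
For illustration, a minimal sketch of the single-property form (the table and
column names are just the placeholders used earlier in this thread, trimmed
down from the ~300 real columns):

    -- all dictionary columns declared in one DICTIONARY_INCLUDE entry
    CREATE TABLE IF NOT EXISTS Table1 (
      h Int, g Int, f Int, e Int, d String,
      a Int, b Int
    )
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('DICTIONARY_INCLUDE'='h,g,f,e,d');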

2. *NEW CREATE TABLE:*

CREATE TABLE IF NOT EXISTS Table1

(*h Int, g Int, f Int, e Int, d String,*

a Int, b Int, …(extra near 300 columns)

STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(

DICTIONARY_INCLUDE"="h, g, f , e , d")

What is the cardinality of the h, g, f, e, d columns? As you have mentioned:
*the h, g, f, e, d are the columns used in the filter, and their distinct
value counts increase.*


3. The time cost of the third experiment is longer than the others. It is
because I use “h=66 and g = 67 and f = 12 and e != 5 and d > ’aaaa’ and
d < ‘bbbb’” as the filter condition, so considering the MDK build order the
result rows are contiguous in the file. Some blocklets match the filter
condition completely while others match zero rows, but the scheduler on the
driver did not filter out the zero-row blocklets. It seems the min/max filter
does not work correctly on the driver side when getSplits() is called on the
InputFormat. Here are some cases on the executor (blocklet size is 120k rows):

On the driver side, Carbon does block-level pruning based on the min/max of
each block, and block pruning is only supported for key columns (in the future
we will support measure columns as well). Blocklet-level pruning is done on
the executor side.

5. Based on the log, in your case it is going for blocklet-level distribution.
We have exposed a property, *"enable.blocklet.distribution"*, to enable and
disable blocklet distribution in the latest master. You can take the latest
patch from master and run your queries with
*enable.blocklet.distribution=false* in *carbon.properties*.
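
As a sketch, the setting is a single line in carbon.properties (where that
file lives depends on how your deployment distributes it to the driver and
executors):

    # disable blocklet-level distribution; scheduling falls back to block level
    enable.blocklet.distribution=false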

4. Further, I think from the second experiment that balancing the data is
important, so I will change the blocklet size to 2k rows if the min/max filter
can work on the driver side. I have not changed the int type to double type
for the measures; I will do that later.
Instead of changing the blocklet size, you can change the carbon block size
to a smaller value like 256 MB or 512 MB.
If you are using the 0.1 release, set the property carbon.max.file.size=256
in carbon.properties.
With the latest master you can instead add table_blocksize=256 to the
TBLPROPERTIES while creating the carbon table.
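
A sketch of both variants, using the 256 value from this mail as an example
(the property and TBLPROPERTIES key names are as given above; the column list
is a trimmed placeholder, and the value should be tuned to your data):

    # 0.1 release: global setting in carbon.properties
    carbon.max.file.size=256

    -- latest master: per-table setting at create time
    CREATE TABLE IF NOT EXISTS Table1 (h Int, g Int, d String)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('table_blocksize'='256');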


-Regards
Kumar Vishal

On Thu, Nov 17, 2016 at 3:55 PM, An Lan <[hidden email]> wrote:

> Hi Kumar Vishal,
>
> The attachments for the last email are at the bottom, pasted as URLs. The
> logs are too big to send, so I put them on Google Drive and pasted the links.
>
> 2016-11-17 16:59 GMT+08:00 An Lan <[hidden email]>:
>
> > Hi Kumar Vishal,
> >
> >
> >
> > I redid some experiments with detailed driver logging. The logs are
> > attached, and each log covers the same query run 10 times.
> >
> >
> >
> > 1.       “driverlog1” is under the same conditions as my previous runs.
> >
> > 2.       “driverlog2”: in this experiment, before loading the data, I
> > shuffled the csv files so that rows with the same attributes are
> > distributed uniformly across all csv files, i.e. if we apply the query
> > filter to the original csv files, the result count is almost the same for
> > every file. Every csv file is one block, so every partition of the
> > carbondata loader has uniform data. That method works well:
> >
> > Time cost of first experiment:  103543, 49363, 125430, 22138, 21626,
> > 27905, 22086, 21332, 23501, 16715
> >
> > Time cost of second experiment: 62466, 38709, 24700, 18767, 18325, 15887,
> > 16014, 15452, 17440, 15563
> >
> > 3.       “driverlog3”: First, use the shuffled csv data as in the second
> > experiment. Second, modify the create table sql:
> >
> >
> >
> > *OLD CREATE TABLE:*
> >
> > CREATE TABLE IF NOT EXISTS Table1
> >
> > (* h Int, g Int, d String, f Int, e Int,*
> >
> > a Int, b Int, …(extra near 300 columns)
> >
> > STORED BY 'org.apache.carbondata.format'TBLPROPERTIES(
> >
> > "NO_INVERTED_INDEX”=“a”,"NO_INVERTED_INDEX”=“b”,
> >
> > …(extra near 300 columns)
> >
> > "DICTIONARY_INCLUDE”=“a”,
> >
> > "DICTIONARY_INCLUDE”=“b”,
> >
> > …(extra near 300 columns)
> >
> > )
> >
> >
> >
> > *NEW CREATE TABLE:*
> >
> > CREATE TABLE IF NOT EXISTS Table1
> >
> > (*h Int, g Int, f Int, e Int, d String,*
> >
> > a Int, b Int, …(extra near 300 columns)
> >
> > STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(
> >
> > 'DICTIONARY_INCLUDE'='h,g,f,e,d'
> >
> > )
> >
> > The h, g, f, e, d are the columns used in the filter, and their distinct
> > value counts increase.
> >
> > Time cost of third experiment: 57088, 37105, 41588, 43540, 38143, 35581,
> > 34040, 35886, 34610, 32589
> >
> > The time cost of the third experiment is longer than the others. It is
> > because I use “h=66 and g = 67 and f = 12 and e != 5 and d > ’aaaa’ and
> > d < ‘bbbb’” as the filter condition, so considering the MDK build order
> > the result rows are contiguous in the file. Some blocklets match the
> > filter condition completely while others match zero rows, but the
> > scheduler on the driver did not filter out the zero-row blocklets. It
> > seems the min/max filter does not work correctly on the driver side when
> > getSplits() is called on the InputFormat. Here are some cases on the
> > executor (blocklet size is 120k rows):
> >
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > |            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > | 603288369228186_62|            426 |                  2 |             28 |             3 |         0 |               470 |
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >
> > 16/11/16 21:36:25 INFO impl.FilterScanner: pool-30-thread-1 [INDEX] index
> > filter : 113609/120000
> >
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > |            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> > |603288369228186_428|            380 |                  4 |            661 |             3 |    113609 |             13634 |
> > +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >
> > In the first case result_size is zero, but the blocklet is still assigned
> > to a task. That makes some tasks do nothing while others carry a heavy
> > load.
> >
> > In the second case, only one blocklet has data (out of the 4 or 6 blocklets
> > per task). I added a log for the inverted index filter ratio, in the form
> > “[INDEX] index filter : <result count>/<total block count>”.
> >
> > So, how could I make the min/max filter work correctly on the driver side?
> >
> >
> >
> > Another question: specifying DICTIONARY_INCLUDE across multiple entries in
> > TBLPROPERTIES, like “"DICTIONARY_INCLUDE”=“a”, "DICTIONARY_INCLUDE”=“b””
> > in the old create table sql, does not work correctly. Only the column in
> > the last DICTIONARY_INCLUDE declaration is added to the dimensions. The
> > new one works correctly, but the old way did not throw any exception.
> >
> >
> >
> > Further, I think from the second experiment that balancing the data is
> > important, so I will change the blocklet size to 2k rows if the min/max
> > filter can work on the driver side. I have not changed the int type to
> > double type for the measures; I will do that later.
> >
> > 2016-11-17 16:34 GMT+08:00 An Lan <[hidden email]>:
> >
> >> Hi Kumar Vishal,
> >>
> >>
> >>
> >> I redid some experiments with detailed driver logging. The logs are
> >> attached, and each log covers the same query run 10 times.
> >>
> >>
> >>
> >> 1.       “driverlog1” is under the same conditions as my previous runs.
> >>
> >> 2.       “driverlog2”: in this experiment, before loading the data, I
> >> shuffled the csv files so that rows with the same attributes are
> >> distributed uniformly across all csv files, i.e. if we apply the query
> >> filter to the original csv files, the result count is almost the same
> >> for every file. Every csv file is one block, so every partition of the
> >> carbondata loader has uniform data. That method works well:
> >>
> >> Time cost of first experiment:  103543, 49363, 125430, 22138, 21626,
> >> 27905, 22086, 21332, 23501, 16715
> >>
> >> Time cost of second experiment: 62466, 38709, 24700, 18767, 18325, 15887,
> >> 16014, 15452, 17440, 15563
> >>
> >> 3.       “driverlog3”: First, use the shuffled csv data as in the second
> >> experiment. Second, modify the create table sql:
> >>
> >>
> >>
> >> *OLD CREATE TABLE:*
> >>
> >> CREATE TABLE IF NOT EXISTS Table1
> >>
> >> (* h Int, g Int, d String, f Int, e Int,*
> >>
> >> a Int, b Int, …(extra near 300 columns)
> >>
> >> STORED BY 'org.apache.carbondata.format'TBLPROPERTIES(
> >>
> >> "NO_INVERTED_INDEX”=“a”,"NO_INVERTED_INDEX”=“b”,
> >>
> >> …(extra near 300 columns)
> >>
> >> "DICTIONARY_INCLUDE”=“a”,
> >>
> >> "DICTIONARY_INCLUDE”=“b”,
> >>
> >> …(extra near 300 columns)
> >>
> >> )
> >>
> >>
> >>
> >> *NEW CREATE TABLE:*
> >>
> >> CREATE TABLE IF NOT EXISTS Table1
> >>
> >> (*h Int, g Int, f Int, e Int, d String,*
> >>
> >> a Int, b Int, …(extra near 300 columns)
> >>
> >> STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(
> >>
> >> 'DICTIONARY_INCLUDE'='h,g,f,e,d'
> >>
> >> )
> >>
> >> The h, g, f, e, d are the columns used in the filter, and their distinct
> >> value counts increase.
> >>
> >> Time cost of third experiment: 57088, 37105, 41588, 43540, 38143, 35581,
> >> 34040, 35886, 34610, 32589
> >>
> >> The time cost of the third experiment is longer than the others. It is
> >> because I use “h=66 and g = 67 and f = 12 and e != 5 and d > ’aaaa’ and
> >> d < ‘bbbb’” as the filter condition, so considering the MDK build order
> >> the result rows are contiguous in the file. Some blocklets match the
> >> filter condition completely while others match zero rows, but the
> >> scheduler on the driver did not filter out the zero-row blocklets. It
> >> seems the min/max filter does not work correctly on the driver side when
> >> getSplits() is called on the InputFormat. Here are some cases on the
> >> executor (blocklet size is 120k rows):
> >>
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >> |            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >> | 603288369228186_62|            426 |                  2 |             28 |             3 |         0 |               470 |
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>
> >> 16/11/16 21:36:25 INFO impl.FilterScanner: pool-30-thread-1 [INDEX] index
> >> filter : 113609/120000
> >>
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >> |            task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >> |603288369228186_428|            380 |                  4 |            661 |             3 |    113609 |             13634 |
> >> +-------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>
> >> In the first case result_size is zero, but the blocklet is still assigned
> >> to a task. That makes some tasks do nothing while others carry a heavy
> >> load.
> >>
> >> In the second case, only one blocklet has data (out of the 4 or 6
> >> blocklets per task). I added a log for the inverted index filter ratio,
> >> in the form “[INDEX] index filter : <result count>/<total block count>”.
> >>
> >> So, how could I make the min/max filter work correctly on the driver side?
> >>
> >>
> >>
> >> Another question: specifying DICTIONARY_INCLUDE across multiple entries
> >> in TBLPROPERTIES, like “"DICTIONARY_INCLUDE”=“a”, "DICTIONARY_INCLUDE”=“b””
> >> in the old create table sql, does not work correctly. Only the column in
> >> the last DICTIONARY_INCLUDE declaration is added to the dimensions. The
> >> new one works correctly, but the old way did not throw any exception.
> >>
> >>
> >>
> >> Further, I think from the second experiment that balancing the data is
> >> important, so I will change the blocklet size to 2k rows if the min/max
> >> filter can work on the driver side. I have not changed the int type to
> >> double type for the measures; I will do that later.
> >>
> >> 2016-11-16 21:55 GMT+08:00 Kumar Vishal <[hidden email]>:
> >>
> >>> Hi An Lan,
> >>> Please go through discussion *how to create carbon table for better
> query
> >>> performance* shared by Bill. I am coping the same in this mail for you
> >>> reference.
> >>>
> >>>
> >>> Discussion: how to create the CarbonData table with good performance
> >>> Suggestions for creating a Carbon table
> >>>
> >>> Recently we used CarbonData for performance testing in the
> >>> telecommunication field and summarized some suggestions for creating the
> >>> CarbonData table. We have tables which range from 10 thousand rows to 10
> >>> billion rows and have from 100 to 300 columns. Following are some of the
> >>> columns used in the table:
> >>>
> >>> Column name     Data type       Cardinality     Attribution
> >>> msisdn          String          30 million      dimension
> >>> BEGIN_TIME      bigint          10 thousand     dimension
> >>> HOST            String          1 million       dimension
> >>> Dime_1          String          1 thousand      dimension
> >>> counter_1       numeric(20,0)   NA              measure
> >>> ...             ...             NA              ...
> >>> counter_100     numeric(20,0)   NA              measure
> >>>
> >>> We have more than 50 test cases; based on them we summarize the
> >>> following suggestions for creating a table with good query performance.
> >>>
> >>> 1. Put the frequently used filter column at the beginning. For example,
> >>> if MSISDN is the filter used in most queries, put MSISDN in the first
> >>> column. Queries that filter on MSISDN will then perform well (but
> >>> because MSISDN has high cardinality, creating the table like this
> >>> decreases the compression ratio):
> >>> create table carbondata_table(msisdn String, ...) STORED BY
> >>> 'org.apache.carbondata.format' TBLPROPERTIES
> >>> ('DICTIONARY_EXCLUDE'='MSISDN,..', 'DICTIONARY_INCLUDE'='...');
> >>>
> >>> 2. If multiple columns are frequently used in filters, put them at the
> >>> front, ordered from low cardinality to high cardinality. For example, if
> >>> msisdn, host and dime_1 are frequently used columns, the column order
> >>> can be dime_1 -> host -> msisdn, because dime_1 has the lowest
> >>> cardinality. This increases the compression ratio and gives good filter
> >>> performance on dime_1, host and msisdn:
> >>> create table carbondata_table(Dime_1 String, HOST String, MSISDN String,
> >>> ...) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES
> >>> ('DICTIONARY_EXCLUDE'='MSISDN,HOST..', 'DICTIONARY_INCLUDE'='Dime_1..');
> >>>
> >>> 3. If no column is frequently used in filters, order all dimension
> >>> columns from low cardinality to high cardinality:
> >>> create table carbondata_table(Dime_1 String, BEGIN_TIME bigint, HOST
> >>> String, MSISDN String, ...) STORED BY 'org.apache.carbondata.format'
> >>> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..',
> >>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..');
> >>>
> >>> 4. For measures that do not need high accuracy there is no need to use
> >>> the numeric(20,0) data type; using double instead will increase query
> >>> performance. In one test case, replacing numeric(20,0) with double
> >>> improved the query 5 times, from 15 seconds to 3 seconds:
> >>> create table carbondata_table(Dime_1 String, BEGIN_TIME bigint, HOST
> >>> String, MSISDN String, counter_1 double, counter_2 double, ...,
> >>> counter_100 double) STORED BY 'org.apache.carbondata.format'
> >>> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
> >>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
> >>>
> >>> 5. If a column is always incremental, like start_time (for example, we
> >>> load data into carbon every day and start_time is incremental for each
> >>> load), put that column at the back of the dimensions, because an
> >>> always-incremental value can use the min/max index well:
> >>> create table carbondata_table(Dime_1 String, HOST String, MSISDN String,
> >>> counter_1 double, counter_2 double, BEGIN_TIME bigint, ..., counter_100
> >>> double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES
> >>> ('DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
> >>> 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
> >>>
> >>> One more point, on whether a dimension needs a dictionary or not: if the
> >>> cardinality is higher than 50 thousand, we suggest not making it a
> >>> dictionary column, because a high-cardinality dictionary column will
> >>> impact the load performance.
> >>>
> >>> -Regards
> >>> Kumar Vishal
> >>>
> >>> On Wed, Nov 16, 2016 at 7:19 PM, Kumar Vishal <
> [hidden email]
> >>> >
> >>> wrote:
> >>>
> >>> > Hi An Lan,
> >>> > For detail logging you need to add log4j configuration in
> >>> > spark-default.conf for both driver and executor.
> >>> > * spark.driver.extraJavaOption = -Dlog4j.configuration=file:/<f
> >>> ilepath>*
> >>> > * spark.executor.extraJavaOption = -Dlog4j.configuration=file:/<f
> >>> ilepath>*
> >>> >
> >>> > Please make sure in* log4j.properties* *log4j.rootCategory* is *INFO*
> >>> >
> >>> > -Regards
> >>> > Kumar Vishal
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Nov 16, 2016 at 2:06 PM, An Lan <[hidden email]> wrote:
> >>> >
> >>> >> Hi Kumar Vishal,
> >>> >>
> >>> >> Thanks for your suggestion.
> >>> >> The driver log does not contain the block distribution log by
> >>> >> default. How could I enable it?
> >>> >> And how is the order of the dimensions decided?
> >>> >>
> >>> >> 2016-11-16 15:14 GMT+08:00 Kumar Vishal <[hidden email]
> >:
> >>> >>
> >>> >> > Hi An Lan,
> >>> >> >                 Data is already distributed, in this case may be
> one
> >>> >> > blocklet is returning more number of rows and other returning less
> >>> >> because
> >>> >> > of this some task will take more time.
> >>> >> >
> >>> >> > In the driver log the block distribution log is not present, so it
> >>> >> > is not clear whether it is going for block distribution or blocklet
> >>> >> > distribution. Can you please add the detailed driver log.
> >>> >> >
> >>> >> > *Some suggestion:*
> >>> >> >
> >>> >> > 1. If column is String and you are not applying filter then in
> >>> create
> >>> >> > statement add it in dictionary exclude this will avoid lookup in
> >>> >> > dictionary. If number of String column are less then better add it
> >>> in
> >>> >> > dictionary exclude.
> >>> >> >
> >>> >> > 2. If column is a numeric column and you are not applying filter
> >>> then no
> >>> >> > need to add in dictionary include or exclude, so it will be a
> >>> measure
> >>> >> > column.
> >>> >> >
> >>> >> > 3. Currently in carbon for measure column if you will create as a
> >>> double
> >>> >> > data type it will give more compression as currently value
> >>> compression
> >>> >> mode
> >>> >> > is support for double data type.
> >>> >> >
> >>> >> > -Regards
> >>> >> > Kumar Vishal
> >>> >> >
> >>> >> >
> >>> >> > On Wed, Nov 16, 2016 at 8:12 AM, An Lan <[hidden email]>
> wrote:
> >>> >> >
> >>> >> > > Hi Kumar Vishal,
> >>> >> > >
> >>> >> > > 1. I found the quantity of rows filtered out by invert index is
> >>> not
> >>> >> > > uniform between different tasks and the difference is large.
> Some
> >>> task
> >>> >> > may
> >>> >> > > be 3~4k row after filtered, but the longer tasks may be 3~4w.
> When
> >>> >> most
> >>> >> > > longer task on same node, time cost will be more longer than
> >>> others.
> >>> >> So,
> >>> >> > is
> >>> >> > > there any way to balance the data and make rows distribute
> uniform
> >>> >> > between
> >>> >> > > task?
> >>> >> > > Now, the blocklet size is still 120k rows. 3k+ blocket in total.
> >>> Every
> >>> >> > > task has 6 blocket.
> >>> >> > > 2. I upload some logs last time. Is there some access problem
> >>> about
> >>> >> it?
> >>> >> > Is
> >>> >> > > there any suggest or question about it?
> >>> >> > >
> >>> >> > > 2016-11-14 11:29 GMT+08:00 An Lan <[hidden email]>:
> >>> >> > >
> >>> >> > >> Hi Kumar Vishal,
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> Driver and some executor logs are in the accessory. The same
> >>> query
> >>> >> run
> >>> >> > >> for five times.
> >>> >> > >>
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> Time consume for every query:
> >>> >> > >>
> >>> >> > >> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> The first stage for every query:
> >>> >> > >>
> >>> >> > >> (first query -> start from 0 stage
> >>> >> > >>
> >>> >> > >> second query -> 4
> >>> >> > >>
> >>> >> > >> third query -> 8
> >>> >> > >>
> >>> >> > >> fourth query -> 12
> >>> >> > >>
> >>> >> > >> five query -> 16)
> >>> >> > >>
> >>> >> > >> see "first stage of queries.png" in accessory
> >>> >> > >>
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> Longest task for every query:
> >>> >> > >>
> >>> >> > >> (Task and log relationship:
> >>> >> > >>
> >>> >> > >> stage 0, task 249 -> 1.log
> >>> >> > >>
> >>> >> > >> stage 4, task 420 -> 2.log
> >>> >> > >>
> >>> >> > >> stage 8, task 195 -> 3.log
> >>> >> > >>
> >>> >> > >> stage 12, task 350 -> 4.log
> >>> >> > >>
> >>> >> > >> stage 16, task 321 -> 5.log)
> >>> >> > >> see "longest task.png" in accessory
> >>> >> > >>
> >>> >> > >> ​
> >>> >> > >>  longest tasks.png
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7dGY4cF9jRDk1QTg/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  first stage of queries.png
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7SVZsNnR2VEw2X0k/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  dirverlog
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7VllnVkNnenhyTTA/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  1.log
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7eUdUWmlia1J4bk0/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  2.log
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ZUlGZHZVQ3phdmM/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  3.log
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7UHpBQzREX3N5aEk/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  4.log
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7NVctMmYwNldCVEk/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​​
> >>> >> > >>  5.log
> >>> >> > >> <https://drive.google.com/file/d/0B1XM6KeI1nB7ODhHNE5sSGNfOVE/
> >>> >> > view?usp=drive_web>
> >>> >> > >> ​
> >>> >> > >>
> >>> >> > >> 2016-11-14 11:22 GMT+08:00 An Lan <[hidden email]>:
> >>> >> > >>
> >>> >> > >>> Hi,
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> Driver and some executor logs are in the accessory. The same
> >>> query
> >>> >> run
> >>> >> > >>> for five times.
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> Time consume for every query:
> >>> >> > >>>
> >>> >> > >>> 67068ms, 45758ms, 26497ms, 22619ms, 21504ms
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> The first stage for every query:
> >>> >> > >>>
> >>> >> > >>> (first query -> start from 0 stage
> >>> >> > >>>
> >>> >> > >>> ​second query -> 4
> >>> >> > >>>
> >>> >> > >>> third query -> 8
> >>> >> > >>>
> >>> >> > >>> fourth query -> 12
> >>> >> > >>>
> >>> >> > >>> five query -> 16)
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> Longest task for every query:
> >>> >> > >>>
> >>> >> > >>> (Task and log relationship:
> >>> >> > >>>
> >>> >> > >>> stage 0, task 249 -> 1.log
> >>> >> > >>>
> >>> >> > >>> stage 4, task 420 -> 2.log
> >>> >> > >>>
> >>> >> > >>> stage 8, task 195 -> 3.log
> >>> >> > >>>
> >>> >> > >>> stage 12, task 350 -> 4.log
> >>> >> > >>>
> >>> >> > >>> stage 16, task 321 -> 5.log)
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> 2016-11-11 20:25 GMT+08:00 Kumar Vishal <
> >>> [hidden email]
> >>> >> >:
> >>> >> > >>>
> >>> >> > >>>> What is the time difference between first time and second
> time
> >>> >> query
> >>> >> > as
> >>> >> > >>>> second time it will read from os cache,so i think there wont
> be
> >>> >> any IO
> >>> >> > >>>> bottleneck.
> >>> >> > >>>>
> >>> >> > >>>> Can u provide driver log and executor log for task which are
> >>> taking
> >>> >> > more
> >>> >> > >>>> time.
> >>> >> > >>>>
> >>> >> > >>>>
> >>> >> > >>>> -Regards
> >>> >> > >>>> Kumar Vishal
> >>> >> > >>>>
> >>> >> > >>>> On Fri, Nov 11, 2016 at 3:21 PM, An Lan <[hidden email]>
> >>> >> wrote:
> >>> >> > >>>>
> >>> >> > >>>> > 1. I have set --num-executors=100 in all test. The image
> >>> write
> >>> >> 100
> >>> >> > >>>> nodes
> >>> >> > >>>> > means one executor for one node, and when there is 64 node,
> >>> there
> >>> >> > >>>> maybe two
> >>> >> > >>>> > or three executor on one node. The total executor number is
> >>> >> always
> >>> >> > >>>> 100.
> >>> >> > >>>> >
> >>> >> > >>>> > 2. I do not find the property enable.blocklet.distribution
> in
> >>> >> > 0.1.1. I
> >>> >> > >>>> > found it in 0.2.0. I am testing on 0.1.1, and will try it
> >>> later.
> >>> >> If
> >>> >> > >>>> you
> >>> >> > >>>> > concern locality level, it seems that the most long query
> is
> >>> not
> >>> >> > >>>> caused by
> >>> >> > >>>> > IO problem. Statistic of the long task statistic like this:
> >>> >> > >>>> >
> >>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>> >> > >>>> > |             task_id|load_blocks_time|load_dictionary_time|scan_blocks_time|scan_blocks_num|result_size|total_executor_time|
> >>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>> >> > >>>> > |4199754456147478_134|              1 |                 32 |           3306 |             3 |     42104 |             13029 |
> >>> >> > >>>> > +--------------------+----------------+--------------------+----------------+---------------+-----------+-------------------+
> >>> >> > >>>> >
I have suffered from IO before, but after configuring speculation, most tasks with a long IO time are speculatively re-launched.
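(A minimal sketch of the speculation settings described above, assuming they are placed in spark-defaults.conf; the keys are standard Spark properties, and the values are the ones quoted in this thread:)

    # spark-defaults.conf -- speculatively re-launch tasks that run much longer than their peers
    spark.speculation=true
    spark.speculation.quantile=0.5     # start checking once 50% of the tasks in a stage have finished
    spark.speculation.interval=250ms   # how often to check for slow tasks (250 in the thread; ms assumed)
    spark.speculation.multiplier=1.2   # a task is slow if it runs longer than 1.2x the median task time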

3. Distinct values:

g: 11
h: 3
f: 7
e: 3
d: 281

4. In the query, only e, f, g, h, d and A are used in filters; the others are not. So I think the columns used only in aggregations do not need to be added to the index.

5. I have written e, f, g, h and d at the leftmost positions, right after "create table ...".

2016-11-11 15:08 GMT+08:00 Kumar Vishal <[hidden email]>:

Hi An Lan,

Please confirm the things below.

1. Is dynamic executor allocation enabled? If it is, can you disable it and try again (this is to check whether dynamic allocation has any impact)? To disable it, set --num-executors=100 in spark-default.conf.

2. Can you please set the property enable.blocklet.distribution=false and execute the query? (A sketch of these two settings follows this mail.)

3. Cardinality of each column.

4. Any reason why you are setting "NO_INVERTED_INDEX"="a"?

5. Can you keep all the columns that are present in the filter on the left side? Fewer blocks will then be identified during block pruning, which will improve query performance.

-Regards
Kumar Vishal

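(A minimal sketch of the two settings above, assuming the Spark side goes into spark-defaults.conf and the CarbonData side into carbon.properties; spark.dynamicAllocation.enabled and spark.executor.instances are the configuration-file equivalents of --num-executors, and the file placement of enable.blocklet.distribution is an assumption:)

    # spark-defaults.conf -- fixed executor count, no dynamic allocation
    spark.dynamicAllocation.enabled=false
    spark.executor.instances=100

    # carbon.properties -- disable blocklet distribution while testing (property available from 0.2.0 per this thread)
    enable.blocklet.distribution=false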
On Fri, Nov 11, 2016 at 11:59 AM, An Lan <[hidden email]> wrote:

Hi Kumar Vishal,

1. Create table DDL:

CREATE TABLE IF NOT EXISTS Table1
(* h Int, g Int, d String, f Int, e Int,*
a Int, b Int, …(extra near 300 columns)
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES(
"NO_INVERTED_INDEX"="a",
"NO_INVERTED_INDEX"="b",
…(extra near 300 columns)
"DICTIONARY_INCLUDE"="a",
"DICTIONARY_INCLUDE"="b",
…(extra near 300 columns)
)
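(For comparison, a minimal sketch of the same properties written with each TBLPROPERTIES key given once as a comma-separated column list, the form CarbonData's DDL generally expects; the filter columns stay leftmost, the concrete column lists are only illustrative, and the remaining ~300 columns are elided:)

    CREATE TABLE IF NOT EXISTS Table1 (
      h Int, g Int, d String, f Int, e Int,
      a Int, b Int
      -- ...(extra near 300 columns)
    )
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES (
      'DICTIONARY_INCLUDE'='d,e,f,g,h,a,b',  -- filter columns plus the aggregation columns
      'NO_INVERTED_INDEX'='a,b'              -- aggregation-only columns
    )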

2 & 3. There are more than a hundred nodes in the cluster, but the cluster is shared with other applications. When enough nodes are available, we get 100 distinct nodes.

4. Below is a statistic of task times during one query, with the distinct nodes marked:

[image: inline image 1]

2016-11-10 23:52 GMT+08:00 Kumar Vishal <[hidden email]>:

Hi Anning Luo,

Can you please provide the details below.

1. Create table DDL.
2. Number of nodes in your cluster setup.
3. Number of executors per node.
4. Query statistics.

Please find my comments in bold.

> 1. GC problem. We suffer a 20%~30% GC time for some task in first stage [...] Is there any way to lessen the GC time?

*How many nodes are present in your cluster setup? If the number of nodes is small, please reduce the number of executors per node.*

> 3. Too long time for first and second query. [...] How could I preheat the query correctly?

*Currently we are working on improving the first-time query. For now you can run select count(*) or count(column), so that all the blocks get loaded, and then you can run the actual query.*
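(A minimal sketch of such a warm-up, using the Table1 name and a filter column from this thread; either form below follows the count(*) / count(column) suggestion above:)

    -- warm-up: touch the table so its blocks get loaded before the real query
    SELECT count(*) FROM Table1;
    SELECT count(d) FROM Table1;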
-Regards
Kumar Vishal

driverlog3.txt <https://drive.google.com/file/d/0B1XM6KeI1nB7QmZXd2d3ZlR2bnM/view?usp=drive_web>
driverlog2.txt <https://drive.google.com/file/d/0B1XM6KeI1nB7UF92RWVOWUdwZGM/view?usp=drive_web>
driverlog1.txt <https://drive.google.com/file/d/0B1XM6KeI1nB7WjhuNjZHZWZJZGM/view?usp=drive_web>

kumar vishal