Hi,
We are using CarbonData to build our table and run queries through CarbonContext. We have hit some performance problems while tuning the system.

Background:
    cluster: 100 executors, 5 tasks/executor, 10 GB memory per executor
    data: 60+ GB (one replica) in CarbonData format, 600+ MB/file * 100 files, 300+ columns, 300+ million rows
    SQL example:
        select A,
               sum(a), sum(b), sum(c), ... (about 100 more aggregations like sum(column))
        from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
        where A is not null and d > "ab:c-10" and d < "h:0f3s"
              and e != 10 and f = 22 and g = 33 and h = 44
        GROUP BY A
    target query time: < 10s
    current query time: 15s ~ 25s
    scenario: an OLAP system with fewer than 100 queries per day. Concurrency is low and the CPU is mostly idle, so this service shares machines with other programs. The service runs for a long time, so we cannot reserve very large memory for every executor.
    tuning so far: we have built an index and dictionary on d, e, f, g and h, and dictionaries on all the other aggregation columns (i.e. a, b, c, ... 100+ columns), and made sure all data sits in a single segment. Speculation is enabled (quantile=0.5, interval=250, multiplier=1.2).

Time is mainly spent in the first stage, before shuffling. Since 95% of the data is filtered out, the shuffle itself takes little time. In the first stage most tasks complete in under 10s, but there are still close to 50 tasks that take longer, and the slowest task of a query may run 12~16s.

Problems:
1. GC. We see 20%~30% GC time for some first-stage tasks even after a lot of parameter tuning. We use G1 on Java 8; GC time doubles with CMS. Most of the GC time is young-generation GC, and almost half of the young generation is copied to the old generation each cycle. It seems many objects live longer than one GC period and their space is not reused (the concurrent GC releases it later). With a large Eden (>= 1 GB, for example), a single GC takes seconds; with a small Eden (256 MB, for example), each GC takes hundreds of milliseconds but runs more often, and the total is still seconds. Is there any way to reduce the GC time? (The first and second query are excluded from this case.)
2. Performance tuning. The number of rows left after filtering is not uniform, so some nodes are heavier and spend more time than others; task times range from 4s to 16s. Is there a way to even this out?
3. The first and second query take too long. I know dictionaries and some indexes need to be loaded the first time, but even after trying to warm up with the query below, the first queries still take a long time. How can I warm the caches up correctly?
    select Aarray, a, b, c ... from Table1 where Aarray is not null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
4. Any other suggestions for reducing the query time?

Some suggestions:
The log produced by the QueryStatisticsRecorder class is a good way to find the bottleneck, but it is not enough. Some metrics I think would be very useful:
1. Filter ratio, i.e. not only result_size but also the original size, so we can tell how much data was filtered out.
2. IO time. scan_blocks_time is not enough: if it is high we know something is wrong, but not what caused it, because the real IO time for the data is not reported. Since one partition may span several files, knowing whether the slowness comes from the datanode or from the executor itself would give us a starting point for finding the problem.
3. The TableBlockInfo for each task. I log it myself when debugging; it tells me how many blocklets are actually local. The Spark web UI only shows a locality level, yet perhaps only one blocklet is local.
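The executor, speculation, and GC settings described above can be sketched as a single spark-submit invocation. This is a minimal sketch, not a setup confirmed in this thread: the application jar name is a placeholder, and the G1 pause target is an illustrative assumption in place of fixed Eden sizing.

```shell
# Sketch of the cluster/speculation/GC setup described in this thread.
# All values are illustrative; tune them for your own cluster.
spark-submit \
  --num-executors 100 \
  --executor-memory 10g \
  --conf spark.executor.cores=5 \
  --conf spark.speculation=true \
  --conf spark.speculation.quantile=0.5 \
  --conf spark.speculation.interval=250ms \
  --conf spark.speculation.multiplier=1.2 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=200" \
  your-query-app.jar
```

With G1 it is usually better to set a pause-time goal (-XX:MaxGCPauseMillis) than to pin Eden with -Xmn, since a fixed young generation prevents G1 from resizing regions to meet its pause target.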
Hi Anning Luo,

Can you please provide the details below?
1. The create table DDL.
2. The number of nodes in your cluster setup.
3. The number of executors per node.
4. The query statistics.

Please find my comments in bold.

On problem 1 (GC time): *How many nodes are present in your cluster setup? If there are only a few nodes, please reduce the number of executors per node.*

On problem 3 (slow first and second query): *Currently we are working on first-time query improvement. For now you can run select count(*) or count(column), so all the blocks get loaded, and then run the actual query.*

-Regards
Kumar Vishal

On Thu, Nov 10, 2016 at 12:30 PM, Anning Luo <[hidden email]> wrote:
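The warm-up advice above (run a count first so blocks and dictionaries get loaded) can be sketched as the queries below, using the table and column names from the thread. This is a sketch of the suggestion, not a verified recipe:

```sql
-- Warm-up before the real workload:
-- count(*) forces the block/blocklet indexes to load;
-- count(column) additionally touches that column's data.
select count(*) from Table1;
select count(d) from Table1;
```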