Posted by
haomarch on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-show-segment-info-tp91874p92074.html
1. show partition enhanced
I think it would be more useful to see the data volume of each partition.
Furthermore, the data volume of each day is an important metric that
cannot be computed now.
To sum up, if the partition columns are "day, hour",
shall we focus on these two problems:
Q1: How can we find the data volume of each hour? ------ We can aggregate
the data volume of every segment belonging to that partition; this is easy.
Q2: How can we find the data volume of each day? ------ Maybe add an option
to "show partitions", like "show partitions groupby DAY"?
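To make Q1 and Q2 concrete, here is a rough sketch of the two aggregations in Python. It is not CarbonData code: the function names and the (partition path, size in bytes) input shape are assumptions for illustration only, and the sizes are made-up sample values.

```python
from collections import defaultdict

def volume_by_partition(segments):
    """Q1: sum the data size of all segments that belong to each partition.

    `segments` is a hypothetical list of (partition_path, size_in_bytes)
    pairs, e.g. ("day=20200219/hour=13", 100).
    """
    totals = defaultdict(int)
    for partition, size in segments:
        totals[partition] += size
    return dict(totals)

def volume_by_day(hourly_totals):
    """Q2: roll the per-hour totals up to the leading "day" partition column."""
    daily = defaultdict(int)
    for partition, size in hourly_totals.items():
        day = partition.split("/")[0]  # keep only the "day=..." component
        daily[day] += size
    return dict(daily)

# Made-up sample data, only to show the shape of the result.
segments = [
    ("day=20200219/hour=13", 100),
    ("day=20200219/hour=13", 50),
    ("day=20200219/hour=14", 70),
    ("day=20200220/hour=00", 30),
]
hourly = volume_by_partition(segments)
daily = volume_by_day(hourly)
```

The per-day figure is just a second grouping over the per-hour result, which is what a "groupby DAY" style option would have to do internally.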
2. show load delay
Shall we add an option (dryRun = true) to the "insert stage" command, to
output statistics on the stages that have not been loaded yet?
The output could be:
| dtm-20200219/hh=13 | incompletely loaded, there are still 200 stages waiting for loading
| dtm-20200219/hh=14 | completely loaded
| dtm-20200219/hh=15 | completely loaded
| dtm-20200219/hh=16 | completely loaded
| dtm-20200219/hh=17 | incompletely loaded, there are still 1800 stages waiting for loading
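A dry run like this mostly comes down to counting the pending stages per partition and printing one row each. The sketch below is in Python and uses hypothetical names (`dry_run_report`, `pending_stage_partitions`); it does not reflect how the "insert stage" command is actually implemented.

```python
from collections import Counter

def dry_run_report(pending_stage_partitions, all_partitions):
    """Build one report row per partition without loading anything.

    `pending_stage_partitions` is a hypothetical list with one partition
    path per stage that has not been loaded yet; `all_partitions` lists
    every partition that should appear in the report.
    """
    pending = Counter(pending_stage_partitions)
    rows = []
    for partition in sorted(all_partitions):
        count = pending.get(partition, 0)
        if count:
            status = (f"incompletely loaded, there are still {count} "
                      "stages waiting for loading")
        else:
            status = "completely loaded"
        rows.append(f"| {partition} | {status}")
    return rows

# Made-up sample input: two stages still pending for hh=13, none for hh=14.
rows = dry_run_report(
    ["dtm-20200219/hh=13", "dtm-20200219/hh=13"],
    ["dtm-20200219/hh=13", "dtm-20200219/hh=14"],
)
```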
3. show segment enhanced
Only coarse-grained statistics may be better. Shall we just show the
partition each segment belongs to, rather than outputting
min(collect_time)?