Hi, dev
Currently, I am thinking about the show segments command. It lists the segments of a carbon table, but it only returns segmentId, status, load start time and load end time, all of which come from tablestatus. I think this is not enough for users to understand the state of each segment, so I want to add two output fields: the number of carbon data files under the segment folder, and the number of carbon index files under the segment folder.

Any suggestions about this idea? Comments are welcome.

Regards.
Chenerlu.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Nice.
What about the update status of the segment? Some users may be interested in the last modified time of a segment.

On 09/16/2017 17:14, Erlu Chen wrote:
> Hi, dev
>
> Currently, I am thinking about the function of show segments. [...]
What is the use case? When would a user be interested in knowing the number of files?

On Sat, 16 Sep 2017 at 3:12 PM, xuchuanyin <[hidden email]> wrote:
> Nice.
>
> what about the update status of the segment? [...]
I think it is good to have this feature; it may help users decide whether manual compaction is needed.
Instead of outputting the number of carbon index files per segment, I suggest outputting the data size of the segment, which is more helpful.

Regards,
Jacky

On Sep 20, 2017, at 11:46 AM, Raghunandan S <[hidden email]> wrote:
> What is the use case? When user would be interested in knowing number of files?
I agree with Jacky.
I think enhanced segment metadata will help us understand the table.

I suggest the following properties for segment metadata:
1. total data file size
2. total index file size
3. data file count
4. index file count
5. last modified time (last update time)

With this information, we can answer the following questions:
1. Is there a small-file issue? Does the table require compaction, and if so, which type should be used?
2. Are there too many index files? We can estimate the total size of the index in memory and judge whether it is big or small for the driver memory configuration.
3. Does some segment have too many files? This may be useful for locating performance issues.

-----
Best Regards
David Cai
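The five properties listed above could be gathered roughly as follows. This is only a minimal sketch, not how CarbonData implements it: it assumes a segment is a single folder on a local file system, that data and index files can be recognized by the suffixes `.carbondata` and `.carbonindex`, and the names `SegmentStats`, `collect_segment_stats` and `has_small_file_issue` are hypothetical.

```python
import os
from dataclasses import dataclass

# Assumed file suffixes for data and index files; adjust if the store differs.
DATA_SUFFIX = ".carbondata"
INDEX_SUFFIX = ".carbonindex"


@dataclass
class SegmentStats:
    """Hypothetical container for the proposed per-segment metadata."""
    data_file_count: int = 0
    index_file_count: int = 0
    data_size_bytes: int = 0
    index_size_bytes: int = 0
    last_modified: float = 0.0  # newest file mtime, seconds since epoch


def collect_segment_stats(segment_dir: str) -> SegmentStats:
    """Scan one segment folder (non-recursive) and total up its files."""
    stats = SegmentStats()
    with os.scandir(segment_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            info = entry.stat()
            if entry.name.endswith(DATA_SUFFIX):
                stats.data_file_count += 1
                stats.data_size_bytes += info.st_size
            elif entry.name.endswith(INDEX_SUFFIX):
                stats.index_file_count += 1
                stats.index_size_bytes += info.st_size
            else:
                continue  # ignore files that are neither data nor index
            stats.last_modified = max(stats.last_modified, info.st_mtime)
    return stats


def has_small_file_issue(stats: SegmentStats, min_avg_bytes: int) -> bool:
    """True when the average data file is smaller than the given threshold."""
    if stats.data_file_count == 0:
        return False
    return stats.data_size_bytes / stats.data_file_count < min_avg_bytes
```

The threshold passed to `has_small_file_issue` is left to the caller, since a sensible value depends on the block size and the compaction settings in use.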
Hi,
I agree with Jacky and David. But I suggest keeping the current 'show segments' command unchanged, so that it provides only brief information about segments, and adding a new extended command like `extended show segments` to provide the additional information that power users need.

Regards,
Ravindra.

On 21 September 2017 at 09:03, David CaiQiang <[hidden email]> wrote:
> I agree with Jacky.
>
> I think enhanced segment metadata will help us to understand the table. [...]

--
Thanks & Regards,
Ravi
If we add a new statement, I suggest following Hive's convention:

desc formatted table_name; VS desc table_name;
Show segment... VS Show formatted segment...

On 09/21/2017 14:02, Ravindra Pesala wrote:
> Hi,
> I agree with Jacky and David. But it is suggested to keep current 'show segments' command without any change and provide only brief information about segments. [...]
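To illustrate the brief versus extended split discussed in this thread, here is a small sketch of how one code path could serve both outputs. It is an assumption-laden illustration, not CarbonData code: the function `show_segments`, its `extended` flag and the extra column names are hypothetical, and only the four brief columns come from the current command as described in the first post.

```python
from typing import Dict, List

# Columns returned by the current command, as described in the first post.
BRIEF_COLUMNS = ["segmentId", "status", "load start time", "load end time"]

# Extra columns proposed in the thread; these names are illustrative only.
EXTENDED_COLUMNS = BRIEF_COLUMNS + [
    "data file count",
    "index file count",
    "data size",
    "index size",
    "last modified time",
]


def show_segments(segments: List[Dict[str, object]],
                  extended: bool = False) -> List[List[object]]:
    """Return a header row followed by one row per segment.

    The brief form keeps today's output unchanged; the extended form
    appends the additional columns. Missing values are shown as "".
    """
    columns = EXTENDED_COLUMNS if extended else BRIEF_COLUMNS
    rows: List[List[object]] = [list(columns)]
    for seg in segments:
        rows.append([seg.get(col, "") for col in columns])
    return rows
```

Keeping the brief column list fixed means existing users and scripts that parse the current output would not be affected by the new columns.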
In reply to this post by ravipesala
Yeah, I agree with Ravi.
We can keep both "Show segments" and "Show extended segment".

@xuchuanyin, as far as I know, the result of show segment is currently already formatted.

Regards.
Chenerlu.