Hi all!
Currently, with the show segment command, we can get the list of segments with details such as ID, Status, Load start time, Load Time taken, partition, Data size, and Index Size. Using the load start time we can find the segment id for a particular load, but with concurrent loads it is confusing to identify the segment id for a specific load, because the load start times can be the same or very close.

To overcome this problem, we are planning to show the segment id along with the number of successful entries in the segment when a carbondata load succeeds. We can include other details as well if required once the discussion concludes. With this, we can know the segment id corresponding to a particular load and easily query that specific segment.

Note: This scenario is valid for *insert into* queries also.

Please let me know your input on this.

Thanks,
Nihal kumar ojha

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
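The ambiguity described above can be sketched with a toy model. This is only an illustration, not CarbonData code: `ToySegmentTable`, its `load` method, and the timestamp are hypothetical names and values chosen for the example. Two concurrent loads record the same start time, so the segment listing cannot tell them apart, while the id returned by each load call can.

```python
import threading
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ToySegmentTable:
    """Hypothetical in-memory stand-in for a table's segment list."""
    segments: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def load(self, rows, start_time):
        """Register a new segment and return its id to the caller."""
        with self._lock:
            segment_id = len(self.segments)
            self.segments.append(
                {"id": segment_id, "start": start_time, "rows": len(rows)}
            )
        return segment_id


table = ToySegmentTable()
same_instant = datetime(2021, 1, 1, 0, 0, 0)  # arbitrary shared start time
returned = {}


def run_load(name, rows):
    # Each caller learns its own segment id directly from the load call.
    returned[name] = table.load(rows, same_instant)


threads = [
    threading.Thread(target=run_load, args=("load_a", [1, 2, 3])),
    threading.Thread(target=run_load, args=("load_b", [4, 5])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both segments carry the same start time, so the listing alone cannot
# tell which load produced which segment.
by_start = [s["id"] for s in table.segments if s["start"] == same_instant]
print("segments sharing the start time:", by_start)
print("segment id returned to each load:", returned)
```

Which load gets which id depends on thread scheduling, which is exactly why the caller needs the id handed back rather than inferred from the start time.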
Hi Nihal
It's a good idea. We can also show the following information along with the segment number:
1. Segment size
2. Number of files in the segment
3. Cache size and location (index server in case of prepriming)

We can give the same information for add segment as well. Others can give input too.

Thanks
Vikram

(Replying to Nihal's message of Wed, 13 Jan 2021, 12:09 pm.)
Hi Nihal, my suggestion is as follows:
1. Include the normal output of the show segment command.
2. Add more information about the load, such as numFiles, numRows, and rawDataSize (show segment may need these too; take care of CDC, which needs to update this information).

Best Regards
David Cai
In reply to this post by Nihal
Hi Nihal,
The problem statement is not very clear. What is the use case, or in which scenario is the problem faced? We need to get the result from the successful segments themselves, so please elaborate a little on the problem.

Also, if you want to include more details, do not put them in the default show segments output; they could go in show segments with query, which Likun implemented. But we can decide this once the problem is clear.

Also, @vikram, showing cache information here is not a good idea, as we already have a command for that. If you are planning segment-wise cache details, we can improve the existing cache-specific commands; let's not include them here.

Thanks,
Regards,
Akash
Hi Nihal,
In a concurrent scenario we cannot map which load command was loaded as which segment id, so it is good to show a summary at the end of the command.

I agree with David's suggestion. Along with load and insert, if possible we should give a summary for update, delete, and merge as well (we may start supporting these as concurrent operations in the near future).

Thanks,
Ajantha

(Replying to akashrn5's message of Mon, 18 Jan 2021, 9:49 am.)
Hi Nihal,
I also feel it is good to display only the segment ID at the end of the command, similar to the update command, which returns the number of updated rows. There is no need to add other details (they can be added to the show segments command if needed).

Regards,
Indhumathi M

(Replying to Ajantha Bhat's message of Mon, Jan 18, 2021 at 9:54 AM.)
In reply to this post by Nihal
Hi all,
I agree with adding some extra information after a successful load/insert. The information shown should be only what is accessible during this load; information that can be obtained at any time need not be shown on load return (we can add it to the show segments command, which can be run at any time).

Take the update command as an example: it returns how many rows were updated this time. We can never get this information after the update, so it is important.

So for the load and insert commands, I suggest the following information:
1) segment id (of course)
2) how many rows were loaded/inserted this time (including the bad rows handled by bad_record_action)
3) how many bad records there were (we can never get this information after loading/inserting)
4) bad_record_location (this is also passed as a load option, so it may be unnecessary because users set it themselves)
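As a rough sketch of the summary proposed in this message, the load-time-only fields could be grouped like this. The class `LoadSummary`, the function `summarize_load`, and the bad-row check are hypothetical and only illustrate the idea; they are not CarbonData APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSummary:
    """Hypothetical result of one load/insert; field names are illustrative."""
    segment_id: str
    rows_loaded: int  # includes bad rows handled by bad_record_action
    bad_records: int  # only knowable while the load is running


def summarize_load(segment_id, rows, is_bad):
    """Count all processed rows and the bad ones among them for a single load."""
    bad = sum(1 for row in rows if is_bad(row))
    return LoadSummary(segment_id=segment_id, rows_loaded=len(rows), bad_records=bad)


# Assumed rule for the example: a row without a comma counts as a bad record.
summary = summarize_load("4", ["1,a", "2,b", "oops"], lambda r: "," not in r)
print(summary)
```

bad_record_location is left out of the sketch because, as noted above, the user already supplies it as a load option.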
In reply to this post by Nihal
Thank you all for your valuable inputs.
As per the suggestions and discussion, we have concluded that only the segment ID will be shown as a summary when a load or insert command succeeds.

Regards,
Nihal kumar ojha
In reply to this post by Nihal
Hi,
Let's continue to discuss this. When auto merge is enabled, should we return the segment id from before or after compaction? In my opinion we should return the segment id from before compaction because:
1. users focus on their load operation; the merge runs in the backend and users may not notice it;
2. returning the segment id after compaction is impossible with the current code, because the load and the auto merge are asynchronous.
Hi,
I think auto compaction after load is still not async; there is a plan to make it async. In my view, we should give back the current segment ID, and if it has been merged into another segment we should say so, e.g. "X" is the segment ID loaded and it has been merged into segment "Y", so that the user can decide whether to query it or not. If we just return the current segment, which will be in compacted state, and the user blindly queries it while a concurrent clean files is running, operations will fail. Others can give their opinion.

Regards,
Akash
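The "X loaded, merged into Y" message suggested here could look roughly like the following. This is a hypothetical sketch only: `load_result_message` and the example segment ids are assumptions made for illustration, not an existing CarbonData function or its actual output.

```python
def load_result_message(segment_id, merged_to=None):
    """Build the message for one load; merged_to is the compacted segment id, if any."""
    if merged_to is None:
        return f'Segment "{segment_id}" loaded.'
    return f'Segment "{segment_id}" loaded and merged into segment "{merged_to}".'


# Illustrative ids only.
print(load_result_message("3"))
print(load_result_message("3", merged_to="0.1"))
```

With the second form, the caller knows immediately that the loaded segment is in compacted state and that the merged segment is the one to query.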
In reply to this post by areyouokfreejoe
Hi,
I think that after a load, returning only the id of the segment the data was loaded into is enough, whether or not auto load merge is enabled. I will add one more reason to those @areyouokfreejoe mentioned:
1. Users who care about each individual load have mostly disabled auto load merge in their application logic and handle compaction themselves. Auto load merge is based only on segment number, not on any business relation between the segments. So if they enable auto load merge, several segments that have no relation other than close segment ids will be compacted together. After this kind of compaction, all the information in the segments before compaction is lost, which is not what the user wants. If a load is special, then to avoid losing information after compaction it should be merged only with segments that share the same special property, which is known only to the application; Carbon currently has no place to store this information. So only the user can control which segments are compacted, by triggering custom compaction with the ids of the segments that share that property.
+1, showing the segment id should be enough, as other information can be gathered by other means.

(Replying to Yahui Liu's message of Thu, Feb 18, 2021 at 1:31 PM.)
+1. I agree with Kunal: show the segment ID for the current load.

Regards,
Indhumathi M
In reply to this post by Nihal
Hi,
+1. Considering others' opinions, just the segment ID is enough, and users should check its status after the load to decide whether to query it or go ahead with any other operation on that segment. This also keeps the code simple, avoids introducing bugs, and keeps the test scope very limited.

Regards,
Akash
In reply to this post by Nihal
+1 Good idea. Agree with you.

Regards,
Venu
+1 with Kunal’s idea
Vikram
In reply to this post by Nihal
Hi,
+1. I agree with Kunal's idea of showing the segment ID for a successful load/insert.

Thanks & Regards
Mahesh Raju Somalaraju

(Replying to Nihal's message of Wed, Jan 13, 2021 at 12:09 PM.)