[Problem]
One table has 40K segments. We started creating an SI on one column; after one week, 39999 segments had finished, but the application crashed abnormally while the last segment was being created. Checking the status of the SI table afterwards showed no successful segments. When SI creation is started again, it begins from the beginning (segment id = 0), so all the SI segments that were already loaded during the last creation have to be reloaded, which is a waste of time.

[Expectation]
SI creation can succeed partially, and the next creation can start from the point where the last one failed.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Hi,
Yes, as you mentioned, this is a major drawback in the current SI flow. The problem exists because, once we get the set of segments to load, we start an executor service and hand it the whole segment list, and only after .get returns do we mark the status as success for all segments at once.

So we need to rewrite this code to work batch-wise and avoid the problem.

Regards,
Akash R
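A minimal sketch of the batch-wise idea described above, written in plain Python rather than against the CarbonData code base. Everything in it is an assumption made for illustration: `load_si_in_batches`, `load_segment`, and the `status` dictionary (standing in for the SI table status) are hypothetical names, and the real flow may be structured differently. The point it shows is that committing the status after each batch lets a restart skip segments that already succeeded.

```python
from concurrent.futures import ThreadPoolExecutor


def load_si_in_batches(segment_ids, load_segment, status, batch_size=4, workers=4):
    """Load SI segments batch by batch, committing status after every batch.

    Hypothetical sketch: `status` maps segment id -> "SUCCESS" and stands in
    for the SI table status; `load_segment` is any callable that loads one
    segment and raises on failure.
    """
    # Skip segments that a previous (possibly crashed) run already committed.
    pending = [s for s in segment_ids if status.get(s) != "SUCCESS"]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            futures = [pool.submit(load_segment, s) for s in batch]

            # Wait for the whole batch; result() re-raises a failed load,
            # so a failing batch is never marked as successful.
            for future in futures:
                future.result()

            # Commit only this batch. Earlier batches stay committed even
            # if a later batch fails or the process dies.
            for s in batch:
                status[s] = "SUCCESS"

    return status
```

With this shape, a crash in the last batch loses at most `batch_size` segments of work, and calling the function again with the same `status` reloads only the uncommitted segments.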
Hi,
Yes, it seems to be a drawback in the SI creation command. As Akash pointed out, instead of trying to update the status for all the segments at once, we can do two things:

1. Load in batches (similar to what Akash mentioned) and, in case of a failure, just stop loading without failing the SI creation command, so that the user can use the reindex command to repair the remaining segments, or the repair can be triggered in the next consecutive loads.
2. Provide a way to load only a user-defined number of segments into the SI instead of loading all at once. In this case, say the user wants to create an SI table on a table with 40000 segments. They can create the SI with just 500 or 1000 segments initially, then fire the reindex command to load the remaining segments, or repair the remaining segments in batches using the load command.

Others can give their input as well.

Regards
Vikram

On Tue, Mar 2, 2021 at 4:00 PM akashrn5 <[hidden email]> wrote:
> Hi,
>
> yes, as you mentioned this is a major drawback in the current SI flow. This
> problem exists because, when we get the set of segments to load, we start an
> executor service and give all the segment list, after .get we make the
> status success at once.
>
> So we need to rewrite this code to make it like batch wise and avoid the
> problem.
>
> Regards,
> Akash R
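The two proposals in this reply (stop on failure without failing the command, and cap the number of segments loaded per run) could look roughly like the sketch below. It is plain Python, not CarbonData code, and every name in it is hypothetical: `load_some_segments`, `load_segment`, `max_segments`, and the `status` dictionary are assumptions made only to illustrate the control flow. A later "reindex" or repair step is modeled simply as calling the same function again.

```python
def load_some_segments(segment_ids, load_segment, status, max_segments=None):
    """Load up to `max_segments` unloaded segments, stopping at the first failure.

    Hypothetical sketch: `status` maps segment id -> "SUCCESS" and stands in
    for the SI table status. A failure does not raise; the function returns
    the segment ids still missing from the SI so a later repair can pick
    them up.
    """
    segment_ids = list(segment_ids)
    pending = [s for s in segment_ids if status.get(s) != "SUCCESS"]

    # Proposal 2: the user may cap how many segments this run loads.
    limit = len(pending) if max_segments is None else max_segments

    for seg in pending[:limit]:
        try:
            load_segment(seg)
        except Exception:
            # Proposal 1: stop loading, but keep what already succeeded
            # and do not fail the whole command.
            break
        status[seg] = "SUCCESS"

    # Segments left for a later repair run.
    return [s for s in segment_ids if status.get(s) != "SUCCESS"]
```

For example, a first call with `max_segments=500` would load the first 500 segments, and repeated calls with the same `status` would work through whatever the previous call returned as remaining.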
In reply to this post by Yahui Liu
Hi,
+1 for the feature. We can save a lot of load time for the SI table.

One doubt:
1) How are we going to handle queries for the failed segments? Do we need to prune from the main table directly?

Thanks & Regards
Mahesh Raju Somalaraju

On Fri, Feb 19, 2021 at 3:16 PM Yahui Liu <[hidden email]> wrote:
> [Problem]
> One table has 40K segments, now start to create SI on one column, it took 1
> week and finished 39999 segments, but during creating on the last segment,
> application crashed abnormally. Then checking the status of SI table, there
> is no success segment. Start again to create SI, it will start from the
> beginning (segment id = 0), all those SI segments which already loaded during
> last creation need to reload again, which is a waste of time.
>
> [Expectation]
> SI creation can succeed partially, next time creation can start from the
> point where last creation failed at.