[Background]
Currently, Carbondata do not have SQL command to support upsert [Movitation] Since we already have merge into dataset API, we can develop a MERGE INTO SQL API. Since the merge into command is a litter bit complex, it may need to develop a SQL parser with ANTLR to parse the SQL. MERGEINTO SQL COMMAND MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [AS source_alias] ON <merge_condition> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ] MERGE INTO TAREGT USING SOURCE ON SOURCE.ID=TARGET.ID WHEN MATCHED THEN UPDATE SET TARGET.NAME = SOURCE.NAME WHEN NOT MATCHED THEN INSERT (TARGER.ID, TARGET.NAME, TARGET,AGE) VALUES ( SOURCE.ID, SOURCE.NAME, SOURCE.AGE) TARGET TABLE ID Name Age 1 Jan 23 2 KK 22 3 Joe 27 SOURCE TABLE ID Name Age 2 Steve 22 4 Mike 24 AFTER MERGE INTO COMMAND TARGET TABLE ID Name Age 1 Jan 23 2 Steve 22 3 Joe 27 4 Mike 24 In the first version of implement, MERGE INTO SQL command in Carbondata will support basic table merge condition. And it will support up to 2 MATCHED clause, and 1 NOT MATCHED clause. In the send version, it can support the Carbondata feature, such as segment, time travel version etc. CarbonDataMergeIntoSQl.docx <http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQl.docx> -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
+1,
Thanks for planning to implement this. Please define the limitations or scope in more detail for WHEN MATCHED and WHEN NOT MATCHED. For example, when NOT MATCHED, can UPDATE also supported? (I guess only insert is supported) Thanks, Ajantha On Thu, Nov 5, 2020 at 8:10 AM BrooksLi <[hidden email]> wrote: > [Background] > Currently, Carbondata do not have SQL command to support upsert > > [Movitation] > Since we already have merge into dataset API, we can develop a MERGE INTO > SQL API. Since the merge into command is a litter bit complex, it may need > to develop a SQL parser with ANTLR to parse the SQL. > > MERGEINTO SQL COMMAND > > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [AS source_alias] > ON <merge_condition> > [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] > [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] > [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ] > > MERGE INTO TAREGT > USING SOURCE > ON SOURCE.ID=TARGET.ID > WHEN MATCHED THEN > UPDATE SET TARGET.NAME = SOURCE.NAME > WHEN NOT MATCHED THEN > INSERT (TARGER.ID, TARGET.NAME, TARGET,AGE) VALUES ( SOURCE.ID, > SOURCE.NAME, > SOURCE.AGE) > > TARGET TABLE > ID Name Age > 1 Jan 23 > 2 KK 22 > 3 Joe 27 > > SOURCE TABLE > ID Name Age > 2 Steve 22 > 4 Mike 24 > > > AFTER MERGE INTO COMMAND > TARGET TABLE > ID Name Age > 1 Jan 23 > 2 Steve 22 > 3 Joe 27 > 4 Mike 24 > > In the first version of implement, MERGE INTO SQL command in Carbondata > will > support basic table merge condition. And it will support up to 2 MATCHED > clause, and 1 NOT MATCHED clause. > > In the send version, it can support the Carbondata feature, such as > segment, > time travel version etc. > > > CarbonDataMergeIntoSQl.docx > < > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQl.docx> > > > > > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > |
In reply to this post by BrooksLi
Hi,
+1 Thanks for proposing the idea. Please consider the below points in design and coding please try to update the below points to design 1. when there are multiple whenMatched conditions what happens? it should be in order 2. validations like when matched can have either update or delete, this validation 3. when the condition is not present for when matched, then how to update or delete complete data, example can be given, how the sql will look like 4. how to update all columns of target table from source table and how the sql will be and validations can be there 5. validation for whenNot matched to have only insert 6. what happens when the columns does not match from source and target 7. how to insert complete data. Please try to mention all these in design documents, can add section for all the validation to take care. One section can be to mention the examples. By this way design will be very good and easy to understand. Thanks Regards, Akash R Nilugal -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
Hi, Akash. Here are the answers to your questions. And I think we can have a
further investigation about merge into command in Carbondata. Q1. when there are multiple whenMatched conditions what happens? A1. The number of whenMatched and notMatched can be unlimited Since the Carbondata API for merge matched and unmatched clause are unlimited. However, the order of WHEN MATCHED and NOTMATCHED matters. The WHEN MATCHED clause should be placed before the NOTMATCHED. And once it matched the first matched/not matched clause, it will just begin to execute and ignore the others. Q2. validations like when matched can have either update or delete, this validation A2. Yes, each when matched can only have either update/ delete once. And we need to validate the SQL Text, it the SQL Text do not match the rule of Merge Into, it will just return an SQL Parser Error back to the users. Q3. when the condition is not present for when matched, then how to update or delete complete data, example can be given, how the sql will look like A3. When there is not when matched clause, this merge into command can only use to act as INSERT INTO command when it meets the not matched clause. When all WHENMATCHED Clauses are missed, it begins to check the WHENNOTMATCHED Clause. Q4. how to update all columns of the target table from the source table and how the SQL will be and validations can be there A4. For this problem, I think it may only be available to match all the corresponding columns manually. MERGE INTO events USING updates ON events.eventId = updates.eventId WHEN MATCHED THEN UPDATE SET events.data = updates.data, events.name = updates.name Q5. validation for whenNot matched to have only insert A5. In matched clause: it can only have UPDATE/ DELETE execution In not matched clause: it can only execute INSERT INTO Q6. what happens when the columns do not match from source and target A6. I think it will do nothing Q7. how to insert complete data. A7. When it do not match any match clause, it will run to the when not matched clause. I think there can be two solutions, on the solution is to insert * The SQL look like this MERGE INTO logs USING newDedupedLogs ON logs.uniqueId = newDedupedLogs.uniqueId WHEN NOT MATCHED THEN INSERT * And the other solution is to match the columns one by one. We can further discuss these rules and limits of MERGE INTO since the code is still being implemented. Thanks Brooks Li -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
Hi,
Actually these all things i suggested to mention in design document and update. All these are QAs. for your answer: A3 -> The question is, when there is no condition present for when matched clause, then how to update all data, mention SQL example in design document Same for insert also, so you can have SQL like, for update all column - SET * to insert all - Insert * So actually my mail was not to get answers, to update in the document, So can you please add all these points in design and also should be added later in document md file also. Thanks Regards, Akash R Nilugal -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
Hi Akash,
Thanks for the key problems you have asked about. Sure, I will update the design document later after further investigation and discussion with you guys. I think Q&A may be a good way to analyze the problems. The design document will update and post here today. Thanks, Brooks Li -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
In reply to this post by BrooksLi
Hi all,
Here is Version 2.0 for Carbon Merge Into SQL API. The number and functions of when matched/ when not matched clause have been defined. And support * to update/ insert all columns. Please check the new Doc to see if there is anything that can be added. CarbonDataMergeIntoSQL_V2.docx <http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQL_V2.docx> Thanks, Brooks Li -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
In reply to this post by BrooksLi
Hi, all.
The code implement of MERGE INTO SQL Command has been posted at https://github.com/apache/carbondata/pull/4032. Currently, the function is still simple and it will be implemented with your guy's review. Thanks Jianxi Li -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
Free forum by Nabble | Edit this page |