[DISCUSSION] Support MERGE INTO SQL API

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSSION] Support MERGE INTO SQL API

BrooksLi
[Background]
Currently, Carbondata do not have SQL command to support upsert

[Movitation]
Since we already have merge into dataset API, we can develop a MERGE INTO
SQL API. Since the merge into command is a litter bit complex, it may need
to develop a SQL parser with ANTLR to parse the SQL.

MERGEINTO SQL COMMAND

MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [AS source_alias]
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]

MERGE INTO TAREGT
USING SOURCE
ON SOURCE.ID=TARGET.ID
WHEN MATCHED THEN
UPDATE SET TARGET.NAME = SOURCE.NAME
WHEN NOT MATCHED THEN
INSERT (TARGER.ID, TARGET.NAME, TARGET,AGE) VALUES ( SOURCE.ID, SOURCE.NAME,
SOURCE.AGE)

TARGET TABLE
ID Name     Age
1 Jan      23
2 KK      22
3 Joe      27

SOURCE TABLE
ID Name Age
2 Steve 22
4 Mike        24


AFTER MERGE INTO COMMAND
TARGET TABLE
ID Name Age
1 Jan        23
2 Steve 22
3 Joe        27
4 Mike        24

In the first version of implement, MERGE INTO SQL command in Carbondata will
support basic table merge condition. And it will support up to 2 MATCHED
clause, and 1 NOT MATCHED clause.

In the send version, it can support the Carbondata feature, such as segment,
time travel version etc.


CarbonDataMergeIntoSQl.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQl.docx>  



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

Ajantha Bhat
+1,

Thanks for planning to implement this.

Please define the limitations or scope in more detail for WHEN MATCHED  and
WHEN NOT MATCHED.
For example, when  NOT MATCHED, can UPDATE also supported? (I guess only
insert is supported)

Thanks,
Ajantha

On Thu, Nov 5, 2020 at 8:10 AM BrooksLi <[hidden email]> wrote:

> [Background]
> Currently, Carbondata do not have SQL command to support upsert
>
> [Movitation]
> Since we already have merge into dataset API, we can develop a MERGE INTO
> SQL API. Since the merge into command is a litter bit complex, it may need
> to develop a SQL parser with ANTLR to parse the SQL.
>
> MERGEINTO SQL COMMAND
>
> MERGE INTO [db_name.]target_table [AS target_alias]
> USING [db_name.]source_table [AS source_alias]
> ON <merge_condition>
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]
>
> MERGE INTO TAREGT
> USING SOURCE
> ON SOURCE.ID=TARGET.ID
> WHEN MATCHED THEN
> UPDATE SET TARGET.NAME = SOURCE.NAME
> WHEN NOT MATCHED THEN
> INSERT (TARGER.ID, TARGET.NAME, TARGET,AGE) VALUES ( SOURCE.ID,
> SOURCE.NAME,
> SOURCE.AGE)
>
> TARGET TABLE
> ID      Name     Age
> 1       Jan           23
> 2       KK            22
> 3       Joe           27
>
> SOURCE TABLE
> ID      Name    Age
> 2       Steve   22
> 4       Mike            24
>
>
> AFTER MERGE INTO COMMAND
> TARGET TABLE
> ID      Name    Age
> 1       Jan             23
> 2       Steve   22
> 3       Joe             27
> 4       Mike            24
>
> In the first version of implement, MERGE INTO SQL command in Carbondata
> will
> support basic table merge condition. And it will support up to 2 MATCHED
> clause, and 1 NOT MATCHED clause.
>
> In the send version, it can support the Carbondata feature, such as
> segment,
> time travel version etc.
>
>
> CarbonDataMergeIntoSQl.docx
> <
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQl.docx>
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

akashrn5
In reply to this post by BrooksLi
Hi,

+1
Thanks for proposing the idea.

Please consider the below points in design and coding

please try to update the below points to design
1. when there are multiple whenMatched conditions what happens? it should be
in order
2. validations like when matched can have either update or delete, this
validation
3. when the condition is not present for when matched, then how to update or
delete complete data,
example can be given, how the sql will look like
4. how to update all columns of target table from source table and how the
sql will be and validations can be there
5. validation for whenNot matched to have only insert
6. what happens when the columns does not match from source and target
7. how to insert complete data.

Please try to mention all these in design documents, can add section for all
the validation to take care.
One section can be to mention the examples.

By this way design will be very good and easy to understand.

Thanks

Regards,
Akash R Nilugal



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

BrooksLi
Hi, Akash. Here are the answers to your questions. And I think we can have a
further investigation about merge into command in Carbondata.
Q1. when there are multiple whenMatched conditions what happens?
A1.
The number of whenMatched and notMatched can be unlimited Since the
Carbondata API  for merge matched and unmatched clause are unlimited.
However, the order of WHEN MATCHED and NOTMATCHED matters. The WHEN MATCHED
clause should be placed before the NOTMATCHED.
And once it matched the first matched/not matched clause, it will just begin
to execute and ignore the others.

Q2. validations like when matched can have either update or delete, this
validation

A2.
Yes, each when matched can only have either update/ delete once.
And we need to validate the SQL Text, it the SQL Text do not match the rule
of Merge Into, it will just return an SQL Parser Error back to the users.


Q3. when the condition is not present for when matched, then how to update
or
delete complete data,
example can be given, how the sql will look like

A3.
When there is not when matched clause, this merge into command can only use
to act as INSERT INTO command when it meets the not matched clause.
When all WHENMATCHED Clauses are missed, it begins to check the
WHENNOTMATCHED Clause.


Q4. how to update all columns of the target table from the source table and
how the
SQL will be and validations can be there


A4.
For this problem, I think it may only be available to match all the
corresponding columns manually.
MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
  UPDATE SET events.data = updates.data, events.name = updates.name



Q5. validation for whenNot matched to have only insert

A5.
In matched clause: it can only have UPDATE/ DELETE execution
In not matched clause: it can only execute INSERT INTO

Q6. what happens when the columns do not match from source and target

A6.
I think it will do nothing

Q7. how to insert complete data.
A7.
When it do not match any match clause, it will run to the when not matched
clause.
I think there can be two solutions, on the solution is to insert *

The SQL look like this
MERGE INTO logs
USING newDedupedLogs
ON logs.uniqueId = newDedupedLogs.uniqueId
WHEN NOT MATCHED
  THEN INSERT *
And the other solution is to match the columns one by one.


We can further discuss these rules and limits of MERGE INTO since the code
is still being implemented.

Thanks
Brooks Li




--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

akashrn5
Hi,

Actually these all things i suggested to mention in design document and
update. All these are QAs.

for your answer:

A3 ->
The question is, when there is no condition present for when matched clause,
then how to update all data, mention SQL example in design document

Same for insert also, so you can have SQL like,
for update all column -  SET *
to insert all - Insert *

So actually my mail was not to get answers, to update in the document, So
can you please
add all these points in design and also should be added later in document md
file also.

Thanks

Regards,
Akash R Nilugal




--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

BrooksLi
Hi Akash,

Thanks for the key problems you have asked about.

Sure, I will update the design document later after further investigation
and discussion with you guys.
I think Q&A may be a good way to analyze the problems.

The design document will update and post here today.

Thanks,
Brooks Li



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

BrooksLi
In reply to this post by BrooksLi
Hi all,

Here is Version 2.0 for Carbon Merge Into SQL API.
The number and functions of when matched/ when not matched clause have been
defined.
And support * to update/ insert all columns.

Please check the new Doc to see if there is anything that can be added.
CarbonDataMergeIntoSQL_V2.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t492/CarbonDataMergeIntoSQL_V2.docx>  

Thanks,
Brooks Li



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION] Support MERGE INTO SQL API

BrooksLi
In reply to this post by BrooksLi
Hi, all.
The code implement of MERGE INTO SQL Command has been posted at
https://github.com/apache/carbondata/pull/4032.

Currently, the function is still simple and it will be implemented with your
guy's review.

Thanks
Jianxi Li



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/