[GitHub] [carbondata] Pickupolddriver opened a new pull request #4032: WIP: Add MERGE INTO SQL Command support to carbondata


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-752879856

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3508/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-752880109

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5269/

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

ajantha-bhat commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551128429

##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -55,8 +55,7 @@ Below is the detailed description of the `merge` API operation.
* `whenNotMatched` clause can have only the `insertExpr` action. The new row is generated based on the specified column and corresponding expressions. Users do not need to specify all the columns in the target table. For unspecified target columns, NULL is inserted.
* `whenNotMatchedAndExistsOnlyOnTarget` clause is executed when row does not match source and exists only in target. This clause can have only delete action.

-**NOTE:** SQL syntax for merge is not yet supported.
-
##### Example code to implement cdc/scd scenario

-Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios.
+Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using api.
+Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql.

Review comment:
As with the merge API syntax and operation semantics at line number 35, please add the same level of detail for the SQL syntax.

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

ajantha-bhat commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551129927

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/interfaces.scala
##########
@@ -71,9 +72,11 @@ case class WhenNotMatchedAndExistsOnlyOnTarget(expression: Option[Column] = None
override def getExp: Option[Column] = expression
}

-case class UpdateAction(updateMap: Map[Column, Column]) extends MergeAction
+case class UpdateAction(var updateMap: Map[Column, Column], isStar: Boolean = false)

Review comment:
What does `isStar` mean here?

GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

ajantha-bhat commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-753788930

@Zhangshunyu, @Pickupolddriver: The document needs to be enhanced further; I have left a comment. As for simplifying the g4 file, you can check with David again. Please recheck, and once all comments are handled, reply here. @QiangCai can merge.

@QiangCai, please use `Co-authored-by` while merging, as I can see 2 authors.

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551150608

##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -55,8 +55,7 @@ Below is the detailed description of the `merge` API operation.
* `whenNotMatched` clause can have only the `insertExpr` action. The new row is generated based on the specified column and corresponding expressions. Users do not need to specify all the columns in the target table. For unspecified target columns, NULL is inserted.
* `whenNotMatchedAndExistsOnlyOnTarget` clause is executed when row does not match source and exists only in target. This clause can have only delete action.

-**NOTE:** SQL syntax for merge is not yet supported.
-
##### Example code to implement cdc/scd scenario

-Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios.
+Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using api.
+Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql.

Review comment:
@ajantha-bhat OK, added

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551150842

##########
File path: integration/spark/pom.xml
##########
@@ -264,6 +269,18 @@
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
+ <dependency>

Review comment:
@QiangCai handled

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551151196

##########
File path: integration/spark/pom.xml
##########
@@ -528,6 +545,22 @@
</execution>
</executions>
</plugin>
+ <plugin>
+ <groupId>org.antlr</groupId>
+ <artifactId>antlr4-maven-plugin</artifactId>
+ <executions>
+ <execution>
+ <goals>
+ <goal>antlr4</goal>
+ </goals>
+ </execution>
+ </executions>
+ <configuration>
+ <visitor>true</visitor>
+ <sourceDirectory>../spark/src/main/antlr4</sourceDirectory>

Review comment:
@QiangCai these files are generated under the spark module.

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551151299

##########
File path: integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
##########
@@ -0,0 +1,1842 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/CarbonSqlBase.g4 grammar.

Review comment:
@QiangCai handled

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551151350

##########
File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonAntlrSqlVisitor.java
##########
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.spark.sql.catalyst.expressions.Expression;
+import org.apache.spark.sql.catalyst.parser.ParseException;
+import org.apache.spark.sql.catalyst.parser.ParserInterface;
+import org.apache.spark.sql.execution.command.mutation.merge.DeleteAction;
+import org.apache.spark.sql.execution.command.mutation.merge.InsertAction;
+import org.apache.spark.sql.execution.command.mutation.merge.MergeAction;
+import org.apache.spark.sql.execution.command.mutation.merge.UpdateAction;
+import org.apache.spark.sql.merge.model.CarbonJoinExpression;
+import org.apache.spark.sql.merge.model.CarbonMergeIntoModel;
+import org.apache.spark.sql.merge.model.ColumnModel;
+import org.apache.spark.sql.merge.model.TableModel;
+import org.apache.spark.sql.parser.CarbonSqlBaseBaseVisitor;
+import org.apache.spark.sql.parser.CarbonSqlBaseParser;
+import org.apache.spark.util.SparkUtil;
+
+public class CarbonAntlrSqlVisitor extends CarbonSqlBaseBaseVisitor {
+
+ private final ParserInterface sparkParser;
+
+ public CarbonAntlrSqlVisitor(ParserInterface sparkParser) {
+ this.sparkParser = sparkParser;
+ }
+
+ @Override
+ public String visitTableAlias(CarbonSqlBaseParser.TableAliasContext ctx) {
+ if (null == ctx.children) {
+ return null;
+ }
+ String res = ctx.getChild(1).getText();
+ System.out.println(res);
+ return res;
+ }
+
+ @Override
+ public MergeAction visitAssignmentList(CarbonSqlBaseParser.AssignmentListContext ctx) {
+ // UPDATE SET assignmentList
+ Map<Column, Column> map = new HashMap<>();
+ for (int currIdx = 0; currIdx < ctx.getChildCount(); currIdx++) {
+ if (ctx.getChild(currIdx) instanceof CarbonSqlBaseParser.AssignmentContext) {
+ //Assume the actions are all use to pass value
+ String left = ctx.getChild(currIdx).getChild(0).getText();
+ if (left.split("\\.").length > 1) {
+ left = left.split("\\.")[1];
+ }
+ String right = ctx.getChild(currIdx).getChild(2).getText();
+ Column rightColumn = null;
+ try {
+ Expression expression = sparkParser.parseExpression(right);
+ rightColumn = new Column(expression);
+ } catch (Exception ex) {
+ // todo throw EX here

Review comment:
@QiangCai handled

##########
File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonMergeIntoSQLCommand.scala
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.execution.command.AtomicRunnableCommand
+import org.apache.spark.sql.execution.command.mutation.merge._
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.merge.model.{CarbonMergeIntoModel, TableModel}
+import org.apache.spark.util.SparkUtil._
+import org.apache.spark.util.TableAPIUtil
+
+case class CarbonMergeIntoSQLCommand(mergeInto: CarbonMergeIntoModel)
+ extends AtomicRunnableCommand {
+
+ override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+ Seq.empty
+ }
+
+ override def processData(sparkSession: SparkSession): Seq[Row] = {
+ val sourceTable: TableModel = mergeInto.getSource
+ val targetTable: TableModel = mergeInto.getTarget
+ val mergeCondition: Expression = mergeInto.getMergeCondition
+ val mergeExpression: Seq[Expression] = convertExpressionList(mergeInto.getMergeExpressions)
+ val mergeActions: Seq[MergeAction] = convertMergeActionList(mergeInto.getMergeActions)
+
+ // validate the table
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (sourceTable.getDatabase == null) {
+ "default"
+ } else {
+ sourceTable.getDatabase
+ },
+ sourceTable.getTable)
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (targetTable.getDatabase == null) {
+ "default"
+ } else {
+ targetTable.getDatabase
+ },

Review comment:
@QiangCai handled

##########
File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonMergeIntoSQLCommand.scala
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.execution.command.AtomicRunnableCommand
+import org.apache.spark.sql.execution.command.mutation.merge._
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.merge.model.{CarbonMergeIntoModel, TableModel}
+import org.apache.spark.util.SparkUtil._
+import org.apache.spark.util.TableAPIUtil
+
+case class CarbonMergeIntoSQLCommand(mergeInto: CarbonMergeIntoModel)
+ extends AtomicRunnableCommand {
+
+ override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+ Seq.empty
+ }
+
+ override def processData(sparkSession: SparkSession): Seq[Row] = {
+ val sourceTable: TableModel = mergeInto.getSource
+ val targetTable: TableModel = mergeInto.getTarget
+ val mergeCondition: Expression = mergeInto.getMergeCondition
+ val mergeExpression: Seq[Expression] = convertExpressionList(mergeInto.getMergeExpressions)
+ val mergeActions: Seq[MergeAction] = convertMergeActionList(mergeInto.getMergeActions)
+
+ // validate the table
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (sourceTable.getDatabase == null) {
+ "default"
+ } else {
+ sourceTable.getDatabase
+ },
+ sourceTable.getTable)
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (targetTable.getDatabase == null) {
+ "default"
+ } else {
+ targetTable.getDatabase
+ },
+ targetTable.getTable)
+
+ val srcDf = sparkSession.sql(s"""SELECT * FROM ${ sourceTable.getTable }""")
+ val tgDf = sparkSession.sql(s"""SELECT * FROM ${ targetTable.getTable }""")

Review comment:
@QiangCai handled

##########
File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonMergeIntoSQLCommand.scala
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.execution.command.AtomicRunnableCommand
+import org.apache.spark.sql.execution.command.mutation.merge._
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.merge.model.{CarbonMergeIntoModel, TableModel}
+import org.apache.spark.util.SparkUtil._
+import org.apache.spark.util.TableAPIUtil
+
+case class CarbonMergeIntoSQLCommand(mergeInto: CarbonMergeIntoModel)
+ extends AtomicRunnableCommand {
+
+ override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+ Seq.empty
+ }
+
+ override def processData(sparkSession: SparkSession): Seq[Row] = {
+ val sourceTable: TableModel = mergeInto.getSource
+ val targetTable: TableModel = mergeInto.getTarget
+ val mergeCondition: Expression = mergeInto.getMergeCondition
+ val mergeExpression: Seq[Expression] = convertExpressionList(mergeInto.getMergeExpressions)
+ val mergeActions: Seq[MergeAction] = convertMergeActionList(mergeInto.getMergeActions)
+
+ // validate the table
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (sourceTable.getDatabase == null) {
+ "default"
+ } else {
+ sourceTable.getDatabase
+ },
+ sourceTable.getTable)
+ TableAPIUtil.validateTableExists(sparkSession,
+ if (targetTable.getDatabase == null) {
+ "default"
+ } else {
+ targetTable.getDatabase
+ },
+ targetTable.getTable)
+
+ val srcDf = sparkSession.sql(s"""SELECT * FROM ${ sourceTable.getTable }""")
+ val tgDf = sparkSession.sql(s"""SELECT * FROM ${ targetTable.getTable }""")
+
+ var matches = Seq.empty[MergeMatch]

Review comment:
@QiangCai handled

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonExtensionSqlParser.scala
##########
@@ -37,35 +37,42 @@ class CarbonExtensionSqlParser(
) extends SparkSqlParser(conf) {

val parser = new CarbonExtensionSpark2SqlParser
+ val antlrParser = new CarbonAntlrParser(this)

override def parsePlan(sqlText: String): LogicalPlan = {
parser.synchronized {
CarbonEnv.getInstance(sparkSession)
}
CarbonUtils.updateSessionInfoToCurrentThread(sparkSession)
try {
- val plan = parser.parse(sqlText)
- plan
+ parser.parse(sqlText)
} catch {
case ce: MalformedCarbonCommandException =>
throw ce
- case ex: Throwable =>
+ case _: Throwable =>
try {
- val parsedPlan = initialParser.parsePlan(sqlText)
- CarbonScalaUtil.cleanParserThreadLocals
- parsedPlan
+ antlrParser.parse(sqlText)
} catch {
- case mce: MalformedCarbonCommandException =>
- throw mce
- case e: Throwable =>
- e.printStackTrace(System.err)
- CarbonScalaUtil.cleanParserThreadLocals
- CarbonException.analysisException(
- s"""== Parser1: ${parser.getClass.getName} ==
- |${ex.getMessage}
- |== Parser2: ${initialParser.getClass.getName} ==
- |${e.getMessage}
+ case ce: MalformedCarbonCommandException =>
+ throw ce
+ case ex: Throwable =>
+ try {
+ val parsedPlan = initialParser.parsePlan(sqlText)
+ CarbonScalaUtil.cleanParserThreadLocals
+ parsedPlan
+ } catch {
+ case mce: MalformedCarbonCommandException =>
+ throw mce
+ case e: Throwable =>
+ e.printStackTrace(System.err)
+ CarbonScalaUtil.cleanParserThreadLocals
+ CarbonException.analysisException(
+ s"""== Parser1: ${ parser.getClass.getName } ==
+ |${ ex.getMessage }
+ |== Parser2: ${ initialParser.getClass.getName } ==
+ |${ e.getMessage }

Review comment:
@QiangCai handled

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/iud/MergeIntoCarbonTableTestCase.scala
##########
@@ -0,0 +1,294 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.iud
+
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+class MergeIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterEach {

Review comment:
@QiangCai handled

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r551151670

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/interfaces.scala
##########
@@ -71,9 +72,11 @@ case class WhenNotMatchedAndExistsOnlyOnTarget(expression: Option[Column] = None
override def getExp: Option[Column] = expression
}

-case class UpdateAction(updateMap: Map[Column, Column]) extends MergeAction
+case class UpdateAction(var updateMap: Map[Column, Column], isStar: Boolean = false)

Review comment:
@ajantha-bhat It means `INSERT *` or `UPDATE SET *`, where `*` means all columns; this is described in the newly added doc.
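
As a rough illustration of the `*` form (a minimal sketch, not the code in this pull request): expanding the star amounts to generating one assignment per target column, each taking its value from the same-named source column. The names `StarExpansionSketch` and `expandStar` are hypothetical, and plain strings stand in for Spark `Column` objects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CarbonData: shows what "UPDATE SET *" expands to.
final class StarExpansionSketch {

  private StarExpansionSketch() { }

  /**
   * Builds one "targetColumn -> sourceAlias.targetColumn" assignment per target column,
   * keeping the target column order. Throws if the source lacks a target column,
   * since the star form assumes both tables have the same columns.
   */
  static Map<String, String> expandStar(
      List<String> targetColumns, List<String> sourceColumns, String sourceAlias) {
    Map<String, String> assignments = new LinkedHashMap<>();
    for (String column : targetColumns) {
      if (!sourceColumns.contains(column)) {
        throw new IllegalArgumentException("source has no column named " + column);
      }
      assignments.put(column, sourceAlias + "." + column);
    }
    return assignments;
  }
}
```

For target and source columns `id, name` and the alias `source`, this yields the assignments `id = source.id` and `name = source.name`, in that order.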

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-753847768

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5273/

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-753852322

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3512/

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-753901207

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5275/

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-753902788

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3514/

GitBox

[GitHub] [carbondata] Zhangshunyu commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

Zhangshunyu commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-755029729

@ajantha-bhat @QiangCai I have fixed the review comments; please check.

GitBox

[GitHub] [carbondata] QiangCai commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

QiangCai commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r554670324

##########
File path: integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
##########
@@ -0,0 +1,642 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+

Review comment:
It would be better to add a comment stating where this g4 file was copied from.

##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -55,8 +55,42 @@ Below is the detailed description of the `merge` API operation.
* `whenNotMatched` clause can have only the `insertExpr` action. The new row is generated based on the specified column and corresponding expressions. Users do not need to specify all the columns in the target table. For unspecified target columns, NULL is inserted.
* `whenNotMatchedAndExistsOnlyOnTarget` clause is executed when row does not match source and exists only in target. This clause can have only delete action.

-**NOTE:** SQL syntax for merge is not yet supported.
+#### MERGE SQL
+
+Below sql merges a set of updates, insertions, and deletions based on a source table
+into a target carbondata table.
+
+```
+ MERGE INTO target_table_identifier
+ USING source_table_identifier
+ ON <merge_condition>
+ [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
+ [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
+ [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
+```
+
+#### MERGE SQL Operation Semantics
+Below is the detailed description of the `merge` SQL operation.
+* `table_identifier` a table name, optionally qualified with a database name
+* `merge_condition` how the rows from one relation are combined with the rows of another relation. An expression with a return type of Boolean.
+* `WHEN MATCHED` clauses are executed when a source row matches a target table row based on the match condition,
+clauses can have at most one UPDATE and one DELETE action, These clauses have the following semantics.
+ * The UPDATE action in merge only updates the specified columns of the matched target row.
+ * The DELETE action will delete the matched row.
+ * WHEN MATCHED clauses can have at most one UPDATE and one DELETE action. The UPDATE action in merge only updates the specified columns of the matched target row. The DELETE action will delete the matched row.
+ * Each WHEN MATCHED clause can have an optional condition. If this clause condition exists, the UPDATE or DELETE action is executed for any matching source-target row pair row only when when the clause condition is true.
+ * If there are multiple WHEN MATCHED clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). All WHEN MATCHED clauses, except the last one, must have conditions.
+ * If both WHEN MATCHED clauses have conditions and neither of the conditions are true for a matching source-target row pair, then the matched target row is left unchanged.
+ * To update all the columns of the target carbondata table with the corresponding columns of the source dataset, use UPDATE SET *. This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...] for all the columns of the target carbondata table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
+* `matched_action` can be DELETE | UPDATE SET * |UPDATE SET column1 = value1 [, column2 = value2 ...]
+* `WHEN NOT MATCHED` clause is executed when a source row does not match any target row based on the match condition, , these clauses have the following semantics.

Review comment:
Remove the repeated comma (`, ,`) after "match condition".
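
For readers following the `WHEN MATCHED` ordering rules in the quoted guide text, here is a minimal Python sketch of how those rules could be read. It is only an illustration of the documented semantics, not CarbonData's implementation: the function name `apply_matched_clauses` and the `(condition, action, assignments)` clause shape are hypothetical. It assumes clauses are tried in the order written, the first clause whose condition holds (or that has no condition) decides the outcome, and a row is left unchanged when no clause applies.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical clause shape: (condition or None, "UPDATE" | "DELETE", assignments or None)
Clause = Tuple[
    Optional[Callable[[Row, Row], bool]],
    str,
    Optional[Callable[[Row, Row], Row]],
]


def apply_matched_clauses(target: Row, source: Row, clauses: List[Clause]) -> Optional[Row]:
    """Apply WHEN MATCHED clauses to one matching source-target row pair.

    Returns the resulting target row, or None if the row is deleted.
    """
    for condition, action, assignments in clauses:
        # A clause without a condition always applies; otherwise test the condition.
        if condition is not None and not condition(target, source):
            continue
        if action == "DELETE":
            return None
        if action == "UPDATE":
            # Only the specified columns change; the rest of the target row is kept.
            updated = dict(target)
            updated.update(assignments(target, source))
            return updated
        raise ValueError(f"unsupported matched action: {action}")
    # No clause applied: the matched target row is left unchanged.
    return target


# Example clause list: a conditional DELETE followed by an unconditional UPDATE.
example_clauses: List[Clause] = [
    (lambda t, s: bool(s["deleted"]), "DELETE", None),
    (None, "UPDATE", lambda t, s: {"value": s["value"]}),
]
```

Putting the conditional `DELETE` first matters here: if the unconditional `UPDATE` came first it would always apply, which is why the guide requires every `WHEN MATCHED` clause except the last to carry a condition.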

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-757631774

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5293/

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-757632153

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3533/

GitBox

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

In reply to this post by GitBox

Zhangshunyu commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r554773845

##########
File path: integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
##########
@@ -0,0 +1,642 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+

Review comment:
@QiangCai handled
