[GitHub] [carbondata] Indhumathi27 opened a new pull request #3846: [WIP] Fix CDC delete data on partition table

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 opened a new pull request #3846: [WIP] Fix CDC delete data on partition table

GitBox

Indhumathi27 opened a new pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [WIP] Fix CDC delete data on partition table

GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658779577


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1656/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [WIP] Fix CDC delete data on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658780719


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658879364


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3400/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658880484


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1658/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r457857926



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##########
@@ -175,6 +178,21 @@ case class CarbonMergeDataSetCommand(
         LOGGER.error("writing of update status file failed")
         throw new CarbonMergeDataSetException("writing of update status file failed")
       }
+      if (carbonTable.isHivePartitionTable) {
+        // If load count is 0 and if merge action contains delete operation, update
+        // tableUpdateStatus file name in loadMeta entry
+        if (count == 0 && hasDelAction && !tuple._1.isEmpty) {
+          val loadMetaDataDetails = SegmentStatusManager.readTableStatusFile(CarbonTablePath
+            .getTableStatusFilePath(carbonTable.getTablePath))
+          CarbonUpdateUtil.updateTableMetadataStatus(loadMetaDataDetails.map(l =>
+            new Segment(l.getMergedLoadName,

Review comment:
       rename variable `l` to `loadMetadataDetails`

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##########
@@ -725,6 +725,45 @@ class MergeTestCase extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from target order by key"), Seq(Row("c", "200"), Row("d", "3"), Row("e", "100")))
   }
 
+  test("check the ccd delete with partition") {
+    sql("drop table if exists target")
+
+    val initframe = sqlContext.sparkSession.createDataFrame(Seq(
+      Row("a", "0"),
+      Row("b", "1"),
+      Row("c", "2"),
+      Row("d", "3")
+    ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
+
+    initframe.write
+      .format("carbondata")
+      .option("tableName", "target")
+      .option("partitionColumns", "value")
+      .mode(SaveMode.Overwrite)
+      .save()
+    val target = sqlContext.read.format("carbondata").option("tableName", "target").load()
+    var ccd =
+      sqlContext.sparkSession.createDataFrame(Seq(
+        Row("a", null, true, 1),
+        Row("b", null, true, 2),
+        Row("c", null, true, 3),
+        Row("e", "100", false, 6)
+      ).asJava,
+        StructType(Seq(StructField("key", StringType),
+          StructField("newValue", StringType),
+          StructField("deleted", BooleanType), StructField("time", IntegerType))))
+    ccd.createOrReplaceTempView("changes")
+
+    ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")
+
+    target.as("A").merge(ccd.as("B"), "A.key=B.key").
+      whenMatched("B.deleted=true").delete().execute()
+
+    assert(getDeleteDeltaFileCount("target", "0") == 0)

Review comment:
       if delete operation is successful, then it should contain delete delta file, please check and add proper assert for delete delta file count




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-661748809


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3449/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-661749375


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1707/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r458017848



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##########
@@ -175,6 +178,21 @@ case class CarbonMergeDataSetCommand(
         LOGGER.error("writing of update status file failed")
         throw new CarbonMergeDataSetException("writing of update status file failed")
       }
+      if (carbonTable.isHivePartitionTable) {
+        // If load count is 0 and if merge action contains delete operation, update
+        // tableUpdateStatus file name in loadMeta entry
+        if (count == 0 && hasDelAction && !tuple._1.isEmpty) {
+          val loadMetaDataDetails = SegmentStatusManager.readTableStatusFile(CarbonTablePath
+            .getTableStatusFilePath(carbonTable.getTablePath))
+          CarbonUpdateUtil.updateTableMetadataStatus(loadMetaDataDetails.map(loadMetadataDetails =>

Review comment:
       ```suggestion
             CarbonUpdateUtil.updateTableMetadataStatus(loadMetaDataDetails.map(loadMetadataDetail =>
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r458018601



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##########
@@ -725,6 +725,47 @@ class MergeTestCase extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from target order by key"), Seq(Row("c", "200"), Row("d", "3"), Row("e", "100")))
   }
 
+  test("check the ccd delete with partition") {

Review comment:
       ```suggestion
     test("check the cdc delete with partition") {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r458018787



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##########
@@ -725,6 +725,47 @@ class MergeTestCase extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from target order by key"), Seq(Row("c", "200"), Row("d", "3"), Row("e", "100")))
   }
 
+  test("check the ccd delete with partition") {
+    sql("drop table if exists target")
+
+    val initframe = sqlContext.sparkSession.createDataFrame(Seq(
+      Row("a", "0"),
+      Row("a1", "0"),
+      Row("b", "1"),
+      Row("c", "2"),
+      Row("d", "3")
+    ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
+
+    initframe.write
+      .format("carbondata")
+      .option("tableName", "target")
+      .option("partitionColumns", "value")
+      .mode(SaveMode.Overwrite)
+      .save()
+    val target = sqlContext.read.format("carbondata").option("tableName", "target").load()
+    var ccd =

Review comment:
       ```suggestion
       var cdc =
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r458018898



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##########
@@ -725,6 +725,47 @@ class MergeTestCase extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from target order by key"), Seq(Row("c", "200"), Row("d", "3"), Row("e", "100")))
   }
 
+  test("check the ccd delete with partition") {
+    sql("drop table if exists target")
+
+    val initframe = sqlContext.sparkSession.createDataFrame(Seq(
+      Row("a", "0"),
+      Row("a1", "0"),
+      Row("b", "1"),
+      Row("c", "2"),
+      Row("d", "3")
+    ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
+
+    initframe.write
+      .format("carbondata")
+      .option("tableName", "target")
+      .option("partitionColumns", "value")
+      .mode(SaveMode.Overwrite)
+      .save()
+    val target = sqlContext.read.format("carbondata").option("tableName", "target").load()
+    var ccd =
+      sqlContext.sparkSession.createDataFrame(Seq(
+        Row("a", null, true, 1),
+        Row("a1", null, false, 1),
+        Row("b", null, true, 2),
+        Row("c", null, true, 3),
+        Row("e", "100", false, 6)
+      ).asJava,
+        StructType(Seq(StructField("key", StringType),
+          StructField("newValue", StringType),
+          StructField("deleted", BooleanType), StructField("time", IntegerType))))
+    ccd.createOrReplaceTempView("changes")
+
+    ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

Review comment:
       same as above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#discussion_r458019493



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##########
@@ -869,10 +910,15 @@ class MergeTestCase extends QueryTest with BeforeAndAfterAll {
     CarbonProperties.getInstance().addProperty("carbon.enable.auto.load.merge", "false")
   }
 
-  private def getDeleteDeltaFileCount(tableName: String, segment: String): Int = {
+  private def getDeleteDeltaFileCount(tableName: String,
+      segment: String,

Review comment:
       no need to change the method signature, you are already having the carbonTable object, check from it whether its partition table or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-662011240


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3458/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-662012100


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1716/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-662235965


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

GitBox
In reply to this post by GitBox

asfgit closed pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]