[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1752

    [CARBONDATA-1972] Fix compaction after update of partition table.

    When updation happens on whole data then all old segments needs to be marked as delete. But it is not happening in case of partition table. This PR fixes it.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [X] Any interfaces changed?
      NO
     - [X] Any backward compatibility impacted?
       NO
     - [X] Document update required?
       NO
     - [X] Testing done
           Tests added
           
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata compaction-fail-after-update-partition-table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1752
   
----
commit aed803fd202f6dfb9598cddf61aa3836f9171213
Author: ravipesala <ravi.pesala@...>
Date:   2018-01-03T03:46:36Z

    Fix compaction after update of partition table.

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2497/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2657/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972] Fix compaction after update of par...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1272/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159367321
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    +        if (segmentsToBeDeleted != null)
    +        {
    --- End diff --
   
    move to previous line


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972] Fix compaction after update...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159376105
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    --- End diff --
   
    Let the default value of `SEGMENTS_TO_BE_DELETED` be `""` in line 100 will reduce the following code (line 105~111) to 1 line: Only line 108 is enough.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159475886
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    +        if (segmentsToBeDeleted != null)
    +        {
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159479290
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
    @@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
           new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
           String updateTime =
               context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
    +      String segmentsToBeDeleted =
    +          context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
           if (updateTime != null) {
             Set<String> segmentSet = new HashSet<>(
                 new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                     .getValidAndInvalidSegments().getValidSegments());
    -        CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
    -            new ArrayList<String>());
    +        List<String> segmentDeleteList;
    --- End diff --
   
    Ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2525/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1301/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2693/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594614
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
    @@ -58,14 +58,15 @@ object DeleteExecution {
           dataRdd: RDD[Row],
           timestamp: String,
           isUpdateOperation: Boolean,
    -      executorErrors: ExecutionErrors): Boolean = {
    +      executorErrors: ExecutionErrors): (Boolean, Seq[String]) = {
    --- End diff --
   
    It seems the first return value is always true
    And it is hard to understand this function, can you add comment for it


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594673
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
    @@ -77,7 +77,7 @@ private[sql] case class CarbonProjectForDeleteCommand(
             dataRdd,
             timestamp,
             isUpdateOperation = false,
    -        executorErrors)) {
    +        executorErrors)._1) {
    --- End diff --
   
    It seems always true


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159594689
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
    @@ -107,11 +107,19 @@ with Serializable {
         CarbonTableOutputFormat.setOverwrite(conf, options("overwrite").toBoolean)
         // Set the update timestamp if user sets in case of update query. It needs to be updated
         // in load status update time
    -    val updateTimeStamp = options.getOrElse("updatetimestamp", null)
    -    if (updateTimeStamp != null) {
    -      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp)
    -      model.setFactTimeStamp(updateTimeStamp.toLong)
    +    val updateTimeStamp = options.get("updatetimestamp")
    +    if (updateTimeStamp.isDefined) {
    +      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp.get)
    +      model.setFactTimeStamp(updateTimeStamp.get.toLong)
         }
    +    // In case of update query there is chance to remove the older segments, so here we can set
    +    // the to be deleted segments to mark as delete while updating tablestatus
    +    val segemntsTobeDeleted = options.get("segmentsToBeDeleted")
    +    if (segemntsTobeDeleted.isDefined) {
    +      conf.set(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, segemntsTobeDeleted.get)
    +    }
    +
    +
    --- End diff --
   
    remove empty line


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159657874
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
    @@ -58,14 +58,15 @@ object DeleteExecution {
           dataRdd: RDD[Row],
           timestamp: String,
           isUpdateOperation: Boolean,
    -      executorErrors: ExecutionErrors): Boolean = {
    +      executorErrors: ExecutionErrors): (Boolean, Seq[String]) = {
    --- End diff --
   
    ok updated


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159657961
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
    @@ -77,7 +77,7 @@ private[sql] case class CarbonProjectForDeleteCommand(
             dataRdd,
             timestamp,
             isUpdateOperation = false,
    -        executorErrors)) {
    +        executorErrors)._1) {
    --- End diff --
   
    removed returning status


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1752: [CARBONDATA-1972][PARTITION] Fix compaction a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1752#discussion_r159658076
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
    @@ -107,11 +107,19 @@ with Serializable {
         CarbonTableOutputFormat.setOverwrite(conf, options("overwrite").toBoolean)
         // Set the update timestamp if user sets in case of update query. It needs to be updated
         // in load status update time
    -    val updateTimeStamp = options.getOrElse("updatetimestamp", null)
    -    if (updateTimeStamp != null) {
    -      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp)
    -      model.setFactTimeStamp(updateTimeStamp.toLong)
    +    val updateTimeStamp = options.get("updatetimestamp")
    +    if (updateTimeStamp.isDefined) {
    +      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp.get)
    +      model.setFactTimeStamp(updateTimeStamp.get.toLong)
         }
    +    // In case of update query there is chance to remove the older segments, so here we can set
    +    // the to be deleted segments to mark as delete while updating tablestatus
    +    val segemntsTobeDeleted = options.get("segmentsToBeDeleted")
    +    if (segemntsTobeDeleted.isDefined) {
    +      conf.set(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, segemntsTobeDeleted.get)
    +    }
    +
    +
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1321/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2559/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1752: [CARBONDATA-1972][PARTITION] Fix compaction after up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1752
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2727/



---
12