GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1752

[CARBONDATA-1972] Fix compaction after update of partition table.

When an update touches all of the data, every old segment needs to be marked as deleted, but this was not happening for partition tables. This PR fixes it.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [X] Any interfaces changed? NO
- [X] Any backward compatibility impacted? NO
- [X] Document update required? NO
- [X] Testing done: tests added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata compaction-fail-after-update-partition-table

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1752.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1752

----

commit aed803fd202f6dfb9598cddf61aa3836f9171213
Author: ravipesala <ravi.pesala@...>
Date: 2018-01-03T03:46:36Z

    Fix compaction after update of partition table.

----

---
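For context, a minimal sketch of the scenario this PR addresses, assuming a Spark session with CarbonData configured. This is illustrative only: the table name, columns, and DDL details are not taken from the PR.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction sketch; session setup is elided because it
// depends on the deployment (e.g. a CarbonSession in CarbonData 1.x).
def reproduce(spark: SparkSession): Unit = {
  spark.sql(
    """CREATE TABLE repro (id INT, name STRING)
      |PARTITIONED BY (country STRING)
      |STORED BY 'carbondata'""".stripMargin)

  spark.sql("INSERT INTO repro SELECT 1, 'a', 'in'")
  spark.sql("INSERT INTO repro SELECT 2, 'b', 'us'")

  // Update every row: all pre-update segments should be marked as deleted
  // when the table status is rewritten.
  spark.sql("UPDATE repro SET (name) = ('c')")

  // Before this fix, compaction of a partition table could fail here because
  // the stale segments were never marked for delete.
  spark.sql("ALTER TABLE repro COMPACT 'MINOR'")
}
```

---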
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2497/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1752

SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/2657/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Failed with Spark 2.2.0, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1272/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159367321

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
@@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
      new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
      String updateTime =
          context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
+     String segmentsToBeDeleted =
+         context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
      if (updateTime != null) {
        Set<String> segmentSet = new HashSet<>(
            new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                .getValidAndInvalidSegments().getValidSegments());
-       CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
-           new ArrayList<String>());
+       List<String> segmentDeleteList;
+       if (segmentsToBeDeleted != null)
+       {
--- End diff --

move to previous line

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159376105

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
@@ -98,12 +96,25 @@ public CarbonOutputCommitter(Path outputPath, TaskAttemptContext context) throws
      new CarbonIndexFileMergeWriter().mergeCarbonIndexFilesOfSegment(segmentPath);
      String updateTime =
          context.getConfiguration().get(CarbonTableOutputFormat.UPADTE_TIMESTAMP, null);
+     String segmentsToBeDeleted =
+         context.getConfiguration().get(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, null);
      if (updateTime != null) {
        Set<String> segmentSet = new HashSet<>(
            new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier())
                .getValidAndInvalidSegments().getValidSegments());
-       CarbonUpdateUtil.updateTableMetadataStatus(segmentSet, carbonTable, updateTime, true,
-           new ArrayList<String>());
+       List<String> segmentDeleteList;
--- End diff --

Letting the default value of `SEGMENTS_TO_BE_DELETED` be `""` on line 100 would reduce the following code (lines 105~111) to one line: only line 108 is needed.

---
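To illustrate the suggestion above: with `""` as the default, the absent-key case and the empty-list case collapse into one, so the null check and branching disappear. A minimal sketch in Scala against the Hadoop `Configuration` API (the code under review is Java; the key literal is a placeholder for the real `SEGMENTS_TO_BE_DELETED` constant):

```scala
import org.apache.hadoop.conf.Configuration

// Sketch only: defaulting to "" removes the null check from the quoted diff.
// "segments.to.be.deleted" is a placeholder for the value of the
// CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED constant.
def segmentsToDelete(conf: Configuration): Seq[String] = {
  val raw = conf.get("segments.to.be.deleted", "") // default "" instead of null
  raw.split(",").filter(_.nonEmpty).toSeq          // one line replaces the if/else
}
```

---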
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159475886

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
(quotes the same hunk as above)
--- End diff --

ok

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159479290

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ---
(quotes the same hunk as above)
--- End diff --

Ok

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Success with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2525/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Success with Spark 2.2.0, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1301/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1752

SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/2693/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159594614

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
@@ -58,14 +58,15 @@ object DeleteExecution {
       dataRdd: RDD[Row],
       timestamp: String,
       isUpdateOperation: Boolean,
-      executorErrors: ExecutionErrors): Boolean = {
+      executorErrors: ExecutionErrors): (Boolean, Seq[String]) = {
--- End diff --

It seems the first return value is always true. And this function is hard to understand; can you add a comment for it?

---
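The kind of comment being requested might look like the sketch below. The parameter names are taken from the quoted diff; the function name is illustrative, and the meaning of the second tuple element is inferred from this PR's description rather than quoted from the source.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

/**
 * Sketch of the requested documentation (parameter list abridged, body
 * omitted). Runs the delete-delta generation for a DELETE or UPDATE.
 *
 * @return (status, segmentsToBeDeleted): `status` reports whether the delete
 *         succeeded, and `segmentsToBeDeleted` lists the segments whose rows
 *         were entirely removed and which must be marked as deleted when the
 *         table status is updated.
 */
def deleteDataSketch(
    dataRdd: RDD[Row],
    timestamp: String,
    isUpdateOperation: Boolean): (Boolean, Seq[String]) = ???
```

---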
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159594673

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
@@ -77,7 +77,7 @@ private[sql] case class CarbonProjectForDeleteCommand(
         dataRdd,
         timestamp,
         isUpdateOperation = false,
-        executorErrors)) {
+        executorErrors)._1) {
--- End diff --

It seems this is always true.

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159594689

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
@@ -107,11 +107,19 @@ with Serializable {
     CarbonTableOutputFormat.setOverwrite(conf, options("overwrite").toBoolean)
     // Set the update timestamp if user sets in case of update query. It needs to be updated
     // in load status update time
-    val updateTimeStamp = options.getOrElse("updatetimestamp", null)
-    if (updateTimeStamp != null) {
-      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp)
-      model.setFactTimeStamp(updateTimeStamp.toLong)
+    val updateTimeStamp = options.get("updatetimestamp")
+    if (updateTimeStamp.isDefined) {
+      conf.set(CarbonTableOutputFormat.UPADTE_TIMESTAMP, updateTimeStamp.get)
+      model.setFactTimeStamp(updateTimeStamp.get.toLong)
     }
+    // In case of update query there is chance to remove the older segments, so here we can set
+    // the to be deleted segments to mark as delete while updating tablestatus
+    val segemntsTobeDeleted = options.get("segmentsToBeDeleted")
+    if (segemntsTobeDeleted.isDefined) {
+      conf.set(CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED, segemntsTobeDeleted.get)
+    }
+
+
--- End diff --

remove empty line

---
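As a side note on the quoted Scala, the `isDefined`/`get` pairs can be written more idiomatically with `Option.foreach`. A minimal sketch (the option keys come from the diff; the configuration key literals are placeholders for the `CarbonTableOutputFormat` constants, and `model.setFactTimeStamp` is omitted to keep the sketch self-contained):

```scala
import org.apache.hadoop.conf.Configuration

// Sketch only: Option.foreach replaces the isDefined/get pairs in the diff.
// The string literals stand in for CarbonTableOutputFormat.UPADTE_TIMESTAMP
// and CarbonTableOutputFormat.SEGMENTS_TO_BE_DELETED.
def applyUpdateOptions(options: Map[String, String], conf: Configuration): Unit = {
  options.get("updatetimestamp").foreach { ts =>
    conf.set("placeholder.update.timestamp", ts)
  }
  options.get("segmentsToBeDeleted").foreach { segments =>
    conf.set("placeholder.segments.to.be.deleted", segments)
  }
}
```

---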
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159657874

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/DeleteExecution.scala ---
(quotes the same hunk as above)
--- End diff --

ok, updated

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159657961

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ---
(quotes the same hunk as above)
--- End diff --

Removed returning the status.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1752#discussion_r159658076

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala ---
(quotes the same hunk as above)
--- End diff --

ok

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Failed with Spark 2.2.1, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1321/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1752

Build Success with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2559/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1752

SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/2727/

---