GitHub user kunal642 opened a pull request:
https://github.com/apache/carbondata/pull/2977 [WIP] [CARBONDATA-3147] Fixed concurrent load issue Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3147 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2977.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2977 ---- commit d48374164a2b8ff129973e0e15f48c867ccddc1a Author: kunal642 <kunalkapoor642@...> Date: 2018-12-05T10:30:41Z Fixed concurrent load issue ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1644/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1855/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9904/ --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2977#discussion_r239299140 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener // Generate table status file name without UUID, forExample: tablestatus val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath( childCarbonTable.getTablePath) - renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid) + mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid) } // if true then the commit for one of the child tables has failed val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0 if (commitFailed) { LOGGER.info("Reverting table status file to original state") - renamedDataMaps.foreach { - command => - val carbonTable = command.table - // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus - val backupTableSchemaPath = - CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid - val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) - renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "") + LOGGER.warn("Reverting table status file to original state") --- End diff -- line 185 and 186 is repeated --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1650/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2977#discussion_r239331535 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener // Generate table status file name without UUID, forExample: tablestatus val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath( childCarbonTable.getTablePath) - renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid) + mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid) } // if true then the commit for one of the child tables has failed val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0 if (commitFailed) { LOGGER.info("Reverting table status file to original state") - renamedDataMaps.foreach { - command => - val carbonTable = command.table - // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus - val backupTableSchemaPath = - CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid - val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) - renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "") + LOGGER.warn("Reverting table status file to original state") --- End diff -- removed --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1651/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9911/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2977#discussion_r239344495 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/AggregateDataMapCompactor.scala --- @@ -79,9 +79,20 @@ class AggregateDataMapCompactor(carbonLoadModel: CarbonLoadModel, CarbonSession.threadSet(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP, "true") loadCommand.processData(sqlContext.sparkSession) - val newLoadMetaDataDetails = SegmentStatusManager.readLoadMetadata( + val oldMetadataDetails = SegmentStatusManager.readLoadMetadata( + carbonTable.getMetadataPath, "") + val newMetadataDetails = SegmentStatusManager.readLoadMetadata( carbonTable.getMetadataPath, uuid) - val updatedLoadMetaDataDetails = newLoadMetaDataDetails collect { + val mergedContent = oldMetadataDetails.collect { + case content => + val contentIndex = newMetadataDetails.indexOf(content) --- End diff -- How can you make sure the segment data from `newMetadataDetails` always latest in case of concurrent scenario. I feel you should take only the segment need to be updated from `newMetadataDetails` --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1862/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2977#discussion_r239345454 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -111,6 +113,29 @@ trait CommitHelper { } } + protected def mergeTableStatusContents(uuidTableStatusPath: String, --- End diff -- It seems duplicate, please try to use same method from other places as well. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1654/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1655/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9915/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1867/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1665/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1877/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9925/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1666/ --- |
Free forum by Nabble | Edit this page |