[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

classic Classic list List threaded Threaded
41 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

qiuchenjian-2
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2977

    [WIP] [CARBONDATA-3147] Fixed concurrent load issue

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3147

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2977
   
----
commit d48374164a2b8ff129973e0e15f48c867ccddc1a
Author: kunal642 <kunalkapoor642@...>
Date:   2018-12-05T10:30:41Z

    Fixed concurrent load issue

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1644/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1855/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9904/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2977#discussion_r239299140
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala ---
    @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener
                 // Generate table status file name without UUID, forExample: tablestatus
                 val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath(
                   childCarbonTable.getTablePath)
    -            renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid)
    +            mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid)
             }
             // if true then the commit for one of the child tables has failed
             val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0
             if (commitFailed) {
               LOGGER.info("Reverting table status file to original state")
    -          renamedDataMaps.foreach {
    -            command =>
    -              val carbonTable = command.table
    -              // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus
    -              val backupTableSchemaPath =
    -                CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid
    -              val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
    -              renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "")
    +          LOGGER.warn("Reverting table status file to original state")
    --- End diff --
   
    line 185 and 186 is repeated


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1650/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2977#discussion_r239331535
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala ---
    @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener
                 // Generate table status file name without UUID, forExample: tablestatus
                 val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath(
                   childCarbonTable.getTablePath)
    -            renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid)
    +            mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid)
             }
             // if true then the commit for one of the child tables has failed
             val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0
             if (commitFailed) {
               LOGGER.info("Reverting table status file to original state")
    -          renamedDataMaps.foreach {
    -            command =>
    -              val carbonTable = command.table
    -              // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus
    -              val backupTableSchemaPath =
    -                CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid
    -              val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
    -              renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "")
    +          LOGGER.warn("Reverting table status file to original state")
    --- End diff --
   
    removed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1651/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9911/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2977#discussion_r239344495
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/AggregateDataMapCompactor.scala ---
    @@ -79,9 +79,20 @@ class AggregateDataMapCompactor(carbonLoadModel: CarbonLoadModel,
             CarbonSession.threadSet(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP,
               "true")
             loadCommand.processData(sqlContext.sparkSession)
    -        val newLoadMetaDataDetails = SegmentStatusManager.readLoadMetadata(
    +        val oldMetadataDetails = SegmentStatusManager.readLoadMetadata(
    +          carbonTable.getMetadataPath, "")
    +        val newMetadataDetails = SegmentStatusManager.readLoadMetadata(
               carbonTable.getMetadataPath, uuid)
    -        val updatedLoadMetaDataDetails = newLoadMetaDataDetails collect {
    +        val mergedContent = oldMetadataDetails.collect {
    +          case content =>
    +            val contentIndex = newMetadataDetails.indexOf(content)
    --- End diff --
   
    How can you make sure the segment data from `newMetadataDetails` always latest in case of concurrent scenario.
    I feel you should take only the segment need to be updated from `newMetadataDetails`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1862/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2977#discussion_r239345454
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala ---
    @@ -111,6 +113,29 @@ trait CommitHelper {
         }
       }
     
    +  protected def mergeTableStatusContents(uuidTableStatusPath: String,
    --- End diff --
   
    It seems duplicate, please try to use same method from other places as well.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1654/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1655/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9915/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1867/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1665/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1877/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9925/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2977
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1666/



---
123