[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3433: [CARBONDATA-3570] Change task number to jobid+taskid for FileFormat


GitBox
ajantha-bhat commented on a change in pull request #3433: [CARBONDATA-3570] Change task number to jobid+taskid for FileFormat
URL: https://github.com/apache/carbondata/pull/3433#discussion_r342429257
 
 

 ##########
 File path: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
 ##########
 @@ -154,8 +154,11 @@ class SparkCarbonFileFormat extends FileFormat
           path
         }
         context.getConfiguration.set("carbon.outputformat.writepath", updatedPath)
+        // "jobid"+"x"+"taskid", task retry should have same task number
         context.getConfiguration.set("carbon.outputformat.taskno",
-          UUID.randomUUID().toString.replace("-", ""))
+          context.getTaskAttemptID.getJobID.getJtIdentifier +
+          context.getTaskAttemptID.getJobID.getId
+          + 'x' + context.getTaskAttemptID.getTaskID.getId)
 
 Review comment:
   @jackylk
   a. 'x' is used to separate the job ID and the task ID.
   b. We cannot reuse the CarbonData table logic for the task ID. For a CarbonData table the task ID is just the split index (this is not an issue in concurrent scenarios, because the segment ID is also present). That ID is serialized and sent to the executor, so a task retry uses the same ID. The file format path, however, was generating a new UUID for each task.
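
   A minimal sketch of the point in (b), not the code in this PR: it assumes the job tracker identifier, job ID, and task ID are passed in as plain values instead of being read from the Hadoop task attempt context. The object name `TaskNumberSketch`, the helper `taskNo`, and the sample values are hypothetical. It only illustrates that a number derived from the job ID and task ID is identical across attempts of the same task, whereas a random UUID is not.

```scala
import java.util.UUID

object TaskNumberSketch {
  // Hypothetical helper: same concatenation order as the diff above
  // (jtIdentifier, job ID, 'x', task ID), but on plain values.
  def taskNo(jtIdentifier: String, jobId: Int, taskId: Int): String =
    jtIdentifier + jobId + "x" + taskId

  def main(args: Array[String]): Unit = {
    // Two attempts of the same task: the inputs do not change on retry,
    // so the derived task number does not change either.
    val firstAttempt = taskNo("20191105", 3, 7)
    val retryAttempt = taskNo("20191105", 3, 7)
    assert(firstAttempt == retryAttempt)

    // A different task in the same job gets a different number.
    assert(taskNo("20191105", 3, 8) != firstAttempt)

    // The previous approach: every call yields a new value,
    // so a retried task would not reuse the number of its first attempt.
    val uuidFirst = UUID.randomUUID().toString.replace("-", "")
    val uuidRetry = UUID.randomUUID().toString.replace("-", "")
    assert(uuidFirst != uuidRetry)

    println(s"deterministic: $firstAttempt, random: $uuidFirst")
  }
}
```

   In the actual change the three values come from `context.getTaskAttemptID`, which is why the attempt number is deliberately left out of the string.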
   
