ajantha-bhat commented on a change in pull request #3433: [CARBONDATA-3570] Change task number to jobid+taskid for FileFormat
URL: https://github.com/apache/carbondata/pull/3433#discussion_r342429257
##########
File path: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
##########
@@ -154,8 +154,11 @@ class SparkCarbonFileFormat extends FileFormat
         path
       }
       context.getConfiguration.set("carbon.outputformat.writepath", updatedPath)
+      // "jobid"+"x"+"taskid", task retry should have same task number
       context.getConfiguration.set("carbon.outputformat.taskno",
-        UUID.randomUUID().toString.replace("-", ""))
+        context.getTaskAttemptID.getJobID.getJtIdentifier +
+        context.getTaskAttemptID.getJobID.getId
+        + 'x' + context.getTaskAttemptID.getTaskID.getId)
Review comment:
@jackylk
a. 'x' is used to separate the job id and the task id.
b. We cannot reuse the task-number logic from the carbondata table flow, where the task number is just the split index (no issue in the concurrent scenario, because the segment id is also part of it). That id is serialized and sent to the executor, so a task retry reuses the same id. In the file format case there is no segment id, and a new UUID was generated for each task, so a retried task would get a different task number.
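For illustration, a minimal standalone sketch of how the task number is derived from Hadoop's TaskAttemptContext (the object and method names here are hypothetical, not part of the PR). Every component comes from the JobID and TaskID rather than from the attempt id, so attempt 0 and a retried attempt 1 of the same task produce the same string, whereas a fresh UUID would differ on every attempt:

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext

// Hypothetical helper, for illustration only.
object TaskNumberSketch {

  // Builds "jobid" + 'x' + "taskid", as in the diff above.
  // JobID.getJtIdentifier/getId and TaskID.getId identify the job and the
  // task, not the task attempt, so the value is stable across task retries.
  def taskNumber(context: TaskAttemptContext): String = {
    val attemptId = context.getTaskAttemptID
    attemptId.getJobID.getJtIdentifier +
      attemptId.getJobID.getId +
      'x' +
      attemptId.getTaskID.getId
  }
}
```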