[GitHub] [carbondata] ajantha-bhat opened a new pull request #3324: [HOTFIX] Fix task id in FileFormat write

GitBox

ajantha-bhat opened a new pull request #3324: [HOTFIX] Fix task id in FileFormat write
URL: https://github.com/apache/carbondata/pull/3324

Problem: In FileFormat write, CarbonData uses System.nanoTime() as the task id.
Cause: When multiple tasks are launched concurrently, there is a small chance that two tasks get the same id. As a result, two Spark tasks launched for one insert can end up with the same carbondata file name.
When both tasks write to the same file, the file is likely to be corrupted, which leads to query failure.
Solution: Use a dedicated unique task id instead of nanoseconds.
Here, the Spark task id is combined with a global counter to generate a task id that is unique across jobs.
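
A minimal sketch of the idea, not the code in this pull request: the class name `UniqueTaskIdSketch`, the method `next`, and the `<sparkTaskId>_<counter>` string format are all hypothetical and chosen only for illustration. It assumes the caller passes in the Spark task id (for example, the value it obtains from Spark's `TaskContext`) and that the counter is shared by every task in the same JVM.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not the class added by the pull request.
final class UniqueTaskIdSketch {
    // Counter shared by all tasks running in this JVM. AtomicLong makes the
    // increment safe when several tasks ask for an id at the same moment.
    private static final AtomicLong COUNTER = new AtomicLong();

    private UniqueTaskIdSketch() { }

    // Combines the Spark task id supplied by the caller with the next counter
    // value. The "_" separator and String return type are assumptions.
    static String next(long sparkTaskId) {
        return sparkTaskId + "_" + COUNTER.incrementAndGet();
    }
}
```

Unlike System.nanoTime(), two calls in the same JVM cannot return the same value here, because each call consumes a distinct counter value. Uniqueness across executors in this sketch depends on the Spark task id part differing between tasks.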

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed? NA

- [ ] Any backward compatibility impacted? NA

- [ ] Document update required? NA

- [ ] Testing done
done. Attached the report
[testReport.txt](https://github.com/apache/carbondata/files/3388501/testReport.txt)

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. [NA]
