[GitHub] [carbondata] ajantha-bhat opened a new pull request #3324: [HOTFIX] Fix task id in FileFormat write

GitBox
ajantha-bhat opened a new pull request #3324: [HOTFIX] Fix task id in FileFormat write
URL: https://github.com/apache/carbondata/pull/3324
 
 
   Problem: in FileFormat write, Carbon uses System.nanoTime() as the task id.
   Cause: when multiple tasks are launched concurrently, two tasks can, very rarely, get the same id. As a result, two Spark tasks launched for one insert can produce the same carbondata file name.
   When both tasks then write to that one file, the file is likely to be corrupted, which leads to query failures.
   Solution: use our own unique task id instead of nanoseconds.
   Here, the Spark task id plus a global counter is used to generate a task id that is unique across jobs.
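   A minimal sketch of the "task id + global counter" idea, for illustration only. The class name `UniqueTaskId` and method `next` are hypothetical and are not taken from the CarbonData patch; the Spark task id is passed in as a plain `long` so the example runs without Spark (in a real task it could come from something like `TaskContext.get().taskAttemptId()`, which is an assumption about which id is used). The point it shows: `AtomicLong.incrementAndGet()` never hands out the same value twice in one JVM, so two concurrent tasks cannot collide the way two `System.nanoTime()` readings can.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not CarbonData's actual code): builds a task id from
 * a Spark task id plus a JVM-wide counter instead of System.nanoTime().
 */
public final class UniqueTaskId {
    // JVM-wide counter; incrementAndGet() is atomic, so concurrent callers
    // always receive distinct values.
    private static final AtomicLong GLOBAL_COUNTER = new AtomicLong(0L);

    private UniqueTaskId() {}

    /** Returns "<sparkTaskId>_<counter>", e.g. "7_1". */
    public static String next(long sparkTaskId) {
        return sparkTaskId + "_" + GLOBAL_COUNTER.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            // Deliberately reuse the same task ids across threads to show
            // that the counter alone keeps the generated ids distinct.
            final long taskId = t % 2;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    ids.add(next(taskId));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // 8 threads x 1000 calls, no duplicates.
        System.out.println(ids.size()); // 8000
    }
}
```

   The counter only guarantees uniqueness within one JVM; prefixing the Spark task id is what distinguishes ids generated on different executors in this sketch.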
   
   Be sure to complete all items in the following checklist to help us incorporate
   your contribution quickly and easily:
   
    - [ ] Any interfaces changed? NA
   
    - [ ] Any backward compatibility impacted? NA
   
    - [ ] Document update required? NA
   
    - [ ] Testing done
   Done. The test report is attached:
   [testReport.txt](https://github.com/apache/carbondata/files/3388501/testReport.txt)
   
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.  [NA]
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services