[GitHub] [carbondata] ajantha-bhat opened a new pull request #3276: [CARBONDATA-3426] Fix load performance by fixing task distribution issue

GitBox
ajantha-bhat opened a new pull request #3276: [CARBONDATA-3426] Fix load performance by fixing task distribution issue
URL: https://github.com/apache/carbondata/pull/3276
 
 
   **Problem:** Load performance degrades because of a task distribution issue.
   
   **Cause:** Consider a 3-node cluster with hostnames a, b and c and IP addresses IP1, IP2 and IP3. To launch the load tasks, getPreferredLocations() in NewCarbonDataLoadRDD must return host names. However, if the driver runs on 'a' (IP1), the result is IP1, b, c instead of a, b, c. As a result, no task was launched on the executor that has the same IP as the driver.
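To see why an IP address among the preferred locations hurts, here is a simplified model in Python. It assumes that executors are registered by hostname and that a location preference only counts when the strings match exactly; Spark's real scheduler (delay scheduling, rack locality, fallback to any executor) is more involved, and the helper name `executors_with_local_tasks` is hypothetical.

```python
def executors_with_local_tasks(preferred_locations, executor_hosts):
    """Return the executor hosts that match at least one preferred location.

    Simplified model (assumption): a preference matches an executor only if
    the location string is exactly the executor's registered hostname.
    """
    wanted = set(preferred_locations)
    return sorted(host for host in executor_hosts if host in wanted)


# Executors registered by hostname (assumption for this sketch).
executor_hosts = ["a", "b", "c"]

# Expected preferences: every executor gets a locality-preferred task.
print(executors_with_local_tasks(["a", "b", "c"], executor_hosts))
# Regressed preferences: "IP1" matches no hostname, so executor 'a' is skipped.
print(executors_with_local_tasks(["IP1", "b", "c"], executor_hosts))
```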
   
   getLocalhostIPs was recently modified in the current version to return the hostname instead of the IP address; hence the local hostname, rather than the local IP address, was removed.
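One reading of this bug, which is an assumption and not taken from the CarbonData source: the candidate node list holds the driver under both its IP address and its hostname, and the entries returned by getLocalhostIPs are removed to de-duplicate it. The hypothetical `strip_local` helper below reproduces the reported outcome under that assumption.

```python
def strip_local(candidates, local_entries):
    """Remove every entry of local_entries from candidates, keeping order.

    Hypothetical model: local_entries stands for whatever getLocalhostIPs()
    returns on the driver.
    """
    local = set(local_entries)
    return [c for c in candidates if c not in local]


# Driver 'a' appears once by IP and once by hostname (assumption).
candidates = ["IP1", "a", "b", "c"]

# Intended behaviour: getLocalhostIPs() returns the IP, so the IP is dropped.
print(strip_local(candidates, ["IP1"]))
# Regressed behaviour: it returns the hostname, so the hostname is dropped.
print(strip_local(candidates, ["a"]))
```

With the IP removed, the remaining list is a, b, c; with the hostname removed it is IP1, b, c, which matches the symptom described above.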
   
   **Solution:** Revert the change in getLocalhostIPs, as it is not used in any other flow.
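The restored contract is that getLocalhostIPs returns IP addresses, not hostnames. The real method is Java code in CarbonData and is not reproduced here; the Python function below is only a hypothetical analogue. It resolves the local hostname with `socket.gethostbyname_ex` (which yields IPv4 addresses), returns an empty list if resolution fails, and normalises each entry through `ipaddress.ip_address`, which would raise `ValueError` if a hostname ever slipped through.

```python
import ipaddress
import socket


def get_localhost_ips():
    """Hypothetical analogue: return this host's IPv4 addresses as strings."""
    try:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        _, _, addresses = socket.gethostbyname_ex(socket.gethostname())
    except OSError:
        # Covers socket.gaierror / socket.herror when the name does not resolve.
        addresses = []
    # Enforce the contract: every entry must be an IP literal, never a hostname.
    return [str(ipaddress.ip_address(a)) for a in addresses]


# Contents depend on the machine; each entry is an IP address string.
print(get_localhost_ips())
```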
   
   Be sure to do all of the following checklist to help us incorporate
   your contribution quickly and easily:
   
    - [ ] Any interfaces changed? NA
   
    - [ ] Any backward compatibility impacted? NA
   
    - [ ] Document update required? NA
   
    - [ ] Testing done
      Yes, tested on a 17-node Spark cluster with a large data set.
      Load performance is the same as in the previous version (the degradation was 30%).
   
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services