Apache CarbonData Dev Mailing List archive

[DISCUSSION] Improve Simple insert performance in carbondata

Posted by akshay_nuthala on Feb 02, 2021; 1:20pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Improve-Simple-insert-performance-in-carbondata-tp105934.html

Hi Community,

As Carbon is closely integrated with Spark, insert operations in Carbon are
done through the Spark API. This in turn fires Spark jobs, which add various
overheads such as task serialization cost, extra memory consumption,
execution time on remote nodes, shuffle, etc.

For simple insert operations, we can improve performance by reusing the
SDK (which is plain Java code) to achieve the same result, thereby avoiding
the overheads discussed above.

The design document is linked below. Please share your comments, inputs,
and suggestions.

https://docs.google.com/document/d/1BcbTcO__vZbLLuhU73NIcbJOM2FRcKBa-ZxackofAS0/edit?usp=sharing

Thanks and regards,
N Akshay Kumar
