[DISCUSSION] Hive and Presto Write support + Performance improvement

kunalkapoor
Hi All,
Carbon has supported reading carbon tables from Presto and Hive for a long
time now, and it is high time that we start supporting writes from Presto
and Hive in version 2.0.0.

The development will be divided into 2 phases.

*Phase1 (Hive):*
*1. Support an OutputFormat (MapredCarbonOutputFormat) that allows the user
to write data in carbondata format from Hive.*
    - Tables would be created in Spark until a way to create the schema
file from Hive is found.
    - Tables would use the same folder structure as a transactional
table.
    - Carbon-specific DDL/DML would not be supported.

*2. Read performance should be equivalent to or better than ORC.*

*Phase2 (Presto): To be done later*
The tasks are the same as for Hive; the task list will be updated after
further analysis.

Any suggestions from the community are appreciated.

Thanks
Kunal Kapoor