Apache CarbonData Dev Mailing List archive

[DISCUSSION] Hive and Presto Write support + Performance improvement

kunalkapoor

Jan 09, 2020; 6:19am

Hi All,
As you all know, Carbon has long supported reading carbon tables from
Presto and Hive, and it is high time we start supporting writes from
Presto and Hive in version 2.0.0.

The development would be divided into two phases.

*Phase1 (Hive):*
*1. Support an OutputFormat (MapredCarbonOutputFormat) that allows the user
to write data in carbondata format from Hive.*
- Tables would be created in Spark until a way to create the schema
file in Hive is found.
- Tables would support the same folder structure as a transactional
table.
- Carbon-specific DDL/DML would not be supported.

*2. Read performance should be better than or equivalent to ORC.*

*Phase2 (Presto): To be done later*
The tasks are the same as for Hive; any changes to the task list will be
posted after analysis.

Any suggestions from the community are appreciated.

Thanks
Kunal Kapoor