[DISCUSSION] Hive and Presto Write support + Performance improvement

Posted by kunalkapoor on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Hive-and-Presto-Write-support-Performance-improvement-tp90002.html

Hi All,
As you all know, Carbon has supported reading carbon tables from Presto
and Hive for a long time now, and it is high time that we start
supporting writes from Presto and Hive in version 2.0.0.

The development would be divided into two phases.

*Phase1 (Hive):*
*1. Support an OutputFormat (MapredCarbonOutputFormat) that allows the user
to write data in carbondata format from Hive.*
    - Tables would be created in Spark until a solution for creating the
schema file from Hive is found.
    - Tables would use the same folder structure as a transactional
table.
    - Carbon-specific DDL/DML would not be supported.

*2. Read performance should be better than or equivalent to ORC.*

*Phase2 (Presto): To be done later*
The tasks are the same as for Hive; any changes to the task list would be
posted after analysis.

Any suggestions from the community are appreciated.

Thanks
Kunal Kapoor