Re: Carbondata hive integration Plan
Posted by Liang Chen on Jun 02, 2017; 12:11am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Carbondata-integration-Plan-tp13450p13647.html
Hi cenyuhai
Thanks for starting this discussion about Hive integration:
1. Make the carbon schema compatible with Hive (CARBONDATA-1008) (create table and alter table)
Liang: As you mentioned, the first phase (1.2.0) supports reading CarbonData files in Hive. So is my understanding of the flow correct: a) all steps of preparing CarbonData files are handled in Spark, so "create table" and "alter table" would be handled in Spark; b) in Hive, only read (query). Could you explain a little more which part the schema compatibility with Hive is for?
2. Filter pushdown (especially partition filter, FilterPushdownDev)
Liang: LGTM on this point.
3. A tool to update existing tables' schemas to be compatible with Hive.
Liang: Same comment as for question 1. Can you give some examples of "the existing tables' schema"?
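To make the read-only flow discussed in question 1 concrete, here is a minimal sketch of how a carbon table created in Spark might be exposed to Hive for queries. The SerDe/input-format class names and the warehouse path are assumptions based on the carbondata-hive module and may differ in the final 1.2.0 release:

```sql
-- In Spark: create and load the carbon table (all write-side DDL stays in Spark)
CREATE TABLE sales (id INT, name STRING) STORED BY 'carbondata';

-- In Hive: register the same carbondata files as an external, read-only table
-- (class names below are assumed from the carbondata-hive module)
CREATE EXTERNAL TABLE sales_hive (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat'
  OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat'
LOCATION '/user/hive/warehouse/carbon.store/default/sales';  -- assumed store path

-- Query only; no insert/load through Hive in phase one
SELECT name FROM sales_hive WHERE id = 1;
```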
For the Hive integration feature in Apache CarbonData 1.2.0, I propose the scope below:
1. Only support reading/querying CarbonData files in Hive. Writing CarbonData in Hive (create carbon table, alter carbon table, load data, etc.) will be supported in the future (a new mailing list topic can be opened to discuss that plan).
2. Utilize CarbonData's strengths (index, dictionary, ...) to get good query performance; Hive+CarbonData performance should be better than Hive+ORC.
3. Provide a solution/tool in Spark to migrate all Hive tables & data to carbon tables & data.
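For scope item 3, one possible shape of the migration is a per-table recreate-and-copy in Spark SQL. The table and column names below are hypothetical, used only to illustrate the idea; `STORED BY 'carbondata'` is the 1.x Spark DDL for carbon tables:

```sql
-- Hypothetical migration sketch, run from Spark SQL:
-- recreate each Hive table as a carbon table, then copy its data across
CREATE TABLE orders_carbon (order_id INT, amount DOUBLE) STORED BY 'carbondata';
INSERT INTO TABLE orders_carbon SELECT order_id, amount FROM orders_hive;
```

A migration tool would essentially automate this pair of statements over every table in the Hive metastore, mapping each Hive type to its carbon equivalent.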
Regards
Liang