Apache CarbonData Dev Mailing List archive - Re: [Discussion] Support Spark/Hive based partition in carbon

Re: [Discussion] Support Spark/Hive based partition in carbon

Posted by cenyuhai11 on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Support-Spark-Hive-based-partition-in-carbon-tp27594p30106.html

I still insist that if we want to make carbon a general file format in the Hadoop ecosystem, we should support the standard hive/spark folder structure.

We could use a folder structure like this:

TABLE_PATH
  Customer=US
    |--Segment_0
       |---0-12212.carbonindex
       |---PART-00-12212.carbondata
       |---0-34343.carbonindex
       |---PART-00-34343.carbondata

or

TABLE_PATH
  Customer=US
    |--Part0
       |--Fact
          |--Segment_0
             |---0-12212.carbonindex
             |---PART-00-12212.carbondata
             |---0-34343.carbonindex
             |---PART-00-34343.carbondata
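
As a rough illustration of the first layout, here is a minimal Python sketch that builds a file path under Hive/Spark style `column=value` partition directories and recovers the partition values from such a path. The function names and the `/warehouse/sales` table path are hypothetical and are not part of CarbonData; the file name is the one from the listing above.

```python
from pathlib import PurePosixPath


def partition_dir(table_path, partition_values):
    """Build a Hive/Spark style partition directory: TABLE_PATH/col1=v1/col2=v2."""
    path = PurePosixPath(table_path)
    for column, value in partition_values:
        path = path / f"{column}={value}"
    return path


def carbondata_file_path(table_path, partition_values, segment_id, file_name):
    """Place a file under Segment_<id> inside the partition directory (hypothetical helper)."""
    return partition_dir(table_path, partition_values) / f"Segment_{segment_id}" / file_name


def parse_partition_values(file_path, table_path):
    """Recover (column, value) pairs from the column=value directory names."""
    relative = PurePosixPath(file_path).relative_to(table_path)
    return [tuple(part.split("=", 1)) for part in relative.parts if "=" in part]


# "/warehouse/sales" is an assumed table location used only for this example.
path = carbondata_file_path(
    "/warehouse/sales", [("Customer", "US")], 0, "PART-00-12212.carbondata"
)
print(path)
print(parse_partition_values(path, "/warehouse/sales"))
```

Because the partition values live in the directory names, an engine that understands this convention can locate the data for Customer=US from the path alone, without consulting a separate partition map.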

I know there will be some impact on compaction and segment management.

@Jacky @Ravindra @chenliang @David CaiQiang can you estimate the impact?

Best regards!
Yuhai Cen

On 5 December 2017 at 15:29, Ravindra Pesala <[hidden email]> wrote:
Hi Jacky,

The main problem here is carbon's underlying segment-based design. For
every incremental load, carbon creates a segment and manages the segments
through the tablestatus file. Changing this design would require very
large changes with a wide impact. We would also have a backward
compatibility problem if the folder structure changes for new loads.

Regards,
Ravindra.

On 5 December 2017 at 10:12, 岑玉海 <[hidden email]> wrote:

> Hi, Ravindra:
> I read your design document. Why not use the standard hive/spark
> folder structure? Is there any problem with using the hive/spark
> folder structure?
>
> Best regards!
> Yuhai Cen
>
>
> On 4 December 2017 at 14:09, Ravindra Pesala <[hidden email]> wrote:
> Hi,
>
>
> Please find the design document for standard partition support in carbon.
> https://docs.google.com/document/d/1NJo_Qq4eovl7YRuT9O7yWTL0P378HnC8WT0-6pkQ7GQ/edit?usp=sharing
>
> Regards,
> Ravindra.
>
>
> On 27 November 2017 at 17:36, cenyuhai11 <[hidden email]> wrote:
> The datasource API still has a problem: it does not support hybrid
> file format tables.
> A detailed description of hybrid file format tables is in this issue:
> https://issues.apache.org/jira/browse/CARBONDATA-1377.
>
> All partitions of a datasource table must have the same file format,
> so we can't change the file format to carbondata with the command
> "alter table table_xxx set fileformat carbondata;"
>
> So I think implementing TableReader is the right way.
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
> --
>
> Thanks & Regards,
> Ravi
>

--
Thanks & Regards,
Ravi