Apache CarbonData Dev Mailing List archive - Re: [Discussion] Support Spark/Hive based partition in carbon

Apache CarbonData Dev Mailing List archive

Re: [Discussion] Support Spark/Hive based partition in carbon

Posted by ravipesala on Dec 09, 2017; 5:23pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Support-Spark-Hive-based-partition-in-carbon-tp27594p30111.html

Hi Yuhai Cen,

Yes you are right, we should support standard folder structure like hive to
generalize the fileformat but we have a lot of other features which are
built upon this folder structure. So removing of this will have a lot of
impact on those features. Right now we are implementing
CarbonTableOutputFormat which manages table segments while loading and
writes data in the current carbon folder structure. And one more
outputformat called CarbonOutputFormat and CarbonInputFormat which just
writes and reads the data to file which is totally managed by spark/hive,
so these interfaces will be the generalized fileformat interfaces to
integrate with systems like hive/presto.

Regards,
Ravindra.

On 9 December 2017 at 11:20, 岑玉海 <[hidden email]> wrote:

> I still insist that if we want to make carbon a general fileformt on
> hadoop ecosystem, we should support standard hive/spark folder structure.
>
>
> we can use the folder structure like this:
> TABLE_PATH
>
> Customer=US
>
> |--Segement_0
>
> |---0-12212.carbonindex
>
> |---PART-00-12212.carbondata
>
> |---0-34343.carbonindex
>
> |---PART-00-34343.carbondata
>
> or
> TABLE_PATH
>
> Customer=US
>
> |--Part0
>
> |--Fact
>
> |--Segement_0
>
> |---0-12212.carbonindex
>
> |---PART-00-12212.carbondata
>
> |---0-34343.carbonindex
>
> |---PART-00-34343.carbondata
>
>
>
>
>
>
>
>
>
> I know there will be some impact on compaction and segment management.
>
> @Jacky @Ravindra @chenliang @David CaiQiang can you estimate the impact?
>
>
>
> Best regards!
> Yuhai Cen
>
>
> 在2017年12月5日 15:29，Ravindra Pesala<[hidden email]> 写道：
> Hi Jacky,
>
> Here we have the main problem with the underlying segment based design of
> carbon. For every increment load carbon creates a segment and manages the
> segments through the tablestatus file. The changes will be very big and
> impact is more if we try to change this design. And also we will have a
> problem with backward compatibility when the folder structure changes in
> new loads.
>
> Regards,
> Ravindra.
>
> On 5 December 2017 at 10:12, 岑玉海 <[hidden email]> wrote:
>
> > Hi, Ravindra:
> > I read your design documents, why not use the standard hive/spark
> > folder structure, is there any problem if use the hive/spark folder
> > structure？
> >
> >
> >
> >
> >
> >
> >
> >
> > Best regards!
> > Yuhai Cen
> >
> >
> > 在2017年12月4日 14:09，Ravindra Pesala<[hidden email]> 写道：
> > Hi,
> >
> >
> > Please find the design document for standard partition support in carbon.
> > https://docs.google.com/document/d/1NJo_Qq4eovl7YRuT9O7yWTL0P378HnC8WT
> > 0-6pkQ7GQ/edit?usp=sharing
> >
> >
> >
> >
> >
> >
> >
> > Regards,
> > Ravindra.
> >
> >
> > On 27 November 2017 at 17:36, cenyuhai11 <[hidden email]> wrote:
> > The datasource api still have a problem that it do not support hybird
> > fileformat table.
> > Detail description about hybird fileformat table is in this issue:
> > https://issues.apache.org/jira/browse/CARBONDATA-1377.
> >
> > All partitions' fileformat of datasource table must be the same.
> > So we can't change fileformat to carbodata by command "alter table
> > table_xxx
> > set fileformat carbondata;"
> >
> > So I think implement TableReader is the right way.
> >
> >
> >
> >
> >
> >
> >
> > --
> > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> > n5.nabble.com/
> >
> >
> >
> >
> >
> >
> > --
> >
> > Thanks & Regards,
> > Ravi
> >
>
>
>
> --
> Thanks & Regards,
> Ravi
>

--
Thanks & Regards,
Ravi