[Discussion] Support Spark/Hive based partition in carbon


[Discussion] Support Spark/Hive based partition in carbon

ravipesala
Partition features of Spark:

1. Creating table with partition
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
    [(col_name1 col_type1 [COMMENT col_comment1], ...)]
    USING datasource
    [OPTIONS (key1=val1, key2=val2, ...)]
    [PARTITIONED BY (col_name1, col_name2, ...)]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
    [AS select_statement]

2. Load data
  Static Partition

    LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
      INTO TABLE partitioned_user
      PARTITION (country = 'US', state = 'CA')

    INSERT OVERWRITE TABLE partitioned_user
      PARTITION (country = 'US', state = 'AL')
      SELECT * FROM another_user au
      WHERE au.country = 'US' AND au.state = 'AL';

   Dynamic Partition

    LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
      INTO TABLE partitioned_user
      PARTITION (country, state)

    INSERT OVERWRITE TABLE partitioned_user
      PARTITION (country, state)
      SELECT * FROM another_user;

 3. Drop, show partitions
  SHOW PARTITIONS [db_name.]table_name
  ALTER TABLE table_name DROP [IF EXISTS] (PARTITION part_spec, ...)

 4. Updating the partitions
  ALTER TABLE table_name PARTITION part_spec RENAME TO PARTITION part_spec
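
Taken together, these commands can also be driven programmatically. A
minimal Scala sketch, assuming the datasource name "carbondata" once this
support lands; the table and column names are illustrative:

    import org.apache.spark.sql.SparkSession

    object PartitionDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("carbon-partition-demo")
          .enableHiveSupport()
          .getOrCreate()

        // Create a partitioned datasource table (syntax from point 1)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS partitioned_user (
            |  id INT, name STRING, country STRING, state STRING)
            |USING carbondata
            |PARTITIONED BY (country, state)""".stripMargin)

        // Dynamic-partition insert (point 2): Spark derives the
        // (country, state) values from the trailing select columns
        spark.sql(
          """INSERT OVERWRITE TABLE partitioned_user
            |PARTITION (country, state)
            |SELECT id, name, country, state FROM another_user""".stripMargin)

        // Point 3: list the partitions that were created
        spark.sql("SHOW PARTITIONS partitioned_user").show()

        spark.stop()
      }
    }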



Currently, carbon supports only a partitioning scheme that is custom
implemented by carbon. So if community users want to use the partition
features available in Spark and Hive with carbondata, a compatibility
problem arises. Carbondata also has no built-in dynamic partitioning.
To use the partition feature of Spark we should comply with the interfaces
Spark provides for loading and reading the data.

Approach 1:
Comply with the pure Spark datasource API and implement the standard
interfaces for reading and writing data at the file level, just like
Parquet and ORC are implemented in Spark. To support this we need to
implement the FileFormat interface for reading and writing data at the
file level, not the table level: CarbonFileInputFormat to read data at the
file level, and CarbonOutputFormat to write data per partition.
Pros:
1. It is the cleanest interface to use on Spark; all Spark features work
without any impact.
2. Upgrading to new versions of Spark is straightforward.
Cons:
Carbondata features such as IUD, compaction, and alter table, and data
management commands such as show segments, delete segments, etc. cannot
work.
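
For reference, a hedged skeleton of what this approach entails. The trait
and method signatures below are Spark 2.x's FileFormat
(org.apache.spark.sql.execution.datasources); the class name and the
deliberately unimplemented bodies are illustrative:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileStatus
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFactory, PartitionedFile}
    import org.apache.spark.sql.sources.Filter
    import org.apache.spark.sql.types.StructType

    class CarbonSparkFileFormat extends FileFormat {

      // Derive the table schema from carbon file/index metadata.
      override def inferSchema(
          sparkSession: SparkSession,
          options: Map[String, String],
          files: Seq[FileStatus]): Option[StructType] = ???

      // Return a factory whose OutputWriters write one carbondata file per
      // task and partition directory (the "write per partition" part above).
      override def prepareWrite(
          sparkSession: SparkSession,
          job: Job,
          options: Map[String, String],
          dataSchema: StructType): OutputWriterFactory = ???

      // Read a single file split: the file-level (not table-level) read path.
      override def buildReader(
          sparkSession: SparkSession,
          dataSchema: StructType,
          partitionSchema: StructType,
          requiredSchema: StructType,
          filters: Seq[Filter],
          options: Map[String, String],
          hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = ???
    }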

Approach 2:
Improve and expand the in-house partition feature that already exists in
carbondata: add the missing pieces such as dynamic partitioning, and
comply with the standard syntax for loading data into partitions.
Pros:
All current features of carbondata work without much impact.
Cons:
The current partition implementation does not comply with Spark
partitioning, so a lot of effort is needed to implement this.

Approach 3:
A hybrid of the 1st approach. Basically, write the data using the
FileFormat and CarbonOutputFormat interfaces, so all the partition
information is added to Hive automatically since we are creating a
datasource table. We make sure that the current folder structure does not
change while writing the data. Here we maintain a mapping file inside the
segment folder that maps each partition to its carbonindex files. While
reading, we first get the partition information from Hive and do the
pruning, and based on the pruned partitions we read the partition mapping
file to get the carbonindex files for querying (see the sketch below).
With this approach we would not support the current carbondata partition
feature, but we would support the Spark partition features.
Pros:
1. Supports the standard interface for loading data, so features like
partitioning and bucketing are automatically supported.
2. All standard SQL syntax works with this approach.
3. All current features of carbon also keep working.
Cons:
1. The existing partition feature cannot work.
2. Minor impact on features like compaction, IUD, and clean files because
of maintaining the partition mapping file.
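
A hypothetical sketch of that read path. The mapping-file format assumed
here (one tab-separated "partition spec, carbonindex file" pair per line)
is for illustration only, not the actual design:

    import scala.io.Source

    object PartitionMapping {

      // One line per entry, e.g. "country=US/state=CA\t0-12212.carbonindex"
      def readMapping(mappingFile: String): Map[String, Seq[String]] = {
        val src = Source.fromFile(mappingFile)
        try {
          src.getLines()
            .map(_.split('\t'))
            .collect { case Array(spec, indexFile) => spec -> indexFile }
            .toList
            .groupBy(_._1)
            .mapValues(_.map(_._2))
            .toMap
        } finally src.close()
      }

      // Partitions surviving Hive-metastore pruning pick the index files to scan.
      def pruneIndexFiles(mappingFile: String,
                          prunedPartitions: Set[String]): Seq[String] =
        readMapping(mappingFile)
          .filterKeys(prunedPartitions.contains)
          .values.flatten.toSeq
    }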

--
Thanks & Regards,
Ravindra

Re: [Discussion] Support Spark/Hive based partition in carbon

Jacky Li
Hi, I prefer approach 3. If we use approach 3, the Hive and Presto integrations can also do partition pruning for carbon, right?

Regards,
Jacky

Re: [Discussion] Support Spark/Hive based partition in carbon

cenyuhai11
In reply to this post by ravipesala
The datasource API still has a problem: it does not support hybrid
fileformat tables. A detailed description of hybrid fileformat tables is
in this issue:
https://issues.apache.org/jira/browse/CARBONDATA-1377

All partitions of a datasource table must use the same fileformat, so we
cannot change the fileformat to carbondata with the command "alter table
table_xxx set fileformat carbondata;".

So I think implementing a TableReader is the right way.








Re: [Discussion] Support Spark/Hive based partition in carbon

ravipesala
Hi,

Please find the design document for standard partition support in carbon:
https://docs.google.com/document/d/1NJo_Qq4eovl7YRuT9O7yWTL0P378HnC8WT0-6pkQ7GQ/edit?usp=sharing
Regards,
Ravindra.

--
Thanks & Regards,
Ravi

Attachment: Standard Partitioning Support in CarbonData.docx (13K)

Re: [Discussion] Support Spark/Hive based partition in carbon

cenyuhai11
Hi Ravindra,
I read your design document. Why not use the standard Hive/Spark folder
structure? Is there any problem with using it?
Best regards!
Yuhai Cen



Re: [Discussion] Support Spark/Hive based partition in carbon

ravipesala
Hi Jacky,

The main problem here is the underlying segment-based design of carbon.
For every incremental load, carbon creates a segment and manages the
segments through the tablestatus file. If we try to change this design,
the changes will be very big and the impact high. We would also have a
backward-compatibility problem when the folder structure changes in new
loads.
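
As an illustration only (the field names below are assumptions, not the
actual tablestatus schema), each incremental load appends one entry to
that file, and the segment commands operate on the resulting list:

    // Illustrative model of what tablestatus tracks per load; field names
    // are assumptions, not carbondata's actual schema.
    case class SegmentStatus(
      segmentId: String,   // "0", "1", ... one per load
      status: String,      // e.g. "Success", "Marked for Delete", "Compacted"
      loadStartTime: Long,
      loadEndTime: Long)

    // SHOW SEGMENTS / DELETE SEGMENT style commands walk this list, which
    // is why moving segments under partition folders touches all of them.
    val tableStatus: Seq[SegmentStatus] = Seq(
      SegmentStatus("0", "Success", 1512345600000L, 1512345660000L),
      SegmentStatus("1", "Compacted", 1512432000000L, 1512432060000L))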

Regards,
Ravindra.

--
Thanks & Regards,
Ravi

Re: [Discussion] Support Spark/Hive based partition in carbon

cenyuhai11
I still insist that if we want to make carbon a general fileformat in the Hadoop ecosystem, we should support the standard Hive/Spark folder structure.


we can use a folder structure like this:

TABLE_PATH
  Customer=US
     |--Segment_0
          |---0-12212.carbonindex
          |---PART-00-12212.carbondata
          |---0-34343.carbonindex
          |---PART-00-34343.carbondata

or

TABLE_PATH
  Customer=US
     |--Part0
          |--Fact
               |--Segment_0
                    |---0-12212.carbonindex
                    |---PART-00-12212.carbondata
                    |---0-34343.carbonindex
                    |---PART-00-34343.carbondata

I know there will be some impact on compaction and segment management.

@Jacky @Ravindra @chenliang @David CaiQiang can you estimate the impact?
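
For concreteness, a hedged sketch of how segment discovery could look
under the partition-first layout proposed above; the paths and names are
illustrative, not an actual carbondata API:

    import java.io.File

    object ProposedLayout {
      // Collect all carbonindex files for one partition directory, e.g.
      // indexFilesFor("/warehouse/t", "Customer=US")
      def indexFilesFor(tablePath: String, partitionSpec: String): Seq[File] = {
        val partitionDir = new File(tablePath, partitionSpec)
        val segments = Option(partitionDir.listFiles())
          .getOrElse(Array.empty[File])
          .filter(f => f.isDirectory && f.getName.startsWith("Segment_"))
        segments.flatMap { seg =>
          Option(seg.listFiles()).getOrElse(Array.empty[File])
            .filter(_.getName.endsWith(".carbonindex"))
        }.toSeq
      }
    }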



Best regards!
Yuhai Cen



Re: [Discussion] Support Spark/Hive based partition in carbon

ravipesala
Hi Yuhai Cen,

Yes, you are right: to generalize the fileformat we should support a
standard folder structure like Hive's. But a lot of other carbon features
are built on the current folder structure, so removing it would have a big
impact on them. Right now we are implementing CarbonTableOutputFormat,
which manages table segments while loading and writes data in the current
carbon folder structure. There will also be another pair,
CarbonOutputFormat and CarbonInputFormat, which just write and read data
to files fully managed by Spark/Hive; these will be the generalized
fileformat interfaces for integrating with systems like Hive and Presto.

Regards,
Ravindra.




--
Thanks & Regards,
Ravi

Re: [Discussion] Support Spark/Hive based partition in carbon

Jacky Li
In reply to this post by cenyuhai11
Hi Yuhai Cen,

As Ravindra said, I think we will eventually need two OutputFormats.

1. CarbonTableOutputFormat
This is needed to maintain the segment structure of carbondata and to keep
all segment-related commands working for a partitioned table, such as Show
Segments, Delete Segment, etc.

2. CarbonFileOutputFormat
This will write carbondata files directly into the partition folder,
without the segment folder, so the segment-related commands may not work
in this case. This OutputFormat is an incremental effort on top of the
CarbonTableOutputFormat work.

So for now we are focusing on implementing CarbonTableOutputFormat; once
it is done, CarbonFileOutputFormat can be added later. A rough sketch of
the two shapes follows.
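
A hedged Scala sketch of the two shapes as Hadoop OutputFormats; the type
parameters and comments below are illustrative assumptions, not the actual
carbondata classes:

    import org.apache.hadoop.io.NullWritable
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    object OutputFormatSketch {
      type CarbonRow = Array[AnyRef] // stand-in for carbon's internal row type

      // Writes under Segment_N in the table path and, on job commit, records
      // the new segment in tablestatus so SHOW/DELETE SEGMENTS keep working.
      abstract class CarbonTableOutputFormat
          extends FileOutputFormat[NullWritable, CarbonRow]

      // Writes .carbondata/.carbonindex files straight into whatever
      // partition folder Spark/Hive hands it: no segment folder and no
      // tablestatus entry, so segment commands do not apply.
      abstract class CarbonFileOutputFormat
          extends FileOutputFormat[NullWritable, CarbonRow]
    }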

Regards,
Jacky




Re: [Discussion] Support Spark/Hive based partition in carbon

cenyuhai11
ok






Best regards!
Yuhai Cen

