[DISCUSSION] Support heterogeneous format segments in carbondata

ravipesala
Hi All,

This discussion is about supporting other file formats in carbon. Existing
customers already use other formats like parquet and orc, but if they want
to migrate to carbon there is no proper solution at hand. This feature
allows all the old data to be added as a segment to carbondata. During
query, the old data is read in its respective format and all new segments
are read as carbon.

I have created the design document and attached it to the jira. Please
review it.
https://issues.apache.org/jira/browse/CARBONDATA-3516
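
The read path described above can be pictured with a small sketch. It is not
taken from the design document: the `Segment` class, the `READERS` registry,
the stub readers, and the paths are all hypothetical names used only to show
the idea that each segment records its own format and the scan picks a reader
per segment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class Segment:
    """One entry of a table: where the data lives and how it is stored."""
    segment_id: str
    path: str          # hypothetical location, for illustration only
    file_format: str   # e.g. "carbon", "parquet" or "orc"


Reader = Callable[[Segment], Iterator[dict]]


def make_stub_reader(name: str) -> Reader:
    """Build a placeholder reader; a real one would decode files at segment.path."""
    def read(segment: Segment) -> Iterator[dict]:
        yield {"segment": segment.segment_id, "read_by": name}
    return read


# Hypothetical registry: one reader per supported segment format.
READERS: Dict[str, Reader] = {
    fmt: make_stub_reader(fmt) for fmt in ("carbon", "parquet", "orc")
}


def scan_table(segments: List[Segment]) -> Iterator[dict]:
    """Read every segment with the reader that matches its own format."""
    for segment in segments:
        reader = READERS.get(segment.file_format)
        if reader is None:
            raise ValueError(f"unsupported segment format: {segment.file_format}")
        yield from reader(segment)


segments = [
    Segment("0", "/data/legacy/parquet_table", "parquet"),  # old data, added as a segment
    Segment("1", "/data/carbon_table/Segment_1", "carbon"),  # new data, loaded as carbon
]
rows = list(scan_table(segments))
```

Here the old parquet data and the new carbon data sit in the same table, and
the query layer only needs the format recorded on each segment to decide how
to read it.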


--
Thanks & Regards,
Ravindra

Re: [DISCUSSION] Support heterogeneous format segments in carbondata

chetdb
Hi Ravi,

1. Which data formats will be supported for add segment?
2. Will alter table be supported after loading multiple segments, each having a different data format?
3. If a user wants to run a select query on certain segments only, using the set segments feature, will he/she be able to do so after this feature is implemented?
4. Will index files be created for the segments added from external formats? If yes, will the merge index feature be supported?

Regards
Chetan

On 2019/09/10 14:41:22, Ravindra Pesala <[hidden email]> wrote:

> Hi All,
>
> This discussion is about supporting other file formats in carbon. Existing
> customers already use other formats like parquet and orc, but if they want
> to migrate to carbon there is no proper solution at hand. This feature
> allows all the old data to be added as a segment to carbondata. During
> query, the old data is read in its respective format and all new segments
> are read as carbon.
>
> I have created the design document and attached it to the jira. Please
> review it.
> https://issues.apache.org/jira/browse/CARBONDATA-3516
>
>
> --
> Thanks & Regards,
> Ravindra
>

Re: [DISCUSSION] Support heterogeneous format segments in carbondata

xuchuanyin
In reply to this post by ravipesala
Hi ravipesala, I made a similar proposal earlier; please check whether it
is of any help:
https://gist.github.com/xuchuanyin/cb264f2d7e94d6e185a55ea962e91ce1

Besides, for the problem in your proposal, the user can create a
`table_with_old_format_data` and a separate `table_with_new_format_data`,
and then create a `joint_table` that unions both tables. All queries are
then fired on the `joint_table`. ---- problem SOLVED...
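
A minimal sketch of the union approach, under stated assumptions: it uses
Python's built-in `sqlite3` purely as a stand-in SQL engine so that it runs
anywhere, the `id` and `name` columns are made up, and `UNION ALL` in a view
is one reading of "union both tables". In a real deployment the two tables
would be backed by the old format and by carbon respectively.

```python
import sqlite3

# In-memory database used only as a stand-in for the real SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_with_old_format_data (id INTEGER, name TEXT);
    CREATE TABLE table_with_new_format_data (id INTEGER, name TEXT);

    -- The view presents both tables as one; queries only touch the view.
    CREATE VIEW joint_table AS
        SELECT id, name FROM table_with_old_format_data
        UNION ALL
        SELECT id, name FROM table_with_new_format_data;
    """
)

# Old data stays where it is; new data goes to the new-format table.
conn.execute("INSERT INTO table_with_old_format_data VALUES (1, 'legacy row')")
conn.execute("INSERT INTO table_with_new_format_data VALUES (2, 'new row')")

rows = conn.execute("SELECT id, name FROM joint_table ORDER BY id").fetchall()
```

With this layout the user has to manage two tables and route new loads to the
right one, whereas queries see a single name.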




Re: [DISCUSSION] Support heterogeneous format segments in carbondata

Jacky Li-3
In reply to this post by chetdb
IMHO

On 2019/09/11 06:46:21, chetan bhat <[hidden email]> wrote:
> Hi Ravi,
>
> 1. Which data formats will be supported for add segment?
I think for the first phase we can target the tables that users may want to migrate to carbon, such as orc and parquet tables. In future, we can consider CSV as well.

> 2. Will alter table be supported after loading multiple segments, each having a different data format?
Since this feature only targets migrating legacy tables, I think we should keep it simple. So, no.

> 3. If a user wants to run a select query on certain segments only, using the set segments feature, will he/she be able to do so after this feature is implemented?
Yes, I think it should be supported.

> 4. Will index files be created for the segments added from external formats? If yes, will the merge index feature be supported?
Same as query 1, no.


Re: [DISCUSSION] Support heterogeneous format segments in carbondata

akashnilugal@gmail.com
In reply to this post by ravipesala
Hi

+1
One question: are add segment and load data to the main table both supported? If yes, how is segment locking handled, given that we are going to add an entry to table status with a segment id for the added segment?
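
To make the concern concrete, here is a small sketch of one possible way to
keep segment ids consistent when add segment and load data both write to
table status. It is an illustration only, not how carbon handles it: the
`TableStatus` class, its in-memory list, and the operation names are
hypothetical, and a single in-process lock stands in for whatever locking the
real table status update would need.

```python
import threading
from typing import Dict, List


class TableStatus:
    """Hypothetical in-memory stand-in for the table status entries."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: List[Dict[str, str]] = []

    def add_entry(self, operation: str, file_format: str) -> str:
        # Allocate the segment id and record the entry under one lock, so two
        # concurrent writers can never receive the same id.
        with self._lock:
            segment_id = str(len(self._entries))
            self._entries.append(
                {"segment_id": segment_id, "operation": operation, "format": file_format}
            )
            return segment_id

    def entries(self) -> List[Dict[str, str]]:
        with self._lock:
            return list(self._entries)


status = TableStatus()

# Four "add segment" calls and four "load data" calls racing on the same table.
threads = [
    threading.Thread(target=status.add_entry, args=("add_segment", "parquet"))
    for _ in range(4)
]
threads += [
    threading.Thread(target=status.add_entry, args=("load_data", "carbon"))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

segment_ids = sorted(int(e["segment_id"]) for e in status.entries())
```

If the id allocation and the append were not done under the same lock, two
operations could read the same entry count and write duplicate segment ids.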

Regards,
Akash


Re: [DISCUSSION] Support heterogeneous format segments in carbondata

kunalkapoor
+1

On Mon, Sep 30, 2019, 2:44 PM Akash Nilugal <[hidden email]> wrote:

> Hi
>
> +1
> One question: are add segment and load data to the main table both
> supported? If yes, how is segment locking handled, given that we are going
> to add an entry to table status with a segment id for the added segment?
>
> Regards,
> Akash

Re: [DISCUSSION] Support heterogeneous format segments in carbondata

kumarvishal09
+1
Regards
Kumar Vishal

On Mon, Oct 7, 2019 at 10:24 AM Kunal Kapoor <[hidden email]> wrote:

> +1