Apache CarbonData Dev Mailing List archive

[DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

akashnilugal@gmail.com

[DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

Hi Community,

As we know, CarbonData is an indexed columnar data format for fast
analytics on big data platforms. We have already integrated it with
query engines such as Spark and Presto. With Presto, we currently
support only querying carbondata files; we do not yet support writing
carbondata files through the Presto engine.

Currently, Presto is integrated with CarbonData only for reading
carbondata files. For this, the store must already exist (for example,
written by Carbon in Spark) and the table must be in the Hive metastore.
Using the carbondata connector, we can then read the carbondata files.
But we cannot create a table or load data into a table from Presto, so
it is cumbersome to have to write the data with another engine first
before the carbon files can be read.

So I propose to add transactional load support to the Presto
integration for Carbon.

I have attached the design document to the Jira; please refer to it.
Any suggestions or inputs are most welcome.

https://issues.apache.org/jira/browse/CARBONDATA-3831

Regards,
Akash R.

kunalkapoor

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

+1,
It would be great to have write support from Presto.

Thanks
Kunal Kapoor

On Tue, Jul 14, 2020 at 6:08 PM Akash Nilugal <[hidden email]>
wrote:


Ajantha Bhat

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

+1,

I have some suggestions and questions:

a) You mentioned that, currently, creating a table from Presto and
inserting data will result in a non-transactional table. So, to create a
transactional table, do we still depend on Spark?
*I feel we should support transactional table creation with all
table properties from Presto as well (to remove the dependency on Spark).*

b) If Spark has created a global sort table and Presto inserts data, what
will happen? Do we ignore those table properties in the Presto write?

c) Partition and complex type support can be handled as an immediate
follow-up to this, as these are very common features nowadays.

Thanks,
Ajantha

On Mon, 27 Jul, 2020, 1:07 pm Kunal Kapoor, <[hidden email]>
wrote:


VenuReddy

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

In reply to this post by akashnilugal@gmail.com

+1

Good to enhance Carbon with Presto engine support.

Regards,
Venu

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

akashrn5

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

In reply to this post by Ajantha Bhat

Hi Ajantha,

Thanks for the inputs; please see my comments below.

a) You mentioned that, currently, creating a table from Presto and
inserting data will result in a non-transactional table. So, to create a
transactional table, do we still depend on Spark?
====> Currently it depends on Spark, but I'm planning to support create
table from Presto and remove the dependency.
I will start once I finish this requirement.

b) If Spark has created a global sort table and Presto inserts data, what
will happen? Do we ignore those table properties in the Presto write?
====> Currently we ignore the table properties and consider only the
system-level properties. Once we support create table, we can take the
table properties into account.

c) Partition and complex type support can be handled as an immediate
follow-up to this, as these are very common features nowadays.
====> This is already planned, so I will create subtasks in Jira to follow
up on these.

Thanks,

Regards,
Akash R
