Apache CarbonData Dev Mailing List archive

Regarding presto carbondata integration


Ajantha Bhat

Regarding presto carbondata integration

Hi all,

Currently, the master branch of carbondata works with *prestodb 0.217*.
We all know about the competing *presto-sql* as well.
Some users don't want to migrate to *presto-sql* because their cloud
vendor doesn't support it (for example, AWS EMR, Huawei MRS, and Azure
services other than HDInsight still ship *presto db*).

So,
1. Does carbondata need to support both of them?
2. Does carbondata need to maintain two modules, one for prestodb and one for
prestosql? We may need to extract the common code (big effort).
3. At any given time, carbondata can support only one version each of prestodb and
presto-sql. They release a new version about every 15 days, and our integration is not
based on the SPI (it is not a standalone connector); we extended the hive connector
interface. So every few releases, the carbondata-presto integration code
needs to be modified. This can be a bigger maintenance problem.

All of this is about read support; when we handle write support, we will need to take
care of all the above points as well.

Thanks,
Ajantha

Jacky Li

Re: Regarding presto carbondata integration

> On Feb 12, 2020, at 1:33 PM, Ajantha Bhat <[hidden email]> wrote:
>
> Hi all,
>
> Currently master code of carbondata works with *prestodb 0.217*
> We all know about competing *presto-sql* also.
> Some of the users doesn't want to migrate to *presto-sql *as their cloud
> vendor doesn't support presto sql (Example, AWS EMR, Huawei MRS, AZURE
> services except HDInsights still comes with *presto db*)
>
> So,
> 1. carbondata need to support both of them ?

Yes, I think some users have already started to use PrestoSQL. The PrestoSQL community is also quite active.

> 2. carbondata need to maintain two modules ? one for prestodb, one for
> prestosql, may be need to extract common code (big effort)

Yes, I have been thinking the same since trying to adapt to PrestoSQL last December. PrestoSQL has changed the package names of some classes, but most of our code should be common to PrestoSQL and PrestoDB.
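
To make the "common code plus two thin modules" idea concrete, here is a minimal sketch, not the actual carbondata layout. All class names (CarbonSplitInfo, CarbonSplitPlanner, EngineAdapter, PrestoDbAdapter, PrestoSqlAdapter) are hypothetical, and the adapters return plain strings as stand-ins for the engine-specific split classes, which would live under com.facebook.presto for PrestoDB and io.prestosql for PrestoSQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommonCodeSketch {

    /** Engine-neutral split description (hypothetical); imports no Presto classes. */
    static final class CarbonSplitInfo {
        final String path;
        CarbonSplitInfo(String path) { this.path = path; }
    }

    /** Shared logic that would live in the common module. */
    static final class CarbonSplitPlanner {
        static List<CarbonSplitInfo> plan(List<String> paths) {
            List<CarbonSplitInfo> splits = new ArrayList<>();
            for (String p : paths) {
                // Illustrative filter: keep only carbondata data files.
                if (p.endsWith(".carbondata")) {
                    splits.add(new CarbonSplitInfo(p));
                }
            }
            return splits;
        }
    }

    /** The only piece each engine-specific module would implement. */
    interface EngineAdapter<T> {
        T toEngineSplit(CarbonSplitInfo info);
    }

    /** Stand-in for the prestodb module; a real one would build a com.facebook.presto type. */
    static final class PrestoDbAdapter implements EngineAdapter<String> {
        @Override
        public String toEngineSplit(CarbonSplitInfo info) {
            return "prestodb:" + info.path;
        }
    }

    /** Stand-in for the prestosql module; a real one would build an io.prestosql type. */
    static final class PrestoSqlAdapter implements EngineAdapter<String> {
        @Override
        public String toEngineSplit(CarbonSplitInfo info) {
            return "prestosql:" + info.path;
        }
    }

    /** Common planning followed by engine-specific conversion. */
    static <T> List<T> splitsFor(EngineAdapter<T> adapter, List<String> paths) {
        List<T> result = new ArrayList<>();
        for (CarbonSplitInfo info : CarbonSplitPlanner.plan(paths)) {
            result.add(adapter.toEngineSplit(info));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("seg0/part-0.carbondata", "seg0/_SUCCESS");
        System.out.println(splitsFor(new PrestoDbAdapter(), files));
        System.out.println(splitsFor(new PrestoSqlAdapter(), files));
    }
}
```

With a split like this, a package rename on the engine side would only touch the small adapter class in each module, while the planner stays shared.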

> 3. At a time carbondata can support only version of prestodb and
> presto-sql. Every 15 days they release version and our integration is not
> based on SPI (not as stand alone connector), we extended hive connector
> interface. so, every few releases, carbondata and presto integration code
> need to modify. This can be a bigger problem for maintenance.

Have you analyzed whether there is a way to use their formal developer API? This is indeed a problem for supporting future versions smoothly.

>
> And this is about read support, when we handle write support need to take
> care about all the above points.
>
> Thanks,
> Ajantha

akashrn5

Re: Regarding presto carbondata integration

In reply to this post by Ajantha Bhat

Hi Ajantha,

Whatever you mentioned is a big pain point now. Even when we try to add
write support, the hadoop and hive versions supported
by carbon are different from what presto supports, so we might have
to duplicate code for this case as well. Either we have to
put the carbon code in presto, which might take time, or we may have to put in
extra effort to refactor the presto integration code based on versions,
like we did for the different spark versions.

Regards,
Akash
