Apache CarbonData Dev Mailing List archive

[carbondata-presto enhancements] support reading carbon SDK writer output in presto

Ajantha Bhat

[carbondata-presto enhancements] support reading carbon SDK writer output in presto

Currently, carbon SDK output files (files without a metadata folder and its
contents) are read by spark using an external table with a carbon session.
But the presto carbon integration doesn't support that. It can currently read
only transactional table output files.

Hence we can enhance presto to read SDK output files. This will increase
the use cases for the presto-carbon integration.

The above scenario can be achieved by inferring the schema if the metadata
folder does not exist, and by setting the read committed scope to
LatestFilesReadCommittedScope if non-transactional table output files are
present.
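
The decision described above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: the helper plan_read, the ReadPlan type, the folder name "Metadata" and the scope name used for the transactional case are assumptions made for the example. Only LatestFilesReadCommittedScope is named in this proposal.

```python
import os
from dataclasses import dataclass


@dataclass
class ReadPlan:
    transactional: bool        # True when a metadata folder is present
    infer_schema: bool         # True when the schema must come from the data files
    read_committed_scope: str  # name of the scope the reader would be given


def plan_read(table_path: str, metadata_dir: str = "Metadata") -> ReadPlan:
    """Pick a read path from the folder layout (hypothetical helper).

    The default folder name "Metadata" is an assumption for this sketch.
    """
    has_metadata = os.path.isdir(os.path.join(table_path, metadata_dir))
    if has_metadata:
        # Transactional table: schema and status come from the metadata folder.
        # The scope name here is assumed, not taken from this thread.
        return ReadPlan(True, False, "TableStatusReadCommittedScope")
    # SDK writer output: no metadata folder, so infer the schema from the
    # files and read whatever files are currently present.
    return ReadPlan(False, True, "LatestFilesReadCommittedScope")
```

With a layout check like this, one plugin can serve both kinds of output, and the caller only passes the table path.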

Thanks,
Ajantha

Jacky Li

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

Hi Ajantha,

Currently, for the carbon-presto integration, there is a plugin called “carbondata”. Will you introduce a new plugin into the project?
I suggest we reuse the same plugin and decide the read path within the plugin.
What do you think?

Regards,
Jacky

> On December 10, 2018, at 2:31 PM, Ajantha Bhat <[hidden email]> wrote:

ravipesala

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

+1

Yes Jacky, he is not going to add any new plugin. Depending on the folder
structure and the table status, the same plugin decides whether the table is
transactional or non-transactional. A PR,
https://github.com/apache/carbondata/pull/2982/, has already been raised for it.

Regards,
Ravindra.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Jacky Li

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

Thanks.
Can we do the same for the spark integration also? I see there are two datasources now: “carbon” and “carbondata”.
It is not easy for users to tell when to use which one.

Since we are discussing “support transactional table in SDK”, I think we can unify “carbon” and “carbondata”. For example, we can make “carbondata” an alias of “carbon”. I prefer this way since “carbon” is shorter :)

What do you think?

Regards,
Jacky

> On December 10, 2018, at 11:18 PM, ravipesala <[hidden email]> wrote:

xubo245

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

+1. It will be better if we can unify "carbon" and "carbondata":
SparkCarbonFileFormat uses carbon and SparkCarbonTableFormat uses carbondata.
The SDK should support both transactional and non-transactional tables.
DataFrame should also support the different types of carbon data.

ravipesala

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

In reply to this post by Jacky Li

Hi Jacky,

In the spark integration we have two approaches: one with very deep
integration and one with shallow integration using spark's fileformat. For
the deep integration we use the datasource name carbondata. This name is also
registered to java services, so anything that comes with this datasource
name uses the deep integration path.
For the shallow integration we use the datasource name carbon, which we
extracted to the spark-datasource module. So any table with the carbon
datasource name goes to the fileformat flow.

These datasource names have nothing to do with transactional and
non-transactional. They are about the spark datasource implementations.
Basically, what I am trying to say is that the carbondata datasource can read
both transactional and non-transactional data. We introduced the carbon
datasource name only so that spark can identify which implementation flow it
should choose.
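
To make the point concrete, here is a small Python model of the routing. It is a toy, not spark code: the mapping DATASOURCE_FLOWS and the function resolve_flow are hypothetical names invented for this example. It only shows that the datasource name selects the implementation flow, and that the table type is not an input.

```python
# Hypothetical registry: datasource name -> implementation flow.
DATASOURCE_FLOWS = {
    "carbondata": "deep integration",
    "carbon": "fileformat flow (spark-datasource module)",
}


def resolve_flow(datasource_name: str) -> str:
    """Return the flow for a datasource name (illustrative only).

    Whether the table is transactional or non-transactional is not a
    parameter here, because the name does not depend on it.
    """
    try:
        return DATASOURCE_FLOWS[datasource_name]
    except KeyError:
        raise ValueError(f"unknown datasource name: {datasource_name!r}")
```

In this model, making one name an alias of the other would amount to pointing both keys at the same flow.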

Regards,
Ravindra.
