[carbondata-presto enhancements] support reading carbon SDK writer output in presto

[carbondata-presto enhancements] support reading carbon SDK writer output in presto

Ajantha Bhat
Currently, carbon SDK output files (files without the metadata folder and its
contents) can be read by spark using an external table with a carbon session.
But the presto carbon integration doesn't support that. It can currently read
only transactional table output files.

Hence we can enhance presto to read SDK output files. This will increase
the use cases for the presto-carbon integration.

The above scenario can be achieved by inferring the schema if the metadata
folder does not exist, and by setting the read committed scope to
LatestFilesReadCommittedScope if non-transactional table output files are
present.
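
The decision described above could look roughly like the following. This is only a minimal sketch and not the code in the actual plugin: the class ReadPathSketch, the enum ReadPath, and the folder name "Metadata" are assumptions made for illustration, and no CarbonData classes are called.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadPathSketch {
    // Hypothetical labels for the two read paths discussed in this thread.
    enum ReadPath {
        // Metadata folder present: the schema comes from the stored metadata.
        TRANSACTIONAL(false),
        // SDK writer output: the schema has to be inferred, and the read
        // committed scope would be LatestFilesReadCommittedScope.
        NON_TRANSACTIONAL(true);

        final boolean inferSchema;

        ReadPath(boolean inferSchema) {
            this.inferSchema = inferSchema;
        }
    }

    // Assumed name of the metadata folder under the table path.
    static final String METADATA_DIR = "Metadata";

    // Pure decision: no metadata folder means non-transactional output.
    static ReadPath chooseReadPath(boolean hasMetadataFolder) {
        return hasMetadataFolder ? ReadPath.TRANSACTIONAL : ReadPath.NON_TRANSACTIONAL;
    }

    // Same decision, made by looking at the table path on the local file system.
    static ReadPath chooseReadPath(Path tablePath) {
        return chooseReadPath(Files.isDirectory(tablePath.resolve(METADATA_DIR)));
    }

    public static void main(String[] args) throws IOException {
        // An empty folder stands in for SDK writer output.
        Path sdkOutput = Files.createTempDirectory("sdk_output");
        // A folder with a metadata sub-folder stands in for a transactional table.
        Path table = Files.createTempDirectory("transactional_table");
        Files.createDirectory(table.resolve(METADATA_DIR));

        System.out.println(chooseReadPath(sdkOutput)); // NON_TRANSACTIONAL
        System.out.println(chooseReadPath(table));     // TRANSACTIONAL
    }
}
```

Keeping the decision in one small function means the same plugin can serve both kinds of tables, and only the schema source and the read committed scope differ between them.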


Thanks,
Ajantha

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

Jacky Li
Hi Ajantha,

Currently, for the carbon-presto integration, there is a plugin called “carbondata”. I wonder whether you will introduce a new plugin into the project?
I suggest we reuse the same plugin and decide the read path within the plugin.
What do you think?

Regards,
Jacky


> On December 10, 2018, at 2:31 PM, Ajantha Bhat <[hidden email]> wrote:
>
> Currently, carbon SDK output files (files without the metadata folder and its
> contents) can be read by spark using an external table with a carbon session.
> But the presto carbon integration doesn't support that. It can currently read
> only transactional table output files.
>
> Hence we can enhance presto to read SDK output files. This will increase
> the use cases for the presto-carbon integration.
>
> The above scenario can be achieved by inferring the schema if the metadata
> folder does not exist, and by setting the read committed scope to
> LatestFilesReadCommittedScope if non-transactional table output files are
> present.
>
>
> Thanks,
> Ajantha
>


Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

ravipesala
+1

Yes Jacky, he is not going to add any new plugin. Depending on the folder
structure and the table status, the same plugin decides whether a table is
transactional or non-transactional. PR
https://github.com/apache/carbondata/pull/2982/ has already been raised for it.

Regards,
Ravindra.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

Jacky Li
Thanks.
Can we do the same for the spark integration also? I see there are two datasources now: “carbon” and “carbondata”.
It is not easy for users to tell when to use which one.

Since we are discussing “support transactional table in SDK”, I think we can unify “carbon” and “carbondata”; for example, we can make “carbondata” an alias for “carbon”. I prefer this way since “carbon” is shorter :)

What do you think?

Regards,
Jacky

> On December 10, 2018, at 11:18 PM, ravipesala <[hidden email]> wrote:
>
> +1
>
> Yes Jacky, he is not going to add any new plugin. Depending on the folder
> structure and the table status, the same plugin decides whether a table is
> transactional or non-transactional. PR
> https://github.com/apache/carbondata/pull/2982/ has already been raised for it.
>
> Regards,
> Ravindra.




Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

xubo245
+1. It would be better if we can unify "carbon" and "carbondata":
SparkCarbonFileFormat uses carbon and SparkCarbonTableFormat uses carbondata.
The SDK should support both transactional and non-transactional tables.
DataFrame should also support the different types of carbon data.




Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

ravipesala
In reply to this post by Jacky Li
Hi Jacky,

In the spark integration we have two approaches: one with very deep integration
and one with shallow integration using spark's fileformat. For the deep
integration we use the datasource name carbondata; this name is also
registered with Java services, so anything that comes with this datasource
name uses the deep integration path.
For the shallow integration we use the datasource name carbon, and we
extracted this into the spark-datasource module. So any table with the carbon
datasource name goes through the fileformat flow.

These datasource names have nothing to do with transactional and
non-transactional tables. They are about the spark datasource implementations.
Basically, what I am trying to say is that the carbondata datasource can read
both transactional and non-transactional data. We introduced the carbon
datasource name only so that spark can identify which implementation flow it
should choose.
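
To illustrate the point that the datasource name selects only the implementation flow and not the kind of table, here is a rough sketch. It is a hypothetical stand-in and not how spark resolves datasources internally: the class DataSourceLookupSketch, the enum Flow, and the method resolve are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DataSourceLookupSketch {
    // Hypothetical labels for the two spark integration flows described above.
    enum Flow { DEEP_INTEGRATION, FILE_FORMAT }

    // Stand-in for the name registration: each datasource name maps to one flow.
    static final Map<String, Flow> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("carbondata", Flow.DEEP_INTEGRATION);
        REGISTRY.put("carbon", Flow.FILE_FORMAT);
    }

    // Picks the flow from the datasource name alone; whether the data is
    // transactional or non-transactional plays no part in this choice.
    static Flow resolve(String dataSourceName) {
        Flow flow = REGISTRY.get(dataSourceName);
        if (flow == null) {
            throw new IllegalArgumentException("Unknown datasource: " + dataSourceName);
        }
        return flow;
    }

    public static void main(String[] args) {
        System.out.println(resolve("carbondata")); // DEEP_INTEGRATION
        System.out.println(resolve("carbon"));     // FILE_FORMAT
    }
}
```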


Regards,
Ravindra.


