Currently, carbon SDK output files (files without the metadata folder and its contents) are read by Spark as an external table with a carbon session. But the presto-carbon integration doesn't support that: it can currently read only transactional table output files. Hence we can enhance presto to read SDK output files. This will increase the use cases for the presto-carbon integration.

The above scenario can be achieved by inferring the schema when the metadata folder does not exist, and setting the read committed scope to LatestFilesReadCommittedScope when non-transactional table output files are present.

Thanks,
Ajantha
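[Editor's note: the detection step described above could be sketched roughly as follows. The class and method names (ReadPathDetector, chooseReadCommittedScope) and the returned scope strings are illustrative assumptions, not the actual CarbonData API; only the rule "no metadata folder implies infer schema and use LatestFilesReadCommittedScope" comes from the proposal itself.]

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadPathDetector {
    // A transactional Carbon table keeps its schema under a "Metadata" folder;
    // SDK (non-transactional) output has only data files, so the reader must
    // infer the schema and pick up the latest files directly.
    public static boolean isTransactional(String tablePath) {
        Path metadata = Paths.get(tablePath, "Metadata");
        return Files.isDirectory(metadata);
    }

    // Hypothetical dispatch: the scope names mirror the proposal, but the
    // real selection logic lives inside the carbondata presto plugin.
    public static String chooseReadCommittedScope(String tablePath) {
        return isTransactional(tablePath)
                ? "TableStatusReadCommittedScope"   // transactional: driven by table status
                : "LatestFilesReadCommittedScope";  // non-transactional: latest data files
    }
}
```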
Hi Ajantha,

Currently for the carbon-presto integration, there is a plugin called "carbondata". I wonder, will you introduce a new plugin into the project? I suggest we re-use the same plugin and decide the read path within it. What do you think?

Regards,
Jacky
+1

Yes Jacky, he is not going to add any new plugin. Depending on the folder structure and table status, the same plugin decides whether the table is transactional or non-transactional. PR https://github.com/apache/carbondata/pull/2982/ has already been raised for it.

Regards,
Ravindra.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Thanks.

Can we do the same for the Spark integration also? I see there are two datasources now, "carbon" and "carbondata", and it is not easy for users to differentiate when to use which one. Since we are discussing "support transactional table in SDK", I think we can unify "carbon" and "carbondata"; for example, we can make "carbondata" an alias of "carbon". I prefer this way since "carbon" is shorter :)

What do you think?

Regards,
Jacky
+1, it would be better if we can unify "carbon" and "carbondata".

SparkCarbonFileFormat uses "carbon" and SparkCarbonTableFormat uses "carbondata". The SDK should support both transactional and non-transactional tables, and DataFrames should also support both types of carbon data.
Hi Jacky,

In the Spark integration we have two approaches: one with very deep integration, and one with shallow integration using Spark's FileFormat. For the deep integration we use the datasource name "carbondata"; this name is also registered with Java services, so anything that comes with this datasource name uses the deep integration path. For the shallow integration we use the datasource name "carbon", and this code is extracted into the spark-datasource module, so any table with the "carbon" datasource name goes through the FileFormat flow.

These datasource names have nothing to do with transactional versus non-transactional tables; they are about the Spark datasource implementations. Basically, what I am trying to say is that the "carbondata" datasource can read both transactional and non-transactional data. We introduced the "carbon" datasource name only so that Spark can identify which implementation flow it should choose.

Regards,
Ravindra.
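[Editor's note: the naming scheme Ravindra describes can be summarized as a simple dispatch. This is purely illustrative; in reality Spark resolves the names through its datasource service registration, not through a method like this.]

```java
public class DatasourceDispatch {
    // "carbondata" -> deep integration (full Carbon table support, registered
    //                 via Java services so Spark routes it automatically)
    // "carbon"     -> shallow integration via Spark's FileFormat API, living
    //                 in the spark-datasource module
    // Neither name implies transactional vs non-transactional: per the thread,
    // both flows can read both kinds of output.
    public static String integrationFor(String datasourceName) {
        switch (datasourceName) {
            case "carbondata": return "deep";
            case "carbon":     return "shallow (spark-datasource FileFormat)";
            default:
                throw new IllegalArgumentException("unknown datasource: " + datasourceName);
        }
    }
}
```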