This feature adds support in the carbondata SDK for deleting and updating data in carbondata files.

Details of the solution and implementation are in the document attached to JIRA:
https://issues.apache.org/jira/browse/CARBONDATA-3865

Thanks
Karanpreet Singh

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

+1
This is a necessary requirement for users.
Suggestion: change CarbonSDKUID to a more common name.

In reply to this post by Karan-c980
+1
But it should be part of CarbonOutputFormat, not just the SDK. We are planning to implement it even for simpler updates from Spark. The SDK should call the output format to update/delete the records.
Please @[hidden email] <[hidden email]> comment on it, we already had a discussion on it.

Regards,
Ravindra.

On Tue, 23 Jun 2020 at 02:15, Karan-c980 <[hidden email]> wrote:
> This feature will support the carbondata SDK to delete and update data from
> carbondata files.
>
> Details of solution and implementation are mentioned in the document
> attached to JIRA.
> https://issues.apache.org/jira/browse/CARBONDATA-3865
>
> Thanks
> Karanpreet Singh

--
Thanks & Regards,
Ravi

In reply to this post by Karan-c980
+1
I have a question. For each update and delete operation, carbon creates a delta file to keep the deleted row ids. For sequential update and delete operations using CarbonSDK, will these delta files be compacted into a single delta file using horizontal compaction?

Regards,
Indhumathi

In reply to this post by ravipesala
+1
As Ravindra said, we should do it at the OutputFormat level, so that the same implementation can be used to improve simpler update performance in the future. So the design should be that the SDK calls update and delete of the OutputFormat to do the operations.

Regards,
Akash

On Tue, Jun 23, 2020 at 5:17 PM Ravindra Pesala <[hidden email]> wrote:
> +1
> But it should be part of CarbonOutputFormat, not just SDK. we are planning
> to implement even for simpler updates from spark. SDK should call
> outputformat to update/delete the records.
> Please @[hidden email] <[hidden email]> comment on it, we
> already had a discussion on it.
>
> Regards,
> Ravindra.
>
> On Tue, 23 Jun 2020 at 02:15, Karan-c980 <[hidden email]> wrote:
> > This feature will support the carbondata SDK to delete and update data
> > from carbondata files.
> >
> > Details of solution and implementation are mentioned in the document
> > attached to JIRA.
> > https://issues.apache.org/jira/browse/CARBONDATA-3865
> >
> > Thanks
> > Karanpreet Singh
>
> --
> Thanks & Regards,
> Ravi

In reply to this post by ravipesala
Hi Ravi,
Thanks for suggesting this change. We will add an API to CarbonTableOutputFormat to get a DeleteDeltaRecordWriter, which the SDK will call. Please have a look at the updated document in JIRA.

Thanks,
Karanpreet Singh

In reply to this post by Karan-c980
+1
I have a small query regarding this update API:

public CarbonSDKUID update(String path, String column, String value, String updColumn, String updValue);

I believe the column argument is the column to be matched against the given value argument, so they are effectively matchColumn and matchValue. On a match, we update updColumn with the given updValue argument.

My question is: why not take a map of updateColumnToValue, similar to the other update API that you have added?

public void update(String path, Expression expression, Map<String, String> columnToValue);

Thanks,
Venu

Hi Venu,
In the API

public CarbonSDKUID update(String path, String column, String value, String updColumn, String updValue);

we will prepare a filterExpression from the arguments column and value, and an updateColumnToValue mapping from the arguments updColumn and updValue. After preparing this information, we will call the following API internally:

public void update(String path, Expression expression, Map<String, String> columnToValue);

Users can call this second API directly by providing a filterExpression and an update mapping, or they can just pass the column name and value, and we will prepare this information internally.

Thanks,
Karan

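[Editor's sketch] The delegation Karan describes can be illustrated roughly as below. This is a minimal in-memory sketch and not the CarbonData SDK: the class name UpdateOverloadSketch, the use of java.util.function.Predicate in place of the Expression type, the omission of the path argument, and the int return value are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a table; NOT the CarbonData SDK.
class UpdateOverloadSketch {
    private final List<Map<String, String>> rows;

    UpdateOverloadSketch(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    // General form: a filter (stand-in for Expression) plus a column-to-value map.
    // Returns the number of rows that matched and were updated (an assumption;
    // the signatures in the thread return CarbonSDKUID or void).
    int update(Predicate<Map<String, String>> filter, Map<String, String> columnToValue) {
        int updated = 0;
        for (Map<String, String> row : rows) {
            if (filter.test(row)) {
                row.putAll(columnToValue);
                updated++;
            }
        }
        return updated;
    }

    // Convenience form: builds the filter and the single-entry mapping from the
    // four string arguments, then delegates to the general form.
    int update(String column, String value, String updColumn, String updValue) {
        Predicate<Map<String, String>> filter = row -> value.equals(row.get(column));
        return update(filter, Collections.singletonMap(updColumn, updValue));
    }

    List<Map<String, String>> rows() {
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> row = new HashMap<>();
        row.put("name", "alice");
        row.put("city", "delhi");
        rows.add(row);

        UpdateOverloadSketch table = new UpdateOverloadSketch(rows);
        int updated = table.update("name", "alice", "city", "pune");
        System.out.println(updated + " row(s) updated: " + table.rows());
    }
}
```

With this layering, the four-string form stays a thin wrapper, and a caller who needs to set several columns at once can use the map-based form directly.
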
In reply to this post by Karan-c980
+1
Can we add a commit method to support multiple operations at once?

CarbonSDKUID
  .delete(...)
  .delete(...)
  .update(...)
  .commit

-----
Best Regards
David Cai

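[Editor's sketch] One way to read David's suggestion is a fluent object that queues operations and applies them only on commit. The sketch below is a hypothetical in-memory illustration, not the CarbonData SDK and not a design agreed in this thread: the class name BatchedOperationsSketch, the Predicate-based filters, and the copy-then-swap commit are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory illustration of queued delete/update with commit.
class BatchedOperationsSketch {
    private final List<Map<String, String>> rows = new ArrayList<>();
    private final List<Consumer<List<Map<String, String>>>> pending = new ArrayList<>();

    BatchedOperationsSketch(List<Map<String, String>> initialRows) {
        for (Map<String, String> row : initialRows) {
            rows.add(new HashMap<>(row)); // defensive copy of each row
        }
    }

    // Queues a delete; nothing is changed until commit() is called.
    BatchedOperationsSketch delete(Predicate<Map<String, String>> filter) {
        pending.add(working -> working.removeIf(filter));
        return this;
    }

    // Queues an update of the given columns on every matching row.
    BatchedOperationsSketch update(Predicate<Map<String, String>> filter,
                                   Map<String, String> columnToValue) {
        pending.add(working -> {
            for (Map<String, String> row : working) {
                if (filter.test(row)) {
                    row.putAll(columnToValue);
                }
            }
        });
        return this;
    }

    // Applies all queued operations, in order, to a working copy and then
    // swaps the result in. If an operation throws, the committed rows are
    // left untouched.
    void commit() {
        List<Map<String, String>> working = new ArrayList<>();
        for (Map<String, String> row : rows) {
            working.add(new HashMap<>(row));
        }
        for (Consumer<List<Map<String, String>>> op : pending) {
            op.accept(working);
        }
        rows.clear();
        rows.addAll(working);
        pending.clear();
    }

    List<Map<String, String>> rows() {
        return Collections.unmodifiableList(rows);
    }

    public static void main(String[] args) {
        List<Map<String, String>> init = new ArrayList<>();
        Map<String, String> row = new HashMap<>();
        row.put("id", "1");
        row.put("city", "x");
        init.add(row);

        BatchedOperationsSketch batch = new BatchedOperationsSketch(init);
        batch.update(r -> "1".equals(r.get("id")), Collections.singletonMap("city", "z"))
             .commit();
        System.out.println(batch.rows());
    }
}
```

Queuing the operations gives the caller a single point at which all changes become visible, which is the main appeal of a commit step; how this would map onto delete delta files is left open here.
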