Hi All,

In order to support application integration without a central coordinator, such as Flink and Kafka Streams, transactional tables need to be supported in the SDK, and a new type of segment called Online Segment is proposed. Since it is hard to describe the motivation and design in a good format in the mail, I have attached a document in CARBONDATA-3152. Please review the doc and provide your feedback.

https://issues.apache.org/jira/browse/CARBONDATA-3152

Regards,
Jacky
Hi Jacky,
It's a good idea to support writing transactional tables from the SDK. But we need to add the following limitations as well:

1. It can only work on file systems which can take an append lock, like HDFS.
2. Compaction and delete segment cannot be done on an online segment till it is converted to a transactional segment.
3. The SDK writer should be responsible for adding a complete carbondata file to the online segment once the writing is done; it should not add any half-cooked data (see the write-then-rename sketch after this message).

Also, as we are trying to update the tablestatus from other modules like the SDK, we had better consider the segment interface first. Please go through the JIRA:
https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827

Regards,
Ravindra
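A minimal sketch of how point 3 could be enforced, using only the standard Hadoop FileSystem API. This is not CarbonData's actual implementation; the segment directory layout and file naming here are illustrative assumptions. The idea is to write the file under a temporary name and publish it with a rename, which is atomic on HDFS within the same file system, so readers never observe half-cooked data.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the "no half-cooked data" rule: write the carbondata file
// under a temporary name, and only move it into the online segment folder
// once it is fully written. The paths and layout are assumptions, not the
// actual CarbonData on-disk format.
public class AtomicSegmentCommit {
  public static void commitFile(byte[] content, String segmentDir, String fileName)
      throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path tmp = new Path(segmentDir, "." + fileName + ".inprogress");
    Path dst = new Path(segmentDir, fileName);

    // Write fully to the temporary file first.
    try (FSDataOutputStream out = fs.create(tmp, false /* no overwrite */)) {
      out.write(content);
      out.hsync(); // flush to datanodes before publishing
    }

    // Publish the completed file with an atomic rename.
    if (!fs.rename(tmp, dst)) {
      fs.delete(tmp, false);
      throw new IOException("Failed to commit " + dst);
    }
  }
}

With this pattern, a crashed writer leaves behind only a dot-prefixed temporary file that cleanup can safely discard; the tablestatus would reference only files that completed the rename.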
In reply to this post by Jacky Li
Hi
Good idea, thank you for starting this discussion.

Agree with Ravi's comments; we need to double-check some limitations after introducing the feature. Flink and Kafka integration can be discussed later.

For using the SDK to write new data to an existing carbondata table, some questions:
1. How to ensure the same index, dictionary, etc. policy is created as in the existing table?
2. Can you please help me understand this proposal further: what valued scenarios require this feature?

------------------------------------------------------------------------------------------------
After having online segments, one can use this feature to implement Apache Flink-CarbonData integration, or Apache KafkaStream-CarbonData integration, or just use the SDK to write new data to an existing CarbonData table; the integration level can be the same as the current Spark-CarbonData integration.

Regards
Liang
In reply to this post by Jacky Li
Hi Jacky,
Carbon should support transactional tables in the SDK before the Apache Flink-CarbonData integration. After having online segments, I can use this feature to implement the Apache Flink-CarbonData integration. Therefore, can I participate in the development of this feature, to facilitate the Apache Flink-CarbonData integration?
In reply to this post by ravipesala
> On Dec 7, 2018, at 11:05 PM, ravipesala <[hidden email]> wrote:
>
> Hi Jacky,
>
> It's a good idea to support writing transactional tables from the SDK. But
> we need to add the following limitations as well:
> 1. It can only work on file systems which can take an append lock, like HDFS.

Likun: Yes, since we need to overwrite the table status file, we need file locking (a sketch of one possible lock-file protocol follows this message).

> 2. Compaction and delete segment cannot be done on an online segment till it
> is converted to a transactional segment.

Likun: Compaction and other data management work will still be done by a CarbonSession application in a standard Spark cluster.

> 3. The SDK writer should be responsible for adding a complete carbondata file
> to the online segment once the writing is done; it should not add any
> half-cooked data.

Likun: Yes, I have mentioned this in the design doc.

> Also, as we are trying to update the tablestatus from other modules like
> the SDK, we had better consider the segment interface first. Please go
> through the JIRA:
> https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827
>
> Regards,
> Ravindra
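A minimal sketch of such a lock, assuming a plain lock-file convention rather than CarbonData's actual locking code. It relies on the fact that fs.create(path, false) fails atomically if the file already exists, so at most one writer can hold the lock while rewriting the tablestatus; the lock path name is an assumption.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical lock-file protocol for serializing tablestatus updates.
// The "tablestatus.lock" name is illustrative, not CarbonData's real layout.
public class TableStatusLock {
  private final FileSystem fs;
  private final Path lockFile;

  public TableStatusLock(String tablePath) throws IOException {
    this.fs = FileSystem.get(new Configuration());
    this.lockFile = new Path(tablePath, "tablestatus.lock");
  }

  public boolean tryLock() {
    try {
      // Atomic create-if-absent acts as the mutex.
      fs.create(lockFile, false).close();
      return true;
    } catch (IOException alreadyLocked) {
      return false;
    }
  }

  public void unlock() throws IOException {
    fs.delete(lockFile, false);
  }
}

A writer would tryLock(), rewrite the tablestatus file, then unlock(); a failed tryLock() means another writer is mid-commit and the caller should back off and retry.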
In reply to this post by Liang Chen
> On Dec 8, 2018, at 3:53 PM, Liang Chen <[hidden email]> wrote:
>
> Hi
>
> Good idea, thank you for starting this discussion.
>
> Agree with Ravi's comments; we need to double-check some limitations after
> introducing the feature.
>
> Flink and Kafka integration can be discussed later.
> For using the SDK to write new data to an existing carbondata table, some
> questions:
> 1. How to ensure the same index, dictionary, etc. policy is created as in
> the existing table?
> 2. Can you please help me understand this proposal further: what valued
> scenarios require this feature?

Likun: Currently, the SDK writes carbondata files in a flat folder and loses all features built on top of the segment concept, such as show segment, delete segment, compaction, datamap, MV, data update, delete, streaming, global dictionary, etc. By introducing this feature (support for transactional tables in the SDK), an application can use it in a non-Spark environment to write new carbondata files and still enjoy transactional tables with segment support and all the previously supported features.

Basically, these new APIs in the SDK add a new way to write data into an existing carbondata table. It is for non-Spark environments such as Flink, Kafka Streams, Cassandra, or any other Java application (a hypothetical usage sketch follows this message).

> ------------------------------------------------------------------------------------------------
> After having online segments, one can use this feature to implement
> Apache Flink-CarbonData integration, or Apache KafkaStream-CarbonData
> integration, or just use the SDK to write new data to an existing CarbonData
> table; the integration level can be the same as the current Spark-CarbonData
> integration.
>
> Regards
> Liang
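A hypothetical sketch of what such an SDK write into an existing table might look like. Only the general CarbonWriter builder shape mirrors the current SDK; the forTransactionalTable() call is an assumed name for the API under discussion and is left commented out since it does not exist yet.

import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

// Hypothetical usage of the proposed API: the writer targets an existing
// table path and would commit its output as an online segment instead of
// a flat folder of files.
public class OnlineSegmentWriteExample {
  public static void main(String[] args) throws Exception {
    Schema schema = new Schema(new Field[] {
        new Field("name", DataTypes.STRING),
        new Field("age", DataTypes.INT)
    });

    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("/warehouse/db/my_table")   // existing table path (example)
        .withCsvInput(schema)
        .writtenBy("flink-job")
        // .forTransactionalTable(true)         // hypothetical: commit as online segment
        .build();

    writer.write(new String[] {"jacky", "30"}); // one row as CSV fields
    writer.close();                             // only complete files get committed
  }
}

This would let a Flink or Kafka Streams task run the writer without any Spark dependency, while the segment bookkeeping keeps the output visible to queries as a normal transactional segment.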
In reply to this post by Nicholas
Hi Nicholas,
Yes, this is a feature required for flink-carbon to write to a transactional table. You are welcome to participate in this. I think you can contribute by reviewing the design doc in CARBONDATA-3152 first; after we settle on the API, we can open sub-tasks under this ticket.

Regards,
Jacky

> On Dec 10, 2018, at 1:55 PM, Nicholas <[hidden email]> wrote:
>
> Hi Jacky,
> Carbon should support transactional tables in the SDK before the
> Apache Flink-CarbonData integration. After having online segments, I can
> use this feature to implement the Apache Flink-CarbonData integration.
> Therefore, can I participate in the development of this feature, to
> facilitate the Apache Flink-CarbonData integration?
Hi Jacky,
I have already reviewed the design doc in CARBONDATA-3152. What is the current progress of supporting transactional tables in the SDK? In my opinion, creating the online segment first is necessary for now.