http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Taking-the-inputs-for-Segment-Interface-Refactoring-tp101950p105292.html
transaction manager as well.
a. Across-table transactions --> expose start transaction, commit
transaction, and rollback transaction to the user/application. Commit table
successful.
b. Table-level versioning/MVCC for time travel; internally
keep one transaction file.
work will complicate things to design and handle in one PR. So, I want to
> Hi Everyone.
> Please find the design of refactored segment interfaces in the document
> attached. Also can check the same V3 version attached in the JIRA [
>
https://issues.apache.org/jira/browse/CARBONDATA-2827]
>
> It is based on some recent discussions and the previous discussions of
> 2018
> [
>
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Refactor-Segment-Management-Interface-td58926.html ]
>
> *Note:*
> 1) As the pre-aggregate feature is not present and MV and SI support
> incremental loading, the previous problem of committing all child
> table statuses at once may no longer apply, so those interfaces were removed.
> 2) All of this will be developed in a new module called *carbondata-acid*,
> and the other modules that require it will depend on it.
> 3) Once this is implemented, we can discuss the design of time travel on
> top of it [transaction manager implementation and writing multiple table
> status files with versioning].
>
> Please go through it and give your inputs.
>
> Thanks,
> Ajantha
>
> On Mon, Oct 19, 2020 at 9:43 AM David CaiQiang <
[hidden email]>
> wrote:
>
>> I list the features related to segments as follows, before starting to
>> refactor the segment interface.
>>
>> [table related]
>> 1. get lock for table
>> lock for tablestatus
>> lock for updatedTablestatus
>> 2. get lastModifiedTime of table
>>
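The table-level locking sketched in the list above (separate locks for tablestatus and updatedTablestatus) could look roughly like the following. This is an illustrative sketch only; `TableLockRegistry` and its method names are hypothetical, not CarbonData's actual lock API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock object per (table, lock-file) pair, so
// tablestatus and updatedTablestatus can be locked independently.
public class TableLockRegistry {
    private final ConcurrentHashMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    public ReentrantLock getLock(String tableId, String lockFile) {
        // Same (tableId, lockFile) pair always yields the same lock object.
        return locks.computeIfAbsent(tableId + "/" + lockFile,
                                     k -> new ReentrantLock());
    }

    public static void main(String[] args) {
        TableLockRegistry registry = new TableLockRegistry();
        ReentrantLock statusLock =
                registry.getLock("db1.t1", "tablestatus.lock");
        statusLock.lock();
        try {
            // ... read-modify-write the tablestatus file under the lock ...
            System.out.println("locked=" + statusLock.isLocked());
        } finally {
            statusLock.unlock();
        }
    }
}
```

In a distributed deployment the in-process `ReentrantLock` would be replaced by a file-based or ZooKeeper-based lock, but the per-file granularity shown here is the point of the sketch.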
>> [segment related]
>> 1. segment datasource
>> datasource: file format, other datasource
>> fileformat: carbon,parquet,orc,csv..
>> catalog type: segment, external segment
>> 2. data load ETL (load/insert/add_external_segment/insert_stage)
>> write segment for batch loading
>> add external segment by using external folder path for mixed file
>> formatted table
>> append streaming segment for spark structured streaming
>> insert_stage for flink writer
>> 3. data query
>> segment properties and schema
>> segment level index cache and pruning
>> cache/refresh block/blocklet index cache if needed by segment
>> read segments to a dataframe/rdd
>> 4. segment management
>> new segment id for loading/insert/add_external_segment/insert_stage
>> create global segment identifier
>> show[history]/delete segment
>> 5. stats
>> collect dataSize and indexSize of the segment
>> lastModifiedTime, start/end time, update start/end time
>> fileFormat
>> status
>> 6. segment level lock for supporting concurrent operations
>> 7. get tablestatus storage factory
>> storage solution 1): use file system by default
>> storage solution 2): use hive metastore or db
>>
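Point 7 above (a tablestatus storage factory with a file-system default and a metastore/DB alternative) could be sketched as a small pluggable interface. All names here are hypothetical illustrations, not CarbonData's real API, and the in-memory backend merely stands in for either real storage solution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable tablestatus store:
// solution 1) file system by default, solution 2) hive metastore or db.
interface TableStatusStore {
    String read(String tableId);
    void write(String tableId, String content);
}

// Placeholder backend for the sketch; a real implementation would hit the
// file system or the metastore.
class InMemoryStore implements TableStatusStore {
    private final Map<String, String> data = new HashMap<>();
    public String read(String tableId) { return data.get(tableId); }
    public void write(String tableId, String content) { data.put(tableId, content); }
}

public class TableStatusStoreFactory {
    public static TableStatusStore create(String kind) {
        switch (kind) {
            case "filesystem": // solution 1: default
            case "metastore":  // solution 2: hive metastore or db
                return new InMemoryStore();
            default:
                throw new IllegalArgumentException("unknown store: " + kind);
        }
    }

    public static void main(String[] args) {
        TableStatusStore store = create("filesystem");
        store.write("db1.t1", "{\"segments\":[]}");
        System.out.println(store.read("db1.t1"));
    }
}
```

The value of the factory is that segment-management code depends only on `TableStatusStore`, so swapping the file system for a metastore never touches the callers.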
>> [table status related]:
>> 1. record new LoadMetadataDetails
>> loading/insert/compaction start/end
>> add external segment start/end
>> insert stage
>>
>> 2. update LoadMetadataDetails
>> compaction
>> update/delete
>> drop partition
>> delete segment
>>
>> 3. read LoadMetadataDetails
>> list all/valid/invalid segment
>>
>> 4. backup and history
>>
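The record/update/read lifecycle of `LoadMetadataDetails` above can be sketched in miniature. The real CarbonData class carries many more fields; the status values and helper below are a simplified, hypothetical illustration of how "list all/valid/invalid segment" falls out of the per-segment status.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of LoadMetadataDetails entries in the
// tablestatus file and the "list valid segments" query over them.
public class TableStatusDemo {
    enum SegmentStatus { SUCCESS, IN_PROGRESS, MARKED_FOR_DELETE, COMPACTED }

    record LoadDetail(String segmentId, SegmentStatus status) {}

    // Only successfully loaded segments are visible to queries.
    static List<LoadDetail> validSegments(List<LoadDetail> all) {
        return all.stream()
                  .filter(d -> d.status() == SegmentStatus.SUCCESS)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LoadDetail> all = new ArrayList<>();
        all.add(new LoadDetail("0", SegmentStatus.SUCCESS));
        all.add(new LoadDetail("1", SegmentStatus.MARKED_FOR_DELETE));
        all.add(new LoadDetail("2", SegmentStatus.SUCCESS));
        System.out.println(validSegments(all).size()); // prints 2
    }
}
```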
>> [segment file related]
>> 1. write new segment file
>> generate segment file name
>> better to use a new timestamp to generate the new segment file name
>> for each write, to avoid overwriting a segment file with the same name
>> write segment file
>> merge temp segment file
>> 2. read segment file
>> readIndexFiles
>> readIndexMergeFiles
>> getPartitionSpec
>> 3. update segment file
>> update
>> merge index
>> drop partition
>>
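The naming suggestion above (a fresh timestamp per write so a segment file is never overwritten in place) can be sketched in a few lines. The `<segmentId>_<timestamp>.segment` pattern is illustrative only, not CarbonData's actual file-name format.

```java
// Hypothetical sketch: embed a write timestamp in the segment file name so
// that rewriting the same segment produces a new file instead of
// overwriting the old one.
public class SegmentFileNaming {
    static String segmentFileName(String segmentId, long timestampMs) {
        return segmentId + "_" + timestampMs + ".segment";
    }

    public static void main(String[] args) {
        String first  = segmentFileName("2", System.currentTimeMillis());
        String second = segmentFileName("2", System.currentTimeMillis() + 1);
        // Two writes of the same segment yield distinct file names.
        System.out.println(!first.equals(second)); // prints true
    }
}
```

Keeping the old file around until clean-files runs is what lets in-flight readers finish against the previous version.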
>> [clean files related]
>> 1. clean stale files for the successful segment operation
>> data deletion should be delayed for a period of time (maybe the query
>> timeout interval) to avoid deleting files immediately (except for drop
>> table/partition and force clean files)
>> include data file, index file, segment file, tablestatus file
>> impact operation: mergeIndex
>> 2. clean stale files for failed segment operation immediately
>>
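The retention rule above (delay deletion after a successful operation, but allow immediate deletion for drop table/partition or force clean files) reduces to a single predicate. This is a hypothetical sketch; class and method names are illustrative.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of delayed stale-file cleanup: a file is only
// eligible for deletion once the retention window (e.g. the query timeout
// interval) has passed, unless a force clean bypasses the window.
public class StaleFileCleaner {
    private final long retentionMs;

    StaleFileCleaner(long retentionMs) { this.retentionMs = retentionMs; }

    boolean eligibleForDelete(long fileModifiedTimeMs, long nowMs, boolean force) {
        // force (or drop table/partition) skips the retention window entirely
        return force || nowMs - fileModifiedTimeMs > retentionMs;
    }

    public static void main(String[] args) {
        StaleFileCleaner cleaner =
                new StaleFileCleaner(TimeUnit.HOURS.toMillis(1));
        long now = System.currentTimeMillis();
        // Freshly superseded file: still inside the window, keep it.
        System.out.println(cleaner.eligibleForDelete(now, now, false));
        // File superseded two hours ago: past the window, safe to delete.
        System.out.println(cleaner.eligibleForDelete(now - 7_200_000L, now, false));
    }
}
```

The same predicate covers data files, index files, segment files, and tablestatus backups; failed-operation leftovers (point 2) would simply be cleaned with `force = true`.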
>>
>>
>>
>>
>> -----
>> Best Regards
>> David Cai
>> --
>> Sent from:
>>
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>