
S3 support

Posted by kunalkapoor on Jun 21, 2018; 6:28am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/S3-support-tp52541.html

Hi dev,
The current implementation of S3 support has a few limitations which are
listed below.

*Problem (Locking):*
Currently, while writing a file to HDFS, a lock is acquired to ensure
synchronisation. This is not feasible on S3, which has no lease mechanism
(no way to guarantee that only one user writes at a time).
*Solution:*
Introduce an in-memory lock that can take care of the above problem.
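A minimal sketch of what such an in-memory lock could look like, assuming a JVM-local registry keyed by table identifier (the class and method names are illustrative, not CarbonData's actual locking API):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical JVM-local lock registry keyed by table identifier.
// Names are illustrative and do not reflect CarbonData's lock interfaces.
public class InMemoryLock {
    private static final ConcurrentHashMap<String, Boolean> LOCKS =
            new ConcurrentHashMap<>();

    // Returns true only for the first caller to lock a given table id;
    // putIfAbsent is atomic, so concurrent callers cannot both succeed.
    public static boolean tryLock(String tableId) {
        return LOCKS.putIfAbsent(tableId, Boolean.TRUE) == null;
    }

    public static void unlock(String tableId) {
        LOCKS.remove(tableId);
    }
}
```

Note that a JVM-local lock only synchronises writers inside one process; coordinating writers across processes would still need some external mechanism.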

*Problem (Write with append mode):*
Every time a thrift-related file stream is opened, append mode is used,
which is not supported on S3. Currently, while writing index files in
append mode, the existing file is read into memory and rewritten with
overwrite set to true, and the new content is then written to the file.
*Solution:*
Change the current implementation of ThriftWriter (for S3) to collect the
contents of the index file in a buffer, add the new content, and overwrite
the whole file at once.
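The buffer-and-overwrite idea can be sketched as follows; the class name is hypothetical, and local files stand in for the actual ThriftWriter and S3 output stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: instead of opening the index file in append mode
// (unsupported on S3), accumulate all content in memory and overwrite the
// whole object in a single write on close.
public class BufferedOverwriteWriter {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Path target;

    public BufferedOverwriteWriter(Path target) throws IOException {
        this.target = target;
        // Start the buffer from the existing content, if any, so append
        // semantics are preserved without an append-mode stream.
        if (Files.exists(target)) {
            buffer.write(Files.readAllBytes(target));
        }
    }

    public void write(byte[] content) throws IOException {
        buffer.write(content);
    }

    // One overwrite of the whole file, which S3 does support.
    public void close() throws IOException {
        Files.write(target, buffer.toByteArray());
    }
}
```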

*Problem (Alter rename):*
In case of rename, currently the table path is also updated with the new
table name. But S3 does not support an in-place (forced) rename; renaming
copies the files to the new path, which can be very time consuming. The
current implementation can therefore be changed as follows:

   - Rename the table in metadata without altering the table path (the
   table path will not be updated with the new table name).
   - If the user then tries to create a table with the old table name,
   create the path with a UUID appended to the table name.

For example, suppose the table name is table1 and the table path is
store/db1/table1. When renaming to table2, the table name in metadata will
be updated to table2, but the path will remain the same. If the user then
tries to create a new table named table1, its table path would be
table1-<UUID>.
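The proposed path rule might look roughly like this; the class, method, and parameter names are hypothetical:

```java
import java.util.UUID;

// Hypothetical sketch of the proposed path rule: on rename only the metadata
// name changes, so if the old name is reused for a new table, the new table's
// path gets a UUID suffix to avoid colliding with the still-existing old path.
public class TablePathResolver {
    public static String pathFor(String storeLocation, String dbName,
                                 String tableName, boolean oldPathExists) {
        String base = storeLocation + "/" + dbName + "/" + tableName;
        if (oldPathExists) {
            // Old path is still occupied by the renamed table's data.
            return base + "-" + UUID.randomUUID();
        }
        return base;
    }
}
```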

*Problem (Preaggregate transaction):*
Pre-aggregate transaction support relies heavily on renaming the table
status file, as follows:

   - Write the main table segment as In-progress in the tablestatus file.
   - Write the aggregate table segment as In-progress in the tablestatus
   file.
   - When the load for an aggregate table completes, write the Success
   segment into a new table status file named tablestatus-UUID.
   - When the loads for all aggregate tables are complete, rename the
   tablestatus file to tablestatus_backup_UUID and rename tablestatus-UUID
   to tablestatus. Remove all files with _backup_UUID once done. If
   everything succeeds, change the segment status to Success for the main
   table. If anything fails, use the _backup_UUID file to restore the
   aggregate table to its old state.
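The rename-based commit in the last step above can be sketched like this, using local file moves to stand in for HDFS renames; the atomicity of these renames is exactly what S3 cannot provide:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the rename-based commit described above; the path layout and
// helper name are illustrative. On HDFS or a POSIX file system these renames
// are atomic metadata operations, while on S3 each "rename" is a full copy.
public class TableStatusCommit {
    public static void commit(Path tableDir, String uuid) throws IOException {
        Path status = tableDir.resolve("tablestatus");
        Path newStatus = tableDir.resolve("tablestatus-" + uuid);
        Path backup = tableDir.resolve("tablestatus_backup_" + uuid);
        // 1. Keep the old status file as a backup for rollback.
        Files.move(status, backup, StandardCopyOption.ATOMIC_MOVE);
        // 2. Promote the UUID status file written by the aggregate load.
        Files.move(newStatus, status, StandardCopyOption.ATOMIC_MOVE);
        // 3. Drop the backup once the commit has succeeded.
        Files.deleteIfExists(backup);
    }
}
```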

*Proposed Solution:*
If we use a DB to store the table status of the aggregate table on S3, this
problem will not arise, as the DB can ensure transactional behaviour during
updates.
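To illustrate the all-or-nothing semantics a DB would provide, here is a hypothetical in-memory stand-in; a real implementation would issue these updates inside a single database transaction rather than on a map:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a DB-backed table status store.
// It only illustrates the atomic, all-or-nothing update that makes the
// rename-based commit protocol unnecessary; the class and method names
// are invented for this sketch.
public class SegmentStatusStore {
    private final Map<String, String> statusBySegment = new HashMap<>();

    // Apply all status changes together: stage them on a copy first, so a
    // failure during validation leaves the visible state untouched,
    // mirroring a transaction rollback.
    public synchronized void commitAll(Map<String, String> changes) {
        Map<String, String> staged = new HashMap<>(statusBySegment);
        for (Map.Entry<String, String> e : changes.entrySet()) {
            if (e.getKey() == null || e.getValue() == null) {
                throw new IllegalArgumentException("invalid status change");
            }
            staged.put(e.getKey(), e.getValue());
        }
        statusBySegment.clear();
        statusBySegment.putAll(staged);
    }

    public synchronized String get(String segmentId) {
        return statusBySegment.get(segmentId);
    }
}
```

With this approach, the main-table and aggregate-table segment statuses are committed in one step, so no tablestatus-UUID or _backup_UUID files are needed.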


Any suggestions from the community are most welcome. Please let me know if
anything needs clarification.

Regards
Kunal Kapoor