Posted by
Naresh P R on
Nov 23, 2017; 2:07pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DDL-for-CarbonData-table-backup-and-recovery-new-feature-tp27854p27868.html
Hi Shahid,
Could the new DDL be similar to Import/Export syntax?
eg.,
EXPORT TABLE tablename TO 'export_target_path' -- Export the actual table &
its associated agg tables as a zip file
IMPORT [TABLE tablename] FROM 'source_path' -- Import data from the zip
file into the "carbon store path" & register the table as mentioned in your
mail; tablename can be optional in this case.
==> If tablename is not mentioned, or the mentioned table does not exist,
we can assume the table does not exist and needs to be created.
==> If tablename is mentioned and the table exists, then we can treat it as
an incremental data update or schema evolution.
==> We can validate the existing files' checksums against the new files and
overwrite/remove stale files.
==> If a schema update happened, then we can update the schema in
the metastore the same way as we do for the add/drop column commands.
I think all newer CarbonData versions are backward compatible; are there any
restrictions or thoughts on cross-version import/export?
---
Regards,
Naresh P R
On Thu, Nov 23, 2017 at 4:47 PM, Mohammad Shahid Khan <
[hidden email]> wrote:
> Hi Dev,
>
> *Please find initial solution.*
>
>
> *CarbonData table backup and recovery*
>
> *Background*
>
> A customer has created a CarbonData table into which a very large amount of
> data has already been loaded. They now install another cluster that should
> use the same data without loading it again, because loading takes a long
> time, so they want to back up this table's data directly and recover it in
> the other cluster. After recovery, the user can work with the data as a
> normal CarbonData table.
>
> *Requirement Description*
>
> A CarbonData table should support backing up its data and recovering it
> without having to load the data again.
>
> To reuse the CarbonData table in another cluster, a DDL command should be
> provided that creates the CarbonData table from the existing carbon table
> schema.
>
> *Solution*
>
> Currently CarbonData has the below two types of tables:
>
> 1. Normal table
>
> 2. Pre-aggregate table
>
> CarbonData should provide a DDL command to create the table from existing
> table data.
> Below DDL command could be used to create the table from existing table
> data.
>
> * REGISTER TABLES FROM $dbPath*
>
>
>
> i. The database path will be scanned to get all table schemas.
>
> ii. Each schema will be read to get the database name, table name,
> and column details.
>
> iii. The *table will be registered in the Hive catalog with the
> below details:*
>
> *CREATE TABLE $tbName USING carbondata OPTIONS (tableName
> "$dbName.$tbName",*
>
> *dbName "$dbName",*
>
> *tablePath "$tablePath",*
>
> *path "$tablePath" )*
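> The registration statement above could be assembled as in this sketch
> (illustrative Python; the helper name is an assumption, and the real
> command is generated inside CarbonData):

```python
def register_table_ddl(db_name, tb_name, table_path):
    # Build the Hive-catalog registration statement from the details
    # read out of the table schema (steps i and ii above).
    return (
        f'CREATE TABLE {tb_name} USING carbondata OPTIONS ('
        f'tableName "{db_name}.{tb_name}", '
        f'dbName "{db_name}", '
        f'tablePath "{table_path}", '
        f'path "{table_path}" )'
    )
```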
>
>
> *Precondition:*
>
> i. Before executing this command, the old table schema and data
> should be copied into the new store location.
>
> ii. If the table has aggregate tables, then all of the aggregate
> tables should be copied to the new store location as well.
>
>
>
> *Validation:*
>
>
> 1. If the database does not exist, then the registration will fail.
> 2. The table will be registered only if the same table name is not already
> registered.
> 3. If the table has aggregate tables, then all of the aggregate
> tables should be registered in the Hive catalog, and if any aggregate
> table does not exist, the table creation operation should fail.
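> The three validation rules could be sketched like this (illustrative
> Python; the function name and return convention are assumptions):

```python
def validate_registration(db_exists, registered_tables, table_name,
                          agg_tables, available_schemas):
    # Rule 1: the target database must exist.
    if not db_exists:
        return "database does not exist"
    # Rule 2: do not register over an already-registered table of the same name.
    if table_name in registered_tables:
        return "table %s is already registered" % table_name
    # Rule 3: every aggregate table must have a schema present to register.
    missing = [t for t in agg_tables if t not in available_schemas]
    if missing:
        return "missing aggregate tables: %s" % ", ".join(missing)
    return None  # all checks passed
```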
>
> Regards,
>
> Shahid
>