Re: [DISCUSSION] Support Database Location Configuration while Creating Database


Naresh P R
Hi Shahid,

I have a few queries regarding backward compatibility:

1) I assume that if a database was already created in an older version, the
existing db folder will be
CarbonStoreLocation + "/" + dbLocation + "/"
e.g.,  /user/hive/warehouse/carbon.store/*carbondb/*

If a new table is created in an existing db, then the folder for the new
table will be
*CarbonStoreLocation + "/" + dbLocation + "/" + newTableName*
*AND NOT CarbonStoreLocation + "/" + dbLocation.db + "/" + newTableName*
e.g.,  /user/hive/warehouse/carbon.store/*carbondb/carbontable*

Please clarify whether my understanding is right.

2) If my above understanding is right, what if the name of a new table
created in the default db is the same as the name of an existing database
created in an older version? How are we planning to resolve folder conflicts
between the new table in the default db and the existing db folder?

3) As we are supporting a specific database location, is there any plan to
support a specific table location, similar to Hive?

e.g., CREATE TABLE [IF NOT EXISTS] [db_name.]table_name [LOCATION hdfs_path]
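
The folder conflict raised in question 2 can be illustrated with a minimal sketch. It assumes the older layout is store + "/" + dbName (as described in question 1) and that new tables in the default database go to store + "/" + tableName (scenario III of the quoted design). The function names are hypothetical and are not CarbonData APIs.

```python
import posixpath

# Example store path taken from this thread.
STORE = "/user/hive/warehouse/carbon.store"


def legacy_db_folder(store, db_name):
    """Older layout assumed in question 1: store + "/" + dbName."""
    return posixpath.join(store, db_name)


def new_default_db_table_path(store, table_name):
    """Scenario III of the quoted design: store + "/" + tableName."""
    return posixpath.join(store, table_name)


# A database created in an older version ...
existing_db = legacy_db_folder(STORE, "carbondb")
# ... and a new table with the same name created in the default database.
new_table = new_default_db_table_path(STORE, "carbondb")

# Both names resolve to the same folder, which is the conflict in question.
print(existing_db == new_table)
```

Any table name in the default database that differs from every legacy database name resolves to a distinct folder, so the conflict only arises on an exact name match.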
---
Regards,
Naresh P R

On Thu, Oct 12, 2017 at 6:06 PM, Mohammad Shahid Khan <
[hidden email]> wrote:

> Hi Dev,
>
> Please find updated design documents for "*Support Database Location
> Configuration while Creating Database*"
>
> Changes:
> Carbon will follow the same approach as Hive.
> The table path should be formed from database location or fixed Carbon
> store location and table name as given below.
>
> *There will be three possible scenarios:*
>
>    I.  Table path for the databases defined with location attribute.
>
>           tablePath = databaseLocation + "/" + tableName
>
>   II.  Table path for the databases defined without location attribute.
>
>           tablePath = carbon.storeLocation + "/" + database_Name + ".db" + "/" + tableName
>
>  III.  New table path for the default database.
>
>           tablePath = carbon.storeLocation + "/" + tableName
>
>
> Regards,
>
> Shahid
>
> On Sat, Oct 7, 2017 at 5:26 PM, Sea <[hidden email]> wrote:
>
>> Hi, Shahid:
>>     I think you misunderstood my meaning: the databaseLocation you
>> mentioned is like carbon.storeLocation, not databaseLocation in Hive.
>>     Your databaseLocation is like `hive.metastore.warehouse.dir`.
>>     The default behavior in spark(hive):
>>     If we do not specify database location:
>>       databaseLocation = hive.metastore.warehouse.dir/spark.sql.warehouse.dir
>> + "/" + databaseName.db + "/" + tableName
>>       So databaseLocation  is unique.
>>     If we do not specify table location:
>>        databaseLocation + '/' + tableName
>>
>> ------------------ Original ------------------
>> *From: * "mohdshahidkhan1987";<[hidden email]>;
>> *Date: * Fri, Oct 6, 2017 08:40 PM
>> *To: * "dev"<[hidden email]>;
>> *Subject: * Re: [DISCUSSION] Support Database Location Configuration
>> whileCreating Database
>>
>> Hi Sea,
>>
>> 1. Create database with location is supported by spark(hive) only; carbon
>> will not have its own implementation for create database. It is mentioned
>> here just for reference regarding the location attribute.
>> 2. Why does carbon want to keep tablePath = `databaseLocation + "/" +
>> database_Name + "/" + tableName`?
>>
>>  There is a problem if we keep the tablePath the same as hive. For
>> CarbonFileMetaStore, carbon creates
>>  the schema file at  <TablePath>/Metadata/schema
>>
>> If carbon skips adding databaseName, then two tables having the same name
>> from two different databases pointing to the same database location will
>> cause problems during table creation, load and query.
>>
>> Even in hive, if two tables with the same name are created in different
>> databases, then when either of the tables is queried, the
>> content from both tables is shown.
>>
>> 3. What does `Carbon.update.sync.folder` mean?
>>  This is to configure the directory for modifiedTime.mdt.
>>  Earlier the directory path for modifiedTime.mdt was fixed to
>> carbon.storeLocation, but what if the user decides to remove the name
>> service of the carbon.storeLocation?
>>  This is required for a federation cluster, where multiple name services
>> will be available. So if the nameservice holding the directory for
>> modifiedTime.mdt is removed, then the directory can be changed.
>>
>> Regards,
>> Shahid
>>
>>
>>
>> --
>> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>
>
>
Re: [DISCUSSION] Support Database Location Configuration while Creating Database

mohdshahidkhan
Hi Naresh,
Thanks for your design review.
Please find my answers to your queries.

> I have few queries regarding backward compatibility
>
> 1) I assume if database is already created in older version, the existing
> db folder will be
> CarbonStoreLocation + "/" + dbLocation + "/"
> eg.,  /user/hive/warehouse/carbon.store/*carbondb/*
>
> If any new table created in existing db, then the folder for new table will
> be
> *CarbonStoreLocation + "/" + dbLocation + "/" + newTableName*
> *AND NOT CarbonStoreLocation + "/" + dbLocation.db + "/" + newTableName*
> eg.,  /user/hive/warehouse/carbon.store/*carbondb/carbontable*
>
> Please clarify whether my understanding is right.

A. For databases created in an older version, and for databases created
without specifying the database location, we will refer to the old folder
structure model,
i.e. store path + / + database name +

This feature is for databases created with the location attribute:
CREATE DATABASE dbname LOCATION 'database path'
For these we will follow our new folder structure model.

> 2) If my above understanding is right, What if new table Name create in
> default db is same as one of the existing databases created in older
> version, how are we planning to resolve folder conflicts between new table
> in default db & existing db folder.

A. I think the table-exists check during table creation will not allow the
table to be created. I will check and confirm.

> 3) As we are supporting specific database location, is there any plan to
> support specific table location similar to hive..

A. As of now, there is no such plan.
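
For reference, the three table path scenarios from the 12 Oct design mail can be sketched as below. This is only an illustration under stated assumptions: the function name, its parameters, and the use of "default" as the default database name are hypothetical and not CarbonData APIs, and the handling of databases created in older versions (answer 1 above) is not modeled.

```python
import posixpath


def table_path(store_location, db_name, table_name,
               db_location=None, default_db="default"):
    """Resolve a table path following the three scenarios in the design mail.

    All names here are illustrative assumptions, not CarbonData APIs.
    """
    if db_location:
        # Scenario I: database defined with the location attribute.
        return posixpath.join(db_location, table_name)
    if db_name == default_db:
        # Scenario III: new table in the default database.
        return posixpath.join(store_location, table_name)
    # Scenario II: database defined without the location attribute.
    return posixpath.join(store_location, db_name + ".db", table_name)


store = "/user/hive/warehouse/carbon.store"
print(table_path(store, "sales", "orders", db_location="/data/sales"))
print(table_path(store, "sales", "orders"))
print(table_path(store, "default", "orders"))
```

The ".db" suffix in scenario II is what keeps a database folder from sharing a name with a table folder of the default database under the same store location.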

Regards,
Shahid


On 29 Oct 2017 14:47, "Naresh P R" <[hidden email]> wrote:

> [...]