Apache CarbonData Dev Mailing List archive

[DISCUSSION] Multi-tenant support by refactoring datamaps

Indhumathi

Feb 13, 2020; 6:58am

[DISCUSSION] Multi-tenant support by refactoring datamaps

This post was updated on Feb 13, 2020; 7:00am.

Hello all,

Currently, when a user creates a datamap, the system stores the datamap
metadata in a configurable system folder in HDFS or S3. Also, because the
datamap schema file is named after the datamap, users cannot create a
datamap with a name that is already present in storage.

The system folder currently holds the following files:
1. DataMapSchema -> a json file containing the schema for one datamap.
2. DataMapStatus -> the status of each datamap.

In cloud scenarios, when one user creates the SYSTEM_FOLDER and stores metadata
for materialized views and index datamaps such as bloom and lucene, other
users are not able to access the SYSTEM_FOLDER.

In order to support multi-tenancy for datamaps, I am planning to move the
system folder under each database folder, so that users can access it.
Because the system folder then lives under the database folder, users can
create datamaps with the same name under different databases.

Datamaps will be saved to the database folder specified while creating the datamap.
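
To make the naming problem concrete, here is a minimal sketch of the two layouts. It is an illustration only: the class, the folder name "_system" and the ".dmschema" suffix are assumptions made for the example, not necessarily what CarbonData uses.

```java
/** Sketch only: folder and file names below are assumptions for illustration. */
final class DataMapSchemaPaths {
    // Hypothetical names for the system folder and the schema file suffix.
    static final String SYSTEM_FOLDER = "_system";
    static final String SCHEMA_SUFFIX = ".dmschema";

    /** Current layout: one shared system folder, so the path depends only on the datamap name. */
    static String sharedPath(String storeLocation, String dataMapName) {
        return storeLocation + "/" + SYSTEM_FOLDER + "/" + dataMapName + SCHEMA_SUFFIX;
    }

    /** Proposed layout: one system folder inside each database folder. */
    static String perDatabasePath(String databaseLocation, String dataMapName) {
        return databaseLocation + "/" + SYSTEM_FOLDER + "/" + dataMapName + SCHEMA_SUFFIX;
    }

    public static void main(String[] args) {
        String store = "hdfs://nn/store";
        // The same datamap name used from two databases maps to one file in the shared layout...
        System.out.println(sharedPath(store, "dm1"));
        // ...but to two distinct files once the system folder sits under each database.
        System.out.println(perDatabasePath(store + "/db_a", "dm1"));
        System.out.println(perDatabasePath(store + "/db_b", "dm1"));
    }
}
```

With the shared layout, a datamap called dm1 always resolves to the same schema file no matter which database it is created from, which is why the name cannot be reused. With a system folder per database, the database folder becomes part of the path and the names no longer collide.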

Any suggestions/inputs from the community are appreciated.

Thanks
Indhumathi

David CaiQiang

Feb 13, 2020; 7:09am

Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

+1

Please take care of the performance impact while refactoring datamaps.

-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Jacky Li

Feb 15, 2020; 1:47pm

Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

In reply to this post by Indhumathi

Hi,

+1 for moving the DataMapSchema json file to the database folder to support multi-tenancy.

Furthermore, I suggest we refactor the datamap further. The reason is that the Secondary Index (SI) feature has now been introduced into CarbonData, and it stores the index metadata as a table property of the main table. An index datamap is also associated with only one main table, so we can do the same for index datamaps.

I propose to refactor as follows:
1. For index datamaps like bloom filter and lucene, move their metadata (DataMapSchema) to the table properties of the main table, the same way SI does.

2. Then DataMapSchema is only for Materialized Views. We can rename it to MVSchema and clean it up to keep only the fields required for MV.

3. Add separate commands for CREATE MATERIALIZED VIEW and CREATE INDEX, and unify the Index SQL syntax for bloomfilter, lucene and SI.

4. After this refactoring, we can enlarge the scope of MV to support non-carbon tables. This could be a big benefit for users, as they can accelerate OLAP queries on orc/parquet tables, for example.
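
As a rough illustration of point 1, index metadata could be kept as entries in the main table's property map. The sketch below is an assumption-laden example: the class, the "index." key prefix and the "provider:columns" value format are hypothetical and are not the format SI actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: property keys and value format are hypothetical. */
final class IndexTableProperties {
    // Hypothetical prefix under which index metadata is stored in the main table's properties.
    static final String PREFIX = "index.";

    /** Record an index on the main table as "provider:col1,col2". */
    static void addIndex(Map<String, String> tableProperties, String indexName,
                         String provider, List<String> columns) {
        tableProperties.put(PREFIX + indexName, provider + ":" + String.join(",", columns));
    }

    /** List the index names of a table by reading only that table's properties. */
    static List<String> indexNames(Map<String, String> tableProperties) {
        List<String> names = new ArrayList<>();
        for (String key : tableProperties.keySet()) {
            if (key.startsWith(PREFIX)) {
                names.add(key.substring(PREFIX.length()));
            }
        }
        return names;
    }
}
```

The point of the design is that everything needed to find a table's indexes travels with the table itself, so no separate schema file in a shared folder has to be consulted.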

Regards,
Jacky

akashrn5

Feb 16, 2020; 8:21am

Re: Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

Hi,

+1

I agree with Jacky, we can store the info in the table metadata. But one
problem we can face here is metastore connection overhead. If there are a
lot of tables and datamaps, making many connections to the metastore reduces
performance. In that case, reading from one schema file would be better.

So if we plan to store the info in the table metadata, then while refactoring
we need to take care to minimize the metastore connections made to get
datamap info, unless the table is altered or a similar scenario occurs.
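
One way to picture this is a small per-table cache in front of the metastore lookup, invalidated only when the table is altered. This is a sketch under assumptions: the class and method names are hypothetical, and the lookup function stands in for whatever call actually reads datamap info from the metastore.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch only: a per-table cache in front of a metastore lookup; all names are hypothetical. */
final class DataMapInfoCache {
    private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();
    // Stand-in for the real (expensive) metastore call.
    private final Function<String, List<String>> metastoreLookup;

    DataMapInfoCache(Function<String, List<String>> metastoreLookup) {
        this.metastoreLookup = metastoreLookup;
    }

    /** Returns the datamap names for a table, calling the metastore only on a cache miss. */
    List<String> dataMapsOf(String tableName) {
        return cache.computeIfAbsent(tableName, metastoreLookup);
    }

    /** Called when the table is altered, so the next read goes back to the metastore. */
    void invalidate(String tableName) {
        cache.remove(tableName);
    }
}
```

Repeated reads for the same table then cost one metastore call in total, and a further call happens only after an explicit invalidation.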

Regards
Akash
