Hello all,
Currently, when a user creates a datamap, the system stores the datamap metadata in a configurable system folder in HDFS or S3. Also, since the DataMapSchema file is named after the datamap, users cannot create a datamap whose name is already present in storage.

The system folder currently holds the following files:
1. DataMapSchema -> a JSON file containing the schema of one datamap.
2. DataMapStatus -> the status of each datamap.

In cloud scenarios, when one user creates the SYSTEM_FOLDER and stores metadata for materialized views and index datamaps such as bloom and lucene, other users are not able to access the SYSTEM_FOLDER.

In order to support multi-tenancy for datamaps, I am planning to move the system folder under each database, so that users can access it. Once the system folder is moved into the database folder, users can create datamaps with the same name in different databases. Each datamap will be saved to the database folder specified while creating it.

Any suggestions/inputs from the community are appreciated.

Thanks
Indhumathi
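To make the naming conflict concrete, here is a minimal Python sketch contrasting one global system folder with a per-database system folder. Everything in it is a hypothetical placeholder, not CarbonData's actual layout or API: the folder names (`/carbon/_system`, `/user/hive/warehouse`, `<db>.db/_system`), the `.dmschema` suffix and the helper functions. It only simulates the paths in memory.

```python
import posixpath

# Hypothetical locations; CarbonData's real folder and file names may differ.
SYSTEM_FOLDER = "/carbon/_system"    # one global system folder (current setup)
WAREHOUSE = "/user/hive/warehouse"   # warehouse root holding <db>.db folders


def global_schema_path(database: str, datamap: str) -> str:
    # Current behaviour: the schema file is named after the datamap only,
    # so the database argument is deliberately ignored.
    return posixpath.join(SYSTEM_FOLDER, datamap + ".dmschema")


def per_database_schema_path(database: str, datamap: str) -> str:
    # Proposed behaviour: a system folder inside each database folder.
    return posixpath.join(WAREHOUSE, database + ".db", "_system", datamap + ".dmschema")


def create_datamaps(create_requests, path_fn):
    """Simulate writing schema files in memory; reject a create whose file already exists."""
    stored = {}
    for database, datamap in create_requests:
        path = path_fn(database, datamap)
        if path in stored:
            raise FileExistsError(f"datamap schema already exists: {path}")
        stored[path] = (database, datamap)
    return stored


create_requests = [("sales", "bloom_dm"), ("hr", "bloom_dm")]

try:
    create_datamaps(create_requests, global_schema_path)
except FileExistsError as err:
    print(err)  # datamap schema already exists: /carbon/_system/bloom_dm.dmschema

for path in create_datamaps(create_requests, per_database_schema_path):
    print(path)
```

With the global folder, the second `bloom_dm` clashes with the first; with per-database folders both creates succeed, because the database name becomes part of the schema path.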
+1
Please take care of the performance impact while refactoring datamaps.
-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Hi,
+1 for moving the DataMapSchema JSON file to the database folder to support multi-tenancy. Furthermore, I suggest we refactor the datamap further. The reason is that the Secondary Index (SI) feature has now been introduced into CarbonData, and it stores the index metadata as a table property of the main table. An index datamap is likewise associated with only one main table, so we can do the same for index datamaps.

I propose to refactor as follows:
1. For index datamaps such as the bloom filter and lucene datamaps, move their metadata (DataMapSchema) to the table properties of the main table, just as SI does.
2. DataMapSchema is then used only for materialized views (MV). We can rename it to MVSchema and clean it up to keep only the fields MV requires.
3. Add separate commands for CREATE MATERIALIZED VIEW and CREATE INDEX, and unify the index SQL syntax for bloomfilter, lucene and SI.
4. After this refactoring, we can extend MV support to non-carbon tables. This could be a big benefit for users, since they could accelerate OLAP queries on ORC/Parquet tables, for example.

Regards,
Jacky

------------------ Original message ------------------
From: "Indhumathi M"<[hidden email]>;
Sent: Thursday, February 13, 2020, 3:28 PM
To: "dev"<[hidden email]>;
Subject: [DISCUSSION] Multi-tenant support by refactoring datamaps
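Points 1 and 2 of the proposal above can be sketched as follows. This is a hypothetical Python model, not CarbonData code: the property key `indexes`, the JSON encoding, and the `IndexMeta`, `MainTable` and `MVSchema` field sets are assumptions chosen for illustration, since the thread does not say how SI serializes its metadata.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical table-property key; the thread does not say what key SI uses.
INDEX_PROPERTY_KEY = "indexes"


@dataclass
class IndexMeta:
    """Metadata for one index on a main table (bloomfilter, lucene, SI, ...)."""
    name: str
    provider: str
    columns: List[str]
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class MainTable:
    database: str
    name: str
    table_properties: Dict[str, str] = field(default_factory=dict)

    def list_indexes(self) -> List[IndexMeta]:
        # Decode the index list stored in the main table's properties.
        raw = self.table_properties.get(INDEX_PROPERTY_KEY)
        return [IndexMeta(**d) for d in json.loads(raw)] if raw else []

    def add_index(self, index: IndexMeta) -> None:
        indexes = self.list_indexes()
        if any(i.name == index.name for i in indexes):
            raise ValueError(f"index {index.name} already exists on {self.database}.{self.name}")
        indexes.append(index)
        # All index metadata lives in one table property, so no separate
        # schema file in a shared system folder is needed.
        self.table_properties[INDEX_PROPERTY_KEY] = json.dumps([asdict(i) for i in indexes])


@dataclass
class MVSchema:
    """Trimmed-down schema for a materialized view; the field set is an assumption."""
    database: str
    name: str
    query: str
    source_tables: List[str]  # could include non-carbon (e.g. Parquet) tables


orders = MainTable("sales", "orders")
orders.add_index(IndexMeta("bf_idx", "bloomfilter", ["customer_id"]))
orders.add_index(IndexMeta("txt_idx", "lucene", ["comment"]))
print([i.name for i in orders.list_indexes()])  # ['bf_idx', 'txt_idx']

mv = MVSchema("sales", "daily_revenue",
              "SELECT day, sum(amount) FROM sales.orders_parquet GROUP BY day",
              ["sales.orders_parquet"])
```

Keeping index metadata on the main table scopes index names to that table, while MVSchema, no longer tied to a single carbon main table, can list arbitrary source tables.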
Hi,
+1, I agree with Jacky that we can store the info in the table metadata. But there is one problem we may face: metastore connections. If there are many tables and datamaps, making many connections to the metastore reduces performance; in that case, reading from a single schema file is better. So if we plan to store this info in the table metadata, the refactoring should minimize metastore connections for fetching datamap info, and avoid fetching it again unless the table has been altered or a similar change has occurred.

Regards
Akash
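One way to address the metastore concern above is a per-table cache that goes back to the metastore only when the table changes. The sketch below is a hypothetical Python illustration, not CarbonData code: it assumes a cheap per-table version number (for example, one bumped on ALTER TABLE) is available without a full metastore round trip, and `fetch_from_metastore` stands in for whatever call would actually load the datamap info.

```python
from typing import Callable, Dict, List, Tuple


class DataMapInfoCache:
    """Per-table cache of datamap info that only goes back to the metastore
    when the table's version changes (e.g. after an ALTER TABLE).

    Not thread-safe; a real implementation would need locking or a concurrent map.
    """

    def __init__(self, fetch_from_metastore: Callable[[str], List[str]]):
        self._fetch = fetch_from_metastore
        # table name -> (version the entry was loaded at, datamap info)
        self._entries: Dict[str, Tuple[int, List[str]]] = {}

    def get(self, table: str, current_version: int) -> List[str]:
        entry = self._entries.get(table)
        if entry is not None and entry[0] == current_version:
            return entry[1]  # cache hit: no metastore round trip
        info = self._fetch(table)  # miss or stale entry: one metastore call
        self._entries[table] = (current_version, info)
        return info


# Usage with a fake metastore that records how often it is called.
metastore_calls: List[str] = []


def fake_metastore(table: str) -> List[str]:
    metastore_calls.append(table)
    return ["bloom_dm", "lucene_dm"]


cache = DataMapInfoCache(fake_metastore)
cache.get("sales.orders", current_version=1)  # first access: fetched
cache.get("sales.orders", current_version=1)  # unchanged: served from cache
cache.get("sales.orders", current_version=2)  # altered: fetched again
print(len(metastore_calls))  # 2
```

The trade-off is the one raised above: repeated lookups cost one metastore call per table change rather than one per query, at the price of keeping the version signal accurate.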