niuge01 commented on a change in pull request #3720: URL: https://github.com/apache/carbondata/pull/3720#discussion_r411121878 ########## File path: docs/mv-guide.md ########## @@ -0,0 +1,342 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to you under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +# CarbonData Materialized View + +* [Quick Example](#quick-example) +* [Introduction](#introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Compaction](#compacting) +* [Data Management](#data-management) +* [Time Series Support](#time-series-support) +* [Time Series RollUp Support](#time-series-rollup-support) + +## Quick example + + Start spark-sql in terminal and run the following queries, + + ``` + CREATE TABLE maintable(a int, b string, c int) stored as carbondata; + INSERT INTO maintable SELECT 1, 'ab', 2; + CREATE MATERIALIZED VIEW view1 AS SELECT a, sum(b) FROM maintable GROUP BY a; + SELECT a, sum(b) FROM maintable GROUP BY a; + // NOTE: run explain query and check if query hits the Index table from the plan + EXPLAIN SELECT a, sum(b) FROM maintable GROUP BY a; + ``` + +## Introduction + + Materialized views are created as sub-queries. User can create limitless materialized view to + improve query performance provided the storage requirements and loading time is acceptable. + + Materialized view can refreshed on commit or on manual. Once materialized views are created, + CarbonData's CarbonAnalyzer helps to select the most efficient materialized view based on + the user query and rewrite the SQL to select the data from materialized view instead of + related tables. Since the data size of materialized view is smaller and data is pre-processed, + user queries are much faster. + + For instance, related table called **sales** which is defined as. + + ``` + CREATE TABLE sales ( + order_time timestamp, + user_id string, + sex string, + country string, + quantity int, + price bigint) + STORED AS carbondata + ``` + + User can create materialized view using the CREATE MATERIALIZED VIEW statement. + + ``` + CREATE MATERIALIZED VIEW agg_sales + PROPERTIES('TABLE_BLOCKSIZE'='256 MB','LOCAL_DICTIONARY_ENABLE'='false') + AS + SELECT country, sex, sum(quantity), avg(price) + FROM sales + GROUP BY country, sex + ``` + + **NOTE**: + * Group by and Order by columns has to be provided in projection list while creating materialized view. + * If only single related table is involved in materialized view creation, then TableProperties of + related table (if not present in a aggregate function like sum(col)) listed below will be + inherited to materialized view. + 1. SORT_COLUMNS + 2. SORT_SCOPE + 3. TABLE_BLOCKSIZE + 4. FLAT_FOLDER + 5. LONG_STRING_COLUMNS + 6. LOCAL_DICTIONARY_ENABLE + 7. LOCAL_DICTIONARY_THRESHOLD + 8. LOCAL_DICTIONARY_EXCLUDE + 9. INVERTED_INDEX + 10. NO_INVERTED_INDEX + 11. COLUMN_COMPRESSOR + * Creating materialized view with select query containing only project of all columns of related + table is unsupported. + **Example:** + If table 'x' contains columns 'a,b,c', then creating MV Index with below queries is not supported. + 1. ```SELECT a,b,c FROM x``` + 2. ```SELECT * FROM x``` + * TableProperties can be provided in Properties excluding LOCAL_DICTIONARY_INCLUDE, + LOCAL_DICTIONARY_EXCLUDE, INVERTED_INDEX, NO_INVERTED_INDEX, SORT_COLUMNS, LONG_STRING_COLUMNS, + RANGE_COLUMN & COLUMN_META_CACHE. + * TableProperty given in Properties will be considered for materialized view creation, even though + if same property is inherited from related table, which allows user to provide different table + properties for materialized view. + * Materialized view creation with limit or union all CTAS queries is unsupported. + * Materialized view does not support streaming. + +#### How materialized views are selected + + When a user query is submitted, during query planning phase, CarbonData will collect modular plan + candidates and process the the ModularPlan based on registered summary data sets. Then, + materialized view for this query will be selected among the candidates. + + For the related table **sales** and materialized view **agg_sales** created above, following queries + ``` + SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY country, sex + SELECT sex, sum(quantity) FROM sales GROUP BY sex + SELECT avg(price), country FROM sales GROUP BY country + ``` + + will be transformed by CarbonData's query planner to query against materialized view**agg_sales** Review comment: OK ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |