GitHub user sraghunandan opened a pull request:
https://github.com/apache/carbondata/pull/1886

[CARBONDATA-2098] Add Documentation for Pre-Aggregate tables

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? No
- [x] Any backward compatibility impacted? No
- [x] Document update required? Updated documentation
- [x] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
      NA
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sraghunandan/carbondata-1 partition_preagg_documentation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1886.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1886

----

commit 49ead42c07aae85afbaac74ba5fbbd256a98fd72
Author: Raghunandan S <carbondatacontributions@...>
Date: 2018-01-29T03:24:49Z

    Add Documentation for Pre-Aggregate tables

----
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2036/ ---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3273/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1886 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3227/ ---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1886#discussion_r165249189

--- Diff: docs/data-management-on-carbondata.md ---

@@ -703,6 +704,194 @@ This tutorial is going to introduce all commands and data operations on CarbonData

 * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
 * When writing SQL on a partition table, try to use filters on the partition column.

+## PRE-AGGREGATE TABLES

--- End diff --

Please add some examples to show the plan matching mechanism, i.e. which query will hit which datamap.

---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1886 @sraghunandan please add a pre-agg example; it would be better to include a performance comparison in the example. ---
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1886#discussion_r165290684

--- Diff: docs/data-management-on-carbondata.md ---

@@ -703,6 +704,194 @@ This tutorial is going to introduce all commands and data operations on CarbonData

 * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
 * When writing SQL on a partition table, try to use filters on the partition column.

+## PRE-AGGREGATE TABLES

--- End diff --

Added an example.

---
Github user sraghunandan commented on the issue:
https://github.com/apache/carbondata/pull/1886 @jackylk @chenliang613 added examples, please review ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2167/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3406/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2171/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1886 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3283/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2201/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3441/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1886 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3305/ ---
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1886#discussion_r165641742

--- Diff: docs/data-management-on-carbondata.md ---

@@ -748,6 +749,250 @@ This tutorial is going to introduce all commands and data operations on CarbonData

 * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
 * When writing SQL on a partition table, try to use filters on the partition column.

+## PRE-AGGREGATE TABLES
+  CarbonData supports pre-aggregating data so that OLAP-style queries can fetch data
+  much faster. Aggregate tables are created as datamaps so that they are handled as efficiently as
+  other indexing support. Users can create as many aggregate tables as they require as datamaps to
+  improve their query performance, provided the storage requirements and loading speeds are
+  acceptable.
+
+  For a main table **sales**, defined as
+
+  ```
+  CREATE TABLE sales (
+    order_time timestamp,
+    user_id string,
+    sex string,
+    country string,
+    quantity int,
+    price bigint)
+  STORED BY 'carbondata'
+  ```
+
+  users can create pre-aggregate tables using the DDL
+
+  ```
+  CREATE DATAMAP agg_sales
+  ON TABLE sales
+  USING "preaggregate"
+  AS
+    SELECT country, sex, sum(quantity), avg(price)
+    FROM sales
+    GROUP BY country, sex
+  ```
+
+<b><p align="left">Functions supported in pre-aggregate tables</p></b>
+
+| Function       | Rollup supported |
+|----------------|------------------|
+| SUM            | Yes |
+| AVG            | Yes |
+| MAX            | Yes |
+| MIN            | Yes |
+| COUNT          | Yes |
+| DISTINCT COUNT | No  |

--- End diff --

Currently we do not support distinct count; in future it may be supported only for timeseries.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3471/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1886 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2231/ ---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1886 LGTM ---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1886#discussion_r165807899

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/TimeSeriesPreAggregateTableExample.scala ---

@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.SaveMode
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+/**
+ * This example is for time series pre-aggregate tables.
+ */
+
+object TimeSeriesPreAggregateTableExample {
+
+  def main(args: Array[String]) {
+
+    val rootPath = new File(this.getClass.getResource("/").getPath
+      + "../../../..").getCanonicalPath
+    val testData = s"$rootPath/integration/spark-common-test/src/test/resources/timeseriestest.csv"
+    val spark = ExampleUtils.createCarbonSession("TimeSeriesPreAggregateTableExample")
+
+    spark.sparkContext.setLogLevel("ERROR")
+
+    import spark.implicits._
+
+    import scala.util.Random
+    val r = new Random()
+    val df = spark.sparkContext.parallelize(1 to 10 * 1000 * 1000)

--- End diff --

please reduce the data size

---
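In line with this review comment, one possible reduction of the quoted generation step (a sketch only; the row count of 100,000 is an assumption, and any value that still exercises the time-series rollup while keeping example load time short would do):

```scala
// Generate far fewer rows than the original 10 million; 100,000 is
// typically enough for an illustrative pre-aggregate example.
val df = spark.sparkContext.parallelize(1 to 100 * 1000)
```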