Apache CarbonData Dev Mailing List archive

Re: Questions about dictionary-encoded column and MDK

Posted by ZhuWilliam on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Questions-about-dictionary-encoded-column-and-MDK-tp9457p9618.html

1. Dictionary encoding makes column storage more efficient: the encoded data
is smaller, and search performance improves.
2. During a search, MDK/Min-Max can be used for block/blocklet pruning in
order to reduce IO. For now, the MDK is composed of the dimensions in the
order they are declared in the create table statement.
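
To make the two points above concrete, here is a minimal Scala sketch of the idea. It is a simplified model, not CarbonData code: every name in it (MdkPruningSketch, buildDictionary, Blocklet, prune, and so on) is hypothetical, and it assumes dictionary keys are assigned in sorted value order so that key order matches value order.

```scala
// Simplified, self-contained model. All names are hypothetical and do not
// correspond to CarbonData classes.
object MdkPruningSketch {
  type Row = Vector[String]

  // Dictionary encoding: map each distinct value of a column to a small
  // integer surrogate key. Keys follow sorted value order (sketch assumption).
  def buildDictionary(values: Seq[String]): Map[String, Int] =
    values.distinct.sorted.zipWithIndex.toMap

  // Encode every row; column i uses dictionary i. The encoded vector plays
  // the role of the MDK: dimensions appear in their declared order.
  def encode(rows: Seq[Row], dicts: Vector[Map[String, Int]]): Seq[Vector[Int]] =
    rows.map(r => r.zipWithIndex.map { case (v, i) => dicts(i)(v) })

  // Lexicographic ordering: the first declared dimension is most significant.
  val mdkOrdering: Ordering[Vector[Int]] = new Ordering[Vector[Int]] {
    def compare(a: Vector[Int], b: Vector[Int]): Int =
      a.zip(b).map { case (x, y) => x.compare(y) }.find(_ != 0).getOrElse(0)
  }

  // A blocklet keeps the per-column min and max of its encoded rows.
  final case class Blocklet(rows: Seq[Vector[Int]]) {
    val min: Vector[Int] = rows.transpose.map(_.min).toVector
    val max: Vector[Int] = rows.transpose.map(_.max).toVector
    // A filter value outside [min, max] cannot occur in this blocklet.
    def mayContain(col: Int, key: Int): Boolean =
      min(col) <= key && key <= max(col)
  }

  // Sort encoded rows by MDK, then cut them into fixed-size blocklets.
  def buildBlocklets(rows: Seq[Row],
                     dicts: Vector[Map[String, Int]],
                     size: Int): Seq[Blocklet] =
    encode(rows, dicts).sorted(mdkOrdering).grouped(size).map(Blocklet(_)).toSeq

  // Min-max pruning: keep only blocklets whose range covers the filter key.
  def prune(blocklets: Seq[Blocklet], col: Int, key: Int): Seq[Blocklet] =
    blocklets.filter(_.mayContain(col, key))
}
```

In this model, an equality filter on the first declared dimension tends to skip more blocklets than a filter on a later one, because the rows are sorted by the first dimension first. That is why the declaration order of the dimensions matters.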

On Thu, Mar 23, 2017 at 11:51 PM, Liang Chen <[hidden email]> wrote:

> Hi
>
> 1. The system builds the MDK index on dimensions (string columns are treated
> as dimensions, numeric columns as measures), so you have to specify at least
> one dimension (string column) to build the MDK index.
>
> 2. You can list a numeric column in DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE
> to build the MDK index.
> For case 2, you can change the script like this:
> carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");
>
> Regards
> Liang
>
> 2017-03-23 18:39 GMT+05:30 Jin Zhou <[hidden email]>:
>
> > Exception info:
> > scala> carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
> > org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Table default.test can not be created without key columns. Please use DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE to set at least one key column if all specified columns are numeric types
> >   at org.apache.spark.sql.catalyst.CarbonDDLSqlParser.prepareTableModel(CarbonDDLSqlParser.scala:240)
> >   at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:162)
> >   at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:60)
> >   at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:503)
> >   at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
> >   at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> >   at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> >   at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
> >   at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
> >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
> >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
> >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
> >   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parse(CarbonSparkSqlParser.scala:56)
> >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
> >   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:46)
> >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
> >   ... 50 elided
> >
> > I hadn't noticed “if all specified columns are numeric types” in the
> > exception info, so I did more tests and found that the issue only occurs
> > when all columns are numeric types.
> >
> > Below are cases I tested:
> > case 1:
> > carbon.sql("create table if not exists test(a string, b string, c string) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='a,b,c')");
> > ====> ok, no dictionary column
> >
> > case 2:
> > carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
> > ====> fail
> >
> > case 3:
> > carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");
> > ====> ok, at least one dictionary column
> >
> > One small problem with case 2 is that there is no suitable dictionary
> > column when all columns have high cardinality.
> >
>

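The rule behind the exception quoted above, and the three test cases, can be modelled roughly as follows. This is only a sketch of the behaviour described in this thread, not the actual logic of CarbonDDLSqlParser; the names KeyColumnRuleSketch, Column, keyColumns and validate are hypothetical, and the model assumes that every string column counts as a key column while a numeric column counts only when it is listed in DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE.

```scala
// Hypothetical model of the "at least one key column" rule from the thread.
object KeyColumnRuleSketch {
  final case class Column(name: String, dataType: String)

  // Parse a comma-separated table property into a set of column names.
  private def names(props: Map[String, String], key: String): Set[String] =
    props.get(key).toSeq
      .flatMap(_.split(","))
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
      .toSet

  // Assumption: string columns are always dimensions (key columns); numeric
  // columns become key columns only when listed in one of the two properties.
  def keyColumns(columns: Seq[Column], props: Map[String, String]): Seq[String] = {
    val listed = names(props, "DICTIONARY_INCLUDE") ++ names(props, "DICTIONARY_EXCLUDE")
    columns.collect {
      case c if c.dataType.equalsIgnoreCase("string") || listed(c.name.toLowerCase) =>
        c.name
    }
  }

  // Left carries an error message, Right carries the key columns in
  // declared order.
  def validate(columns: Seq[Column],
               props: Map[String, String]): Either[String, Seq[String]] = {
    val keys = keyColumns(columns, props)
    if (keys.isEmpty) Left("Table can not be created without key columns")
    else Right(keys)
  }
}
```

Under these assumptions, case 2 (three integer columns, no table properties) yields a Left, while case 3 (DICTIONARY_INCLUDE set to a) yields Right(Seq("a")).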
--
Best Regards
WilliamZhu 祝海林 [hidden email]