Hi,
Recently I have been doing some tests on Spark 2.1.0 + CarbonData 1.0.0 and have some questions:

1) An exception is thrown when a table is created without any dictionary column. Does that mean a carbon table must have at least one dictionary column?

2) What is the connection between dictionary-encoded columns and the MDK? Does the MDK only contain dictionary-encoded columns?
Hi
Can you provide your full exception info?

Regards
Liang
Exception info:
scala> carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Table default.test can not be created without key columns. Please use DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE to set at least one key column if all specified columns are numeric types
  at org.apache.spark.sql.catalyst.CarbonDDLSqlParser.prepareTableModel(CarbonDDLSqlParser.scala:240)
  at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:162)
  at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:60)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:503)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
  at org.apache.spark.sql.parser.CarbonSparkSqlParser.parse(CarbonSparkSqlParser.scala:56)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:46)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  ... 50 elided

I hadn't noticed "if all specified columns are numeric types" in the exception message.
So I did more tests and found the issue only occurs when all columns are numeric types. Below are the cases I tested:

case 1:
carbon.sql("create table if not exists test(a string, b string, c string) STORED BY 'carbondata' 'DICTIONARY_EXCLUDE'='a,b,c' ");
====> ok, no dictionary column

case 2:
carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
====> fail

case 3:
carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");
====> ok, at least one dictionary column

One small problem with case 2 is that there is no proper dictionary column to pick when all columns have high cardinality.
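On the high-cardinality concern: the benefit of dictionary encoding shrinks as the distinct-value count approaches the row count. A minimal, framework-independent Python sketch of the trade-off (an illustration of the concept, not CarbonData code):

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer surrogate key."""
    dictionary = {}
    encoded = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # assign the next surrogate key
        encoded.append(dictionary[v])
    return dictionary, encoded

# Low cardinality: the dictionary stays tiny relative to the column.
low_card = ["CN", "US", "CN", "US", "CN", "CN", "US", "CN"]
d1, e1 = dictionary_encode(low_card)
print(len(d1), e1)  # 2 distinct values for 8 rows -> good compression

# High cardinality: the dictionary is as large as the column itself,
# which is why forcing a DICTIONARY_INCLUDE in case 2 is awkward.
high_card = ["user_%d" % i for i in range(8)]
d2, e2 = dictionary_encode(high_card)
print(len(d2))  # 8 distinct values for 8 rows -> no savings
```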
Hi
1. The system builds the MDK index on dimensions (string columns are treated as dimensions, numeric columns as measures), so you have to specify at least one dimension (string column) for building the MDK index.

2. You can mark a numeric column with DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE so that it participates in the MDK index. For case 2, you can change the script like this:
carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");

Regards
Liang
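To illustrate the idea in Liang's point 1, here is a small Python sketch of how a multi-dimensional key can be composed (an illustration of the concept only, not CarbonData's actual implementation): each dimension column is dictionary-encoded to a surrogate key, and the surrogate keys are combined, in declaration order, into one MDK per row:

```python
def build_mdk(rows, dim_indexes):
    """Dictionary-encode each dimension column, then compose an MDK per
    row as the tuple of surrogate keys in declaration order."""
    dictionaries = [dict() for _ in dim_indexes]
    mdks = []
    for row in rows:
        key = []
        for d, col in zip(dictionaries, dim_indexes):
            value = row[col]
            if value not in d:
                d[value] = len(d)  # next surrogate key for this dimension
            key.append(d[value])
        mdks.append(tuple(key))
    return mdks

rows = [("shanghai", "mobile", 10),
        ("beijing", "web", 20),
        ("shanghai", "web", 30)]
mdks = build_mdk(rows, dim_indexes=[0, 1])  # columns 0 and 1 are dimensions
print(mdks)  # [(0, 0), (1, 1), (0, 1)]

# Sorting rows by MDK clusters equal dimension values together,
# which is what makes later index lookups and compression effective.
ordered = [r for _, r in sorted(zip(mdks, rows), key=lambda p: p[0])]
```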
1. Dictionary encoding makes column storage more efficient (smaller size) and improves search performance.

2. At query time, the MDK and min-max indexes can be used for block/blocklet pruning in order to reduce IO. For now, the MDK is composed of the dimensions in the order they are declared in the create table statement.

Best Regards
WilliamZhu
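The min-max pruning mentioned in point 2 can be sketched in a few lines of framework-independent Python (an illustration of the idea, not CarbonData's code): keep per-blocklet min/max statistics for a column and skip any blocklet whose range cannot contain the predicate value:

```python
def prune_blocklets(blocklets, target):
    """Keep only blocklets whose [min, max] range can contain target.
    Each blocklet is (min_value, max_value, label)."""
    return [b for b in blocklets if b[0] <= target <= b[1]]

# Three blocklets with per-blocklet min/max statistics on a sorted column.
blocklets = [
    (1, 100, "blocklet-0"),
    (101, 200, "blocklet-1"),
    (201, 300, "blocklet-2"),
]

# A query like `WHERE col = 150` only needs to read blocklet-1;
# the other two are pruned without any IO on their data pages.
survivors = prune_blocklets(blocklets, 150)
print(survivors)  # [(101, 200, 'blocklet-1')]
```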
Hi william
Exactly! Your understanding is correct. The community is currently developing the sort_columns feature, which will let users specify the columns that make up the MDK. The PR number is 635; all of you are invited to review this PR's code.

Regards
Liang