Hi All,
This discussion is regarding support for the Map data type in CarbonData.

CarbonData supports complex and nested data types such as Arrays and Structs. However, CarbonData does not support other complex data types such as Map and Union, which are generally supported by popular open source file formats.

Supporting the Map data type will require changes/additions to the DDL, query syntax, data loading and storage.

I have hosted the design on Google Docs for review and discussion:

https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9q5-oKMqzMMQHYY/edit?usp=sharing

Below is the same inline.

1. DDL Changes

Maps are key->value data types where the value can be fetched by providing the key. Hence we need to restrict keys to primitive data types, whereas values can be of any data type supported in Carbon (primitive and complex).

Map data types can be defined in the create table DDL as:

    MAP<primitive_data_type, data_type>

For example:

    create table example_table (id Int, name String, salary Int, salary_breakup map<String, Int>, city String)

2. Data Loading Changes

Carbon should be able to load data into tables with Map type columns from csv files. It should be possible to represent a map in a single row of csv. This requires Carbon to support specifying the delimiters:

1. Between two key-value pairs
2. Between the key and the value in a pair

As Carbon already supports the Struct and Array complex types, the data loading process already provides support for defining delimiters for complex data types. Carbon provides two optional parameters for data loading:

1. COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two key-value pairs

    OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')

2. COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between the key and the value in a pair

    OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')

With these delimiter options, the map

    Fixed->100,000
    Bonus->30,000
    Stock->40,000

can be represented in the csv file as

    Fixed:100,000$Bonus:30,000$Stock:40,000

3. Query Capabilities

A complex data type like Map will require additional operators to be supported in the query language to fully utilize the strength of the data type.

Maps are sequences of key-value pairs, hence they should support looking up the value for a given key. Users could use the ColumnName["key"] syntax to look up values in a map column. For example, salary_breakup["Fixed"] could be used to fetch only the Fixed component of the salary breakup.

In addition, we also need to define how maps can be used in existing constructs such as select, where (filter), group by etc.

1. Select: A map column can be selected directly, or only the value for a given key can be selected, as per the requirement. For example:

    Select name, salary, salary_breakup from example_table

will return the content of the map along with each row, while

    Select name, salary, salary_breakup["Fixed"] from example_table

will return only the one value from the map whose key is "Fixed".

2. Filter: A map column cannot be used directly in a where clause, as a where clause can operate only on primitive data types. However, the map lookup operator can be used in where clauses. For example:

    Select name, salary from example_table where salary_breakup["Bonus"] > 10000

*Note: if the value is not of a primitive type, further accessor operators need to be used, depending on the type of the value, to arrive at a primitive type for the filter expression to be valid.*

3. Group By: Just like with filters, maps cannot be used directly in a group by clause, but the lookup operator can be used.

4. Functions: A size() function can be provided for map types to determine the number of key-value pairs in a map.

4. Storage Changes

As Carbon is a columnar data store, Map values will be stored using 3 physical columns:

1. One column representing the Map data type itself. It will store the number of fields and the start index, the same way as is done for Structs and Arrays.
2. One column for the key.
3. One column for the value, if the value is of a primitive data type. Otherwise the value itself will be stored as multiple physical columns, depending on the data type of the value.

Map<String,Int>

Column_1             Column_2                  Column_3
Map_Salary_Breakup   Map_Salary_Breakup.key    Map_Salary_Breakup.value
3,1                  Fixed                     1,00,000
                     Bonus                     30,000
                     Stock                     40,000
2,4                  Fixed                     1,40,000
                     Bonus                     30,000
3,6                  Fixed                     1,20,000
                     Bonus                     20,000
                     Stock                     30,000

Regards
Vimal
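To make the data loading and storage sections of the proposal above a bit more concrete, here is a minimal Python sketch. It is not CarbonData code. The function names parse_map_cell and flatten_maps are made up for this illustration, the second and third input cells are written out from the rows of the storage table, and dropping the thousands separators before converting values to integers is an assumption of the example only.

```python
def parse_map_cell(cell, pair_delim="$", kv_delim=":"):
    """Split one csv cell such as 'Fixed:100,000$Bonus:30,000' into (key, value) pairs.

    pair_delim plays the role of COMPLEX_DELIMITER_LEVEL_1 and
    kv_delim the role of COMPLEX_DELIMITER_LEVEL_2.
    """
    if cell == "":
        return []
    pairs = []
    for item in cell.split(pair_delim):
        key, sep, raw_value = item.partition(kv_delim)
        if not sep:
            raise ValueError(f"missing key/value delimiter in {item!r}")
        # Assumption for this example only: values are integers written with
        # thousands separators, so the commas are dropped before conversion.
        pairs.append((key, int(raw_value.replace(",", ""))))
    return pairs


def flatten_maps(rows):
    """Lay out parsed maps as three columns.

    Returns (parent, keys, values), where parent holds one
    (number of entries, 1-based start index) tuple per row.
    """
    parent, keys, values = [], [], []
    next_index = 1
    for pairs in rows:
        parent.append((len(pairs), next_index))
        for key, value in pairs:
            keys.append(key)
            values.append(value)
        next_index += len(pairs)
    return parent, keys, values


cells = [
    "Fixed:100,000$Bonus:30,000$Stock:40,000",
    "Fixed:140,000$Bonus:30,000",
    "Fixed:120,000$Bonus:20,000$Stock:30,000",
]
parent, keys, values = flatten_maps([parse_map_cell(c) for c in cells])
print(parent)  # [(3, 1), (2, 4), (3, 6)]
print(keys)
print(values)
```

The parent column comes out as (3, 1), (2, 4), (3, 6), matching Column_1 in the storage table. The sketch does not decide what should happen with duplicate keys inside one map, which the proposal leaves open.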
Administrator
|
Hi Vimal
Thank you for starting the discussion.
Since only primitive types can be used as Map keys, can you list the types that will be supported (Int, String, Double, ...)?

To discuss this more conveniently, you can go ahead and use Google Docs. After the design document is finalized, please archive it and upload it to cwiki: https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home

Regards
Liang
|
Hi Vimal,
Design doc looks clear, can you also add file format storage design for map datatype.

Regards,
Ravi.

On 17 October 2016 at 07:43, Liang Chen <[hidden email]> wrote:
> Hi Vimal
>
> Thank you started the discussion.
> For keys of Map data only can be primitive, can you list these type which
> will be supported? (Int,String,Double..
>
> For discussing more conveniently, you can go ahead to use google docs.
> After the design document finalized , please archive and upload it to
> cwiki:https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home
>
> Regards
> Liang
|
In reply to this post by Liang Chen
Keys in the map can only be of primitive data types. At present, CarbonData supports the following primitive data types: Integer, String, Timestamp, Double and Decimal. If CarbonData adds support for more primitive data types in the future, those can be used as Map keys as well.

The reason for restricting keys to primitive data types is that, if keys were complex data types, looking up a value by key would not be possible in a SQL statement.
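The key restriction described above could be checked when the DDL is parsed. The following is only a Python sketch of that idea, not CarbonData's parser. The names PRIMITIVE_TYPES, split_map_type and validate_map_type are hypothetical, and treating "int" as an alias of Integer (because the DDL example uses Int) is my assumption.

```python
# Primitive types named in the reply above. The set and the "int" alias are
# assumptions of this sketch, not a CarbonData constant.
PRIMITIVE_TYPES = {"int", "integer", "string", "timestamp", "double", "decimal"}


def split_map_type(type_text):
    """Split 'map<K, V>' into (K, V) at the first top-level comma."""
    text = type_text.strip()
    if not (text.lower().startswith("map<") and text.endswith(">")):
        raise ValueError(f"not a map type: {type_text!r}")
    inner = text[4:-1]
    depth = 0
    for pos, ch in enumerate(inner):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            return inner[:pos].strip(), inner[pos + 1:].strip()
    raise ValueError(f"map type needs a key and a value type: {type_text!r}")


def validate_map_type(type_text):
    """Return (key_type, value_type), rejecting non-primitive key types."""
    key_type, value_type = split_map_type(type_text)
    # Ignore a precision/scale suffix such as decimal(10,2) when checking the key.
    base = key_type.split("(", 1)[0].strip().lower()
    if base not in PRIMITIVE_TYPES:
        raise ValueError(f"map key must be a primitive type, got {key_type!r}")
    return key_type, value_type


print(validate_map_type("map<String, Int>"))
print(validate_map_type("map<String, array<Int>>"))
try:
    validate_map_type("map<array<Int>, String>")
except ValueError as err:
    print("rejected:", err)
```

Values are left unchecked here because the proposal allows any supported type, primitive or complex, on the value side.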