Hi Dev,
Complex types (also referred to as nested types) let you represent multiple data values within a single row/column position. CarbonData already supports complex types, but it lacks major enhancements that are available for primitive data types. As usage of complex types is increasing, we are planning to extend their coverage and apply some major optimizations. I am listing a few of the optimizations we have thought of.

I request the community to go through the list and give your valuable suggestions.

1. Adaptive Encoding for Complex Type Page: Currently the complex type page has no encoding, which leads to higher IO compared to other data types. The complex page should be on par with the encoding mechanism of other data types.

2. Optimize Array Type Reading: Optimize complex type array reading so that arrays can be read faster. One way is to reduce the read IO for arrays by applying an encoding mechanism such as Adaptive or RLE to the array data type.

3. Filter and Projection Push Down for Complex Datatypes: As of now, filters and projections on complex data types are handled in the upper Spark layer. If they are pushed down, Carbon will perform better because less IO is incurred: not all rows need to be sent back to Spark for processing.

4. Support Multilevel Nesting in Complex Datatypes: Only 2 levels of nesting are supported for complex data types through Load and Insert Into. Extend this to n levels.

5. Update and Delete Support for Complex Datatype: Currently, only primitive data types work for Update and Delete in CarbonData. Support complex data types in these DML operations too.

6. Alter Table Support for Complex Datatype: Alter table does not support adding or dropping complex columns as of now. This support needs to be added.

7. Map Datatype Support: Only Struct and Array are part of the complex data types as of now. Map should be added as a complex data type.

8. Compaction Support for Complex Datatype: Compaction works for primitive data types, but should be extended to complex types too.

Good to have features
------------------------------
9. Geospatial Support through Complex Datatype: Represent geospatial data types like ST_GEOMETRY and XML objects through complex data types.

10. Complex Datatype Transformation: One complex data type can be transformed into a different complex data type. For example, a user inserts data with the ComplexA data type but wants to transform the data and retrieve it as the ComplexB data type.

11. Virtual Tables for Complex Datatypes: Currently a complex column resides in one column, but through virtual tables the complex columns can be denormalized and placed into a separate table, called a virtual table, for faster processing, joins, and use in sort columns.

12. Including Complex Datatype in Sort Columns.

Please let me know your suggestions on these enhancements.

Thanks a lot

--
Thanks
Sounak
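To make item 3 concrete, here is a minimal Python sketch of why pushing a filter and a projection on a struct child down to the scan sends less data back to the engine. It is not CarbonData code: the row layout, the `address.city` and `address.zip` paths, and the function names `scan_without_pushdown` and `scan_with_pushdown` are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical rows with one struct column ("address"); not CarbonData's layout.
ROWS: List[Row] = [
    {"id": 1, "address": {"city": "Pune", "zip": 411001}},
    {"id": 2, "address": {"city": "Delhi", "zip": 110001}},
    {"id": 3, "address": {"city": "Pune", "zip": 411002}},
]


def get_path(row: Row, path: str) -> Any:
    """Resolve a dotted path such as 'address.city' inside a nested row."""
    value: Any = row
    for part in path.split("."):
        value = value[part]
    return value


def scan_without_pushdown(rows: Iterable[Row]) -> List[Row]:
    """The store returns every row in full; the engine filters and projects later."""
    return list(rows)


def scan_with_pushdown(
    rows: Iterable[Row],
    project: str,
    predicate_path: str,
    predicate: Callable[[Any], bool],
) -> List[Any]:
    """The store evaluates the predicate and returns only the projected child."""
    return [
        get_path(row, project)
        for row in rows
        if predicate(get_path(row, predicate_path))
    ]


# Engine-side filtering: all three full rows cross the store/engine boundary.
shipped = scan_without_pushdown(ROWS)
upper_layer = [
    get_path(row, "address.zip")
    for row in shipped
    if get_path(row, "address.city") == "Pune"
]

# Store-side filtering: only the matching zip values cross the boundary.
pushed = scan_with_pushdown(
    ROWS, "address.zip", "address.city", lambda city: city == "Pune"
)
```

Both paths produce the same answer; the difference is how much data leaves the storage layer before the filter and projection are applied.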
Hi Dev,
We have identified the scope of the phase 1 activities for the complex type enhancements. They are:

- Predicate push down for the struct data type.
- Adaptive encoding and decoding for all data types.
- Support for loading JSON data directly into a Carbon table.

Please find the detailed design document attached to the JIRA [CARBONDATA-2605]:
https://issues.apache.org/jira/browse/CARBONDATA-2605

Thanks,
Sounak

On Mon, Jun 4, 2018 at 8:10 AM sounak <[hidden email]> wrote:
> Hi Dev,
>
> Complex types (also referred to as nested types) let you represent
> multiple data values within a single row/column position.
> [...]

--
Thanks
Sounak
In reply to this post by sounak
1. Would complex data types be supported for tables with partitions?
2. Could complex data types be used as dictionary columns? Also, in an array of struct that has both dimensions and measures inside a single data cell, would it behave as a dictionary column or not?
3. Would streaming tables support complex types?
4. Could preaggregate tables and other datamaps be created using complex type columns?
5. Can it be assumed that features such as CTAS and local dictionary would support complex types?

-- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
In reply to this post by sounak
1. In the case of insert into a table, how will the insert command support complex data types?
2. If dictionary support is allowed for complex types, what will be the structure of the dictionary file created in the metadata folder? Suppose the data type is a struct with 4 items: when we use it as a dictionary column, will it create a separate dictionary metadata file for each item, or a single file?
Hi Surbhi,
(1) In an insert into statement, complex data types behave the same way as other data types. The only thing the user has to be careful about is the delimiter of the data. Complex data will use the default delimiters. As we support only 2 levels, the default level 1 and level 2 delimiters will be used. Please check the product documentation for the level 1 and level 2 delimiters already supported for complex data types.

(2) We support complex data types as no-dictionary columns only, so there is no need to discuss the dictionary part now.

Thanks,
Dhatchayani
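As a rough illustration of the two delimiter levels mentioned above, the following Python sketch splits a flat field into an array, and into an array of structs. The delimiter characters `$` and `:` are assumptions made for this sketch, as are the helper names `parse_array` and `parse_array_of_struct`; the product documentation is the authority on the actual defaults.

```python
from typing import List

# Assumed delimiters for this sketch only; check the CarbonData documentation
# for the actual defaults of the level 1 and level 2 complex delimiters.
LEVEL_1 = "$"
LEVEL_2 = ":"


def parse_array(field: str, level_1: str = LEVEL_1) -> List[str]:
    """Split a flat array field on the level 1 delimiter."""
    return field.split(level_1)


def parse_array_of_struct(
    field: str, level_1: str = LEVEL_1, level_2: str = LEVEL_2
) -> List[List[str]]:
    """Split the outer array on level 1, then each struct element on level 2."""
    return [element.split(level_2) for element in field.split(level_1)]


phones = parse_array("111$222$333")
points = parse_array_of_struct("a:b$c:d")
```

With only two delimiter levels available, a third level of nesting has no separator of its own, which is one way to see why the current support stops at 2 levels.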
In reply to this post by sounak
Hi Sounak,
Are you planning to do predicate push down or projection push down for the struct type? I guess adaptive encoding is only possible for integral data types like long, int, and short, not for all data types. So please list down what type of encoding you are planning on complex types.

Regards,
Ravindra.

On Wed, 13 Jun 2018 at 20:07, sounak <[hidden email]> wrote:
> Hi Dev,
>
> We have identified the scope of phase1 activities for complex type
> enhancements.
> [...]

--
Thanks & Regards,
Ravi
Hi Ravindra,
Only projection push down is planned for the struct type. For complex data types, encoding is planned for the primitive types. For both integral and dimension types, we are reusing the existing encoding types. Integral types will reuse adaptive and adaptive delta encoding. For dimension types, createEncoderForDimensionLegacy will be used to get the encoder.

Thanks,
Dhatchayani
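For readers unfamiliar with the two integral encodings named above, here is a minimal Python sketch of the general idea. It is not CarbonData's codec: the function names are hypothetical, and subtracting the page minimum is an assumption of this sketch. Adaptive encoding stores a page at the narrowest integer width that fits its value range; adaptive delta first subtracts a base value so that a page of large but closely spaced values needs an even narrower width.

```python
from typing import List, Tuple

# Candidate storage widths in bytes, narrowest first.
WIDTHS = (1, 2, 4, 8)


def fits(value: int, width: int) -> bool:
    """True if value fits in a signed integer of the given byte width."""
    bits = width * 8
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def narrowest_width(lo: int, hi: int) -> int:
    """Smallest signed width that can hold every value in [lo, hi]."""
    for width in WIDTHS:
        if fits(lo, width) and fits(hi, width):
            return width
    raise ValueError("range does not fit in 8 bytes")


def adaptive_encode(page: List[int]) -> Tuple[int, List[int]]:
    """Adaptive: keep the values, store them at the narrowest width."""
    return narrowest_width(min(page), max(page)), list(page)


def adaptive_delta_encode(page: List[int]) -> Tuple[int, int, List[int]]:
    """Adaptive delta: store (value - page minimum) at the narrowest width.

    Using the page minimum as the base is an assumption of this sketch.
    """
    base = min(page)
    deltas = [value - base for value in page]
    return base, narrowest_width(0, max(deltas)), deltas


def adaptive_delta_decode(base: int, deltas: List[int]) -> List[int]:
    """Invert adaptive_delta_encode."""
    return [base + delta for delta in deltas]


# A non-empty page of large, closely spaced values.
page = [1_000_000, 1_000_050, 1_000_100]
plain_width, _ = adaptive_encode(page)
base, delta_width, deltas = adaptive_delta_encode(page)
restored = adaptive_delta_decode(base, deltas)
```

For this page the plain adaptive width is 4 bytes per value, while the deltas fit in 1 byte each. This only works for integral values, which is consistent with the point that other primitive types need a different encoder.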
In reply to this post by sounak
Hi Dev,
Among the complex type enhancements, I have started work on the point below.

8. Compaction support for Complex Datatype: Compaction works for the primitive datatype, but should be extended for complex too.

Please find the JIRA link below:
https://issues.apache.org/jira/browse/CARBONDATA-2755