Re: Complex DataType Enhancements
Posted by sounak on Jun 13, 2018; 2:36pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Complex-DataType-Enhancements-tp51380p51875.html
Hi Dev,
We have identified the scope of the phase 1 activities for the complex type
enhancements.
The phase 1 enhancement activities are:
- Predicate push down for the struct data type.
- Provide adaptive encoding and decoding for all data types.
- Support loading JSON data directly into a Carbon table.
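
As a rough illustration of the adaptive encoding item above, here is a minimal sketch, assuming "adaptive" means picking the narrowest integer width that can hold every value in a page. The names `encode_page` and `decode_page` are hypothetical and are not CarbonData's codec API; the design document in the JIRA remains the reference for the real approach.

```python
import struct

# Hypothetical sketch only; not CarbonData's actual page codec.
# Candidate signed widths: (struct format code, smallest value, largest value).
_WIDTHS = [
    ("b", -2**7, 2**7 - 1),    # 1 byte
    ("h", -2**15, 2**15 - 1),  # 2 bytes
    ("i", -2**31, 2**31 - 1),  # 4 bytes
    ("q", -2**63, 2**63 - 1),  # 8 bytes
]


def encode_page(values):
    """Pack a page of integers with the narrowest width that fits them all."""
    if not values:
        return "b", b""
    lo, hi = min(values), max(values)
    for code, wmin, wmax in _WIDTHS:
        if wmin <= lo and hi <= wmax:
            # Little-endian, no padding: len(values) items of this width.
            return code, struct.pack("<%d%s" % (len(values), code), *values)
    raise ValueError("value does not fit in 64 bits")


def decode_page(code, payload):
    """Reverse encode_page using the width code stored alongside the page."""
    count = len(payload) // struct.calcsize("<" + code)
    return list(struct.unpack("<%d%s" % (count, code), payload))
```

With this sketch, a page whose values all fit in one byte is stored in a quarter of the space a fixed 4-byte layout would need, which is where the lower IO would come from. The same idea could be applied to the flattened child values of an array or struct column.
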
Please find the detailed design document attached to the JIRA [CARBONDATA-2605]:
https://issues.apache.org/jira/browse/CARBONDATA-2605

Thanks,
Sounak
On Mon, Jun 4, 2018 at 8:10 AM sounak <[hidden email]> wrote:
> Hi Dev,
>
> Complex types (also referred to as nested types) let you represent
> multiple data values within a single row/column position.
> CarbonData already supports Complex Types, but they lack major
> enhancements that are available for primitive Datatypes. As complex
> type usage is increasing, we are planning to enhance the coverage of
> Complex Types and apply some major optimizations. I am listing a few of
> the optimizations we have thought of.
>
> I request the community to go through the list and give your
> valuable suggestions.
>
> 1. Adaptive Encoding for Complex Type Page: Currently, Complex Type
> pages don't have any encoding applied, which leads to higher IO compared
> to other DataTypes. Complex pages should be on par with the encoding
> mechanism of other datatypes.
>
> 2. Optimize Array Type Reading: Optimize Complex Type Array reading so
> that arrays can be read faster. One way is to reduce the read IO for
> Arrays by applying an encoding mechanism such as Adaptive or RLE to the
> Array data type.
>
> 3. Filter and Projection Push Down for Complex Datatypes: As of now,
> filters and projections on Complex DataTypes are handled in the upper
> Spark layer. If they are pushed down, Carbon will perform better,
> because less IO is incurred when not all rows need to be sent back to
> Spark for processing.
>
> 4. Support Multilevel Nesting in Complex Datatypes: Only 2 levels of
> nesting are supported for Complex Datatypes through Load and Insert into.
> Extend this to n-level support.
>
> 5. Update and Delete support for complex Datatype: Currently, only
> primitive datatypes work for Update and Delete in CarbonData. Support
> Complex DataTypes for these DML operations too.
>
> 6. Alter Table Support for Complex DataType: Alter table doesn't support
> addition or deletion of complex columns as of now. This support needs to be
> added.
>
> 7. Map Datatype Support: Only Struct and Array datatypes are part of
> Complex Datatypes as of now. Map Datatype should be added as part of
> Complex Datatypes.
>
> 8. Compaction support for Complex Datatype: Compaction works for
> primitive datatypes, but should be extended to complex datatypes too.
>
>
> Good to have features
> ------------------------------
> 9. Geospatial Support through Complex Datatype: Represent geospatial
> datatypes like ST_GEOMETRY and XML objects through complex datatypes.
>
> 10. Complex Datatype Transformation: One complex datatype can be transformed
> into a different complex datatype. For example, a user inserted data with the
> ComplexA datatype but wants to transform the data and retrieve it as the
> ComplexB datatype.
>
> 11. Virtual Tables for Complex Datatypes: Currently complex columns reside
> in one column, but through virtual tables, the complex columns can be
> denormalized and placed into a separate table, called a virtual table, for
> faster processing and joins, and for applying sort columns.
>
> 12. Including Complex Datatypes in Sort Columns.
>
> Please let me know your suggestions on these enhancements.
>
> Thanks a lot
>
> --
> Thanks
> Sounak
>
--
Thanks
Sounak