Apache CarbonData Dev Mailing List archive - Re: Discussion regrading design of data load after kettle removal.

Posted by Jacky Li on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-regrading-design-of-data-load-after-kettle-removal-tp1672p1726.html

Hi Ravindra,

I have the following questions:

1. How does the DataLoadProcessorStep interface work? Does each step call its child step to execute, and then apply its own logic to the iterator returned by the child? And how does it map to OutputFormat in the Hadoop interface?
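
To make question 1 concrete, here is a rough sketch of the pull-based chaining as I understand it. It is written in Python only for brevity. The names `Step`, `InputStep`, `UpperCaseStep` and `WriterStep` are hypothetical and are not the proposed DataLoadProcessorStep API. The last step stands in for whatever would sit behind a Hadoop OutputFormat, which is my assumption about the mapping, not something stated in the design.

```python
from typing import Iterator, List, Optional

Row = List[object]


class Step:
    """Hypothetical stand-in for a load step; not the real interface."""

    def __init__(self, child: Optional["Step"] = None):
        self.child = child

    def execute(self) -> Iterator[Row]:
        raise NotImplementedError


class InputStep(Step):
    """Leaf step: has no child and simply exposes the raw rows."""

    def __init__(self, rows: List[Row]):
        super().__init__()
        self.rows = rows

    def execute(self) -> Iterator[Row]:
        return iter(self.rows)


class UpperCaseStep(Step):
    """Pulls rows from its child and applies its own logic to each row."""

    def execute(self) -> Iterator[Row]:
        for row in self.child.execute():
            yield [v.upper() if isinstance(v, str) else v for v in row]


class WriterStep(Step):
    """Final step: drains the child iterator into a sink (assumed writer role)."""

    def __init__(self, child: Step, sink: List[Row]):
        super().__init__(child)
        self.sink = sink

    def execute(self) -> Iterator[Row]:
        for row in self.child.execute():
            self.sink.append(row)
        return iter(())  # nothing left to hand to a parent


sink: List[Row] = []
pipeline = WriterStep(UpperCaseStep(InputStep([["a", 1], ["b", 2]])), sink)
list(pipeline.execute())  # calling the top step drives the whole chain
```

In this sketch nothing runs until the top step is executed, and each row flows through all steps before the next row is read.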

2. This step interface relies on an iterator to do the encoding row by row. Will it be convenient to add batch encoder support, either now or later?
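
As an illustration of what I mean by batch support in question 2, the sketch below groups a row iterator into fixed-size batches and applies an encoder that needs to see several rows at once. The helper names `batches` and `delta_encode_batch` are made up for this example, and delta encoding is used only as a simple case of an encoding that works on a batch rather than on a single row.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Row = List[int]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group a row iterator into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def delta_encode_batch(batch: List[Row]) -> List[Row]:
    """Hypothetical batch encoder: column 0 becomes the difference from the
    previous row in the same batch; the first row keeps its value."""
    encoded = []
    previous = 0
    for row in batch:
        encoded.append([row[0] - previous] + row[1:])
        previous = row[0]
    return encoded


rows = [[10], [12], [15], [20], [21]]
encoded = [delta_encode_batch(b) for b in batches(rows, 2)]
```

If the step interface only ever hands over one row at a time, an encoder like this has to do its own buffering inside the step, which is why I am asking whether batches should be part of the interface.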

3. For the dictionary part, besides the generator, I think it is better to also consider the interface for reading the dictionary while querying. Are you planning to use the same interface? If so, it is not just a Generator.
If the dictionary interface is well designed, other developers can also add new dictionary types. For example:
- a dictionary that assigns values based on usage frequency, for better compression, similar to Huffman encoding
- an order-preserving dictionary, on which range filters can be applied directly to the dictionary values
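
A rough sketch of the kind of interface I have in mind for question 3 follows: one abstraction that covers both directions (encoding during load, decoding during query), with the two example dictionary types plugged in behind it. All class and function names here are hypothetical and only illustrate the idea. The frequency-based variant just gives smaller keys to more frequent values; it is not real Huffman coding.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List


class Dictionary(ABC):
    """Hypothetical interface used by both data load and query."""

    @abstractmethod
    def encode(self, value: str) -> int: ...

    @abstractmethod
    def decode(self, key: int) -> str: ...


class ListBackedDictionary(Dictionary):
    """Key of a value is its position in the given ordering."""

    def __init__(self, ordered_values: Iterable[str]):
        self._values: List[str] = list(ordered_values)
        self._keys = {v: i for i, v in enumerate(self._values)}

    def encode(self, value: str) -> int:
        return self._keys[value]

    def decode(self, key: int) -> str:
        return self._values[key]


class OrderPreservingDictionary(ListBackedDictionary):
    """Keys follow the sort order of the values."""

    def __init__(self, values: Iterable[str]):
        super().__init__(sorted(set(values)))

    def key_lower_bound(self, value: str) -> int:
        # Smallest key whose value is >= `value`; works for unseen values too.
        return bisect_left(self._values, value)


def frequency_based(values: Iterable[str]) -> ListBackedDictionary:
    """Most frequent value gets key 0; ties are broken by value."""
    counts = Counter(values)
    return ListBackedDictionary(sorted(counts, key=lambda v: (-counts[v], v)))


column = ["b", "a", "c", "b", "b", "a"]
ordered = OrderPreservingDictionary(column)
encoded = [ordered.encode(v) for v in column]

# Range filter `value < "bb"` evaluated on keys only, without decoding.
bound = ordered.key_lower_bound("bb")
matching_keys = [k for k in encoded if k < bound]
```

With the order-preserving variant, the filter compares integer keys against a single translated bound, so the column never has to be decoded for the comparison.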

Regards,
Jacky