http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-regrading-design-of-data-load-after-kettle-removal-tp1672p1730.html
1. Yes, each step calls its child step to execute and applies its own logic
to the returned iterator, just like Spark SQL. For CarbonOutputFormat it will use
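The child-iterator chaining described above can be sketched in plain Java as below. This is only an illustration of the pulling model (each step lazily wraps its child's iterator); the class names are hypothetical and not CarbonData's actual API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: each step wraps the iterator returned by its
// child, mirroring how Spark SQL operators pull rows from children.
interface ProcessorStep {
    Iterator<Object[]> execute();
}

// Leaf step: produces the raw input rows.
class InputStep implements ProcessorStep {
    private final List<Object[]> rows;
    InputStep(List<Object[]> rows) { this.rows = rows; }
    public Iterator<Object[]> execute() { return rows.iterator(); }
}

// Intermediate step: applies its logic lazily to the child's iterator.
class UpperCaseStep implements ProcessorStep {
    private final ProcessorStep child;
    UpperCaseStep(ProcessorStep child) { this.child = child; }
    public Iterator<Object[]> execute() {
        Iterator<Object[]> it = child.execute();
        return new Iterator<Object[]>() {
            public boolean hasNext() { return it.hasNext(); }
            public Object[] next() {
                Object[] row = it.next();
                row[0] = row[0].toString().toUpperCase();
                return row;
            }
        };
    }
}

public class StepChainDemo {
    public static void main(String[] args) {
        ProcessorStep chain = new UpperCaseStep(
            new InputStep(Arrays.asList(
                new Object[]{"a", 1}, new Object[]{"b", 2})));
        Iterator<Object[]> out = chain.execute();
        while (out.hasNext()) {
            System.out.println(Arrays.toString(out.next()));
        }
    }
}
```

Because each wrapper only transforms rows as they are pulled, no step materializes the whole data set.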
2. Yes, this interface relies on processing row by row. But we can also
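One way batch encoding could be layered on later, without changing the row-by-row interface, is to adapt the row iterator into a batch iterator. A minimal sketch (the names here are illustrative, not a proposed API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch: adapt a row-by-row iterator into fixed-size batches, so a
// batch encoder could consume the same underlying step output.
class BatchIterator implements Iterator<List<Object[]>> {
    private final Iterator<Object[]> rows;
    private final int batchSize;

    BatchIterator(Iterator<Object[]> rows, int batchSize) {
        this.rows = rows;
        this.batchSize = batchSize;
    }

    public boolean hasNext() { return rows.hasNext(); }

    public List<Object[]> next() {
        // Pull up to batchSize rows from the underlying row iterator.
        List<Object[]> batch = new ArrayList<>(batchSize);
        while (rows.hasNext() && batch.size() < batchSize) {
            batch.add(rows.next());
        }
        return batch;
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[]{1}, new Object[]{2}, new Object[]{3});
        BatchIterator batches = new BatchIterator(rows.iterator(), 2);
        while (batches.hasNext()) {
            System.out.println(batches.next().size());
        }
    }
}
```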
> Hi Ravindra,
>
> I have following questions:
>
> 1. How does the DataLoadProcessorStep interface work? For each step, will it
> call its child step to execute and apply its logic to the returned iterator of
> the child? And how does it map to the Hadoop OutputFormat interface?
>
> 2. This step interface relies on an iterator to do the encoding row by row;
> will it be convenient to add batch encoder support now, or later?
>
> 3. For the dictionary part, besides the generator I think it is better to also
> consider the interface for reading the dictionary while querying. Are
> you planning to use the same interface? If so, it is not just a Generator.
> If the dictionary interface is well designed, other developers can also add
> new dictionary types. For example:
> - assigning dictionary values based on usage frequency, for better
> compression, similar to Huffman encoding
> - an order-preserving dictionary, which can apply range filters on dictionary
> values directly
>
> Regards,
> Jacky
>
>
>
> --
> View this message in context:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-regrading-design-of-data-load-after-kettle-removal-tp1672p1726.html
> Sent from the Apache CarbonData mailing list archive at Nabble.com.
>
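Jacky's order-preserving dictionary point can be sketched concretely: if codes are assigned in sorted key order, a range filter on values translates directly into a range filter on the integer codes. The class and variable names below are illustrative only, not a proposed CarbonData API.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of an order-preserving dictionary: codes follow the sort
// order of the keys, so value comparisons can be evaluated on the
// integer codes without decoding.
public class OrderPreservingDict {
    public static void main(String[] args) {
        String[] values = {"cherry", "apple", "banana", "date"};

        // TreeMap keeps keys sorted, so assigning codes in entry order
        // yields an order-preserving encoding.
        TreeMap<String, Integer> dict = new TreeMap<>();
        for (String v : values) dict.put(v, 0);
        int code = 0;
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            e.setValue(code++);
        }

        // The range filter (value < "cherry") becomes (code < bound).
        int bound = dict.get("cherry");
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            if (e.getValue() < bound) System.out.println(e.getKey());
        }
    }
}
```

A frequency-based dictionary, by contrast, would give the shortest codes to the most common values, trading order preservation for better compression.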