
Re: [Feature] Design Document for Update/Delete support in CarbonData

Posted by Aniket Adnaik on Nov 21, 2016; 7:37am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3057.html

Hi Sujith,

Please see my comments inline.

Best Regards,
Aniket

On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <[hidden email]>
wrote:

> Hi Aniket,
>
> It's a well-documented design. I just want to know a few points:
>
> a.  Format of the RowID and its datatype
>
AA>> The following format can be used to represent a unique row ID:

[<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]

A simple way would be to use the String data type and store the row IDs in a
text file. However, a more efficient way could be to use bitsets/bitmaps as a
further optimization. Compressed bitmaps, such as Roaring bitmaps, can be used
for better performance and more efficient storage.
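A minimal Python sketch of the two representations, for illustration only. The `/` delimiter and the names `RowId`, `encode_row_id`, `decode_row_id` and `BlockletDeleteBitmap` are hypothetical, not part of the design document or of CarbonData. A plain Python int stands in for the bitmap and is NOT compressed; a Roaring bitmap would additionally compress sparse and run-heavy offset sets.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class RowId:
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int  # row offset within the blocklet


def encode_row_id(rid: RowId) -> str:
    """String form of [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>].
    The '/' delimiter is a hypothetical choice for this sketch."""
    return f"{rid.segment_id}/{rid.block_id}/{rid.blocklet_id}/{rid.offset}"


def decode_row_id(text: str) -> RowId:
    seg, blk, blt, off = (int(part) for part in text.split("/"))
    return RowId(seg, blk, blt, off)


class BlockletDeleteBitmap:
    """Deleted-row offsets of one blocklet, kept as bits of an int.
    Uncompressed; a Roaring bitmap would be the compressed alternative."""

    def __init__(self) -> None:
        self._bits = 0

    def mark_deleted(self, offset: int) -> None:
        self._bits |= 1 << offset

    def is_deleted(self, offset: int) -> bool:
        return (self._bits >> offset) & 1 == 1

    def deleted_offsets(self) -> Iterator[int]:
        # Walk the bits from the lowest offset upward.
        bits, offset = self._bits, 0
        while bits:
            if bits & 1:
                yield offset
            bits >>= 1
            offset += 1

    def count(self) -> int:
        return bin(self._bits).count("1")
```

Under these assumptions, `encode_row_id(RowId(0, 3, 1, 42))` yields `"0/3/1/42"`, one text entry per deleted row, while the bitmap needs only one bit per row offset within a blocklet, and the segment/block/blocklet prefix is stored once per bitmap rather than once per row.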

> b. Impact of this feature on select queries: every time, the query process
> has to exclude each deleted record and include the corresponding updated
> record. Is any optimization being considered to tackle the query performance
> issue, since performance is one of the major highlights of CarbonData?
AA>> Some possible optimizations are caching the deltas to avoid repeated
I/O, storing sorted row IDs in the delete delta for efficient lookup, and
running compaction regularly to minimize the impact on select query performance.
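A minimal Python sketch of the first two ideas combined with the read path from question (b). It assumes row IDs are strings of the form segment/block/blocklet/offset with a hypothetical `/` delimiter, that an update is recorded as a delete of the old row plus a new row version, and that delete-delta "files" are modeled by an in-memory dict; `DELETE_DELTAS`, `load_delete_delta`, `is_deleted` and `scan` are hypothetical names, not CarbonData APIs.

```python
from bisect import bisect_left
from functools import lru_cache
from typing import Dict, Iterable, Iterator, List, Tuple

RowKey = Tuple[int, ...]

# Hypothetical stand-in for delete-delta files on disk: path -> unsorted row IDs.
# A real reader would do file I/O here; the dict keeps the sketch runnable.
DELETE_DELTAS: Dict[str, List[str]] = {
    "seg0/block3.deletedelta": ["0/3/1/42", "0/3/1/7", "0/3/2/0"],
}


def row_key(row_id: str) -> RowKey:
    # Compare numerically so that "0/3/1/7" sorts before "0/3/1/42".
    return tuple(int(part) for part in row_id.split("/"))


@lru_cache(maxsize=128)
def load_delete_delta(path: str) -> Tuple[RowKey, ...]:
    """Read and sort a delete delta once; later queries hit the cache."""
    return tuple(sorted(row_key(r) for r in DELETE_DELTAS[path]))


def is_deleted(sorted_keys: Tuple[RowKey, ...], row_id: str) -> bool:
    """Binary search over the sorted delta: O(log n) per lookup."""
    key = row_key(row_id)
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def scan(base_rows: Iterable[Tuple[str, dict]], delta_path: str,
         updated_rows: Iterable[dict]) -> Iterator[dict]:
    """Yield base rows that are not deleted, then the updated row versions."""
    deleted = load_delete_delta(delta_path)
    for row_id, row in base_rows:
        if not is_deleted(deleted, row_id):
            yield row
    yield from updated_rows
```

The cache returns an immutable tuple so callers cannot corrupt it, but it would have to be invalidated (for example with `load_delete_delta.cache_clear()`) whenever a new delete delta is written for that block.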
Additionally, we may have to explore ways to trigger compaction automatically,
for example, when more than 25% of rows are read from deltas.
Please feel free to share any ideas or suggestions.
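A minimal Python sketch of such an automatic trigger, assuming per-segment counters of rows read and rows read from deltas are available. `SegmentReadStats`, `needs_compaction` and `segments_to_compact` are hypothetical names, not CarbonData APIs; only the 25% figure comes from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List

# 25% is the example threshold from this thread; a real setting would be configurable.
AUTO_COMPACTION_DELTA_RATIO = 0.25


@dataclass
class SegmentReadStats:
    rows_read: int              # rows returned by scans of the segment
    rows_read_from_deltas: int  # of those, rows served from delta files

    def delta_ratio(self) -> float:
        if self.rows_read == 0:
            return 0.0
        return self.rows_read_from_deltas / self.rows_read


def needs_compaction(stats: SegmentReadStats,
                     threshold: float = AUTO_COMPACTION_DELTA_RATIO) -> bool:
    """True when more than `threshold` of the rows read came from deltas."""
    return stats.delta_ratio() > threshold


def segments_to_compact(stats_by_segment: Dict[str, SegmentReadStats]) -> List[str]:
    """Segment IDs whose delta ratio crosses the threshold, in sorted order."""
    return sorted(seg for seg, s in stats_by_segment.items() if needs_compaction(s))
```

The strict "more than" comparison means a segment at exactly 25% is not selected; measuring the ratio per segment rather than per table is likewise an assumption of this sketch.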

> Thanks,
> Sujith

On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <[hidden email]> wrote:

> Hi All,
>
> Please find a design doc for Update/Delete support in CarbonData.
>
> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
>
> Best Regards,
> Aniket
>