Re: Grammar about supporting string longer than 32000 characters

Posted by ravipesala on May 02, 2018; 3:38pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Grammar-about-supporting-string-longer-than-32000-characters-tp47731p47892.html

Hi,

I agree with option 2, but instead of introducing a new datatype, use varchar(size).
The varchar(size) datatype allows further optimizations, for example:
1. If the size is small (less than 8 bytes), we can write it with a fixed-length
encoder instead of LV (length-value) encoding, which can save a lot of space and memory.
2. If the size is less than 32000, use our current string datatype.
3. If the size is more than 32000, encode using an int as the length in LV
format.
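
To make the three cases concrete, here is a rough Python sketch of how an encoder might be picked from the declared varchar(size). It is not CarbonData code: the names choose_encoding and encode are made up for illustration, the 2-byte length for the current string type and the zero padding in the fixed-length case are assumptions, and a size of exactly 32000 is sent to the int-length path only because the list above does not say where it belongs.

```python
import struct

FIXED_LENGTH_MAX = 8       # threshold from point 1 (sizes below 8 bytes)
SHORT_LENGTH_MAX = 32000   # threshold from points 2 and 3


def choose_encoding(declared_size: int) -> str:
    """Pick an encoding from the declared varchar(size)."""
    if declared_size < FIXED_LENGTH_MAX:
        return "fixed"       # no length prefix at all
    if declared_size < SHORT_LENGTH_MAX:
        return "lv_short"    # assumed: current string type, 2-byte length
    return "lv_int"          # 4-byte int length for long strings


def encode(value: bytes, declared_size: int) -> bytes:
    """Encode one value for a column declared as varchar(declared_size)."""
    if len(value) > declared_size:
        raise ValueError("value is longer than the declared size")
    kind = choose_encoding(declared_size)
    if kind == "fixed":
        # Assumption: pad with zero bytes up to the declared size.
        return value.ljust(declared_size, b"\x00")
    if kind == "lv_short":
        # Big-endian signed 16-bit length, then the bytes.
        return struct.pack(">h", len(value)) + value
    # Big-endian signed 32-bit length, then the bytes.
    return struct.pack(">i", len(value)) + value
```

With this layout a 2-byte value in a varchar(4) column takes 4 bytes, while the same value in a varchar(100) column takes 2 bytes of length plus 2 bytes of data.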

For Spark DataFrame support, we can use string as the default datatype.

Even if we go with option 1, Carbon should still have a new datatype
internally. Otherwise the code will suffer, because this property would need
to be checked in many places. Ideally, a new datatype leads to a separate set
of implementations that is easier to code and maintain.
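
As a rough illustration of that point, the Python sketch below lets the datatype itself carry the length-prefix width, so the writer never has to look up a column property. StringType, VarcharType and write_value are hypothetical names, not CarbonData classes, and the 2-byte and 4-byte widths are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Stand-in for the existing string type (assumed 2-byte length)."""
    length_bytes: int = 2


@dataclass(frozen=True)
class VarcharType:
    """Hypothetical dedicated type for long strings (4-byte length)."""
    length_bytes: int = 4


def write_value(data_type, value: bytes) -> bytes:
    # The writer asks the type for its prefix width; it does not check a
    # "long string" property on the column anywhere.
    return len(value).to_bytes(data_type.length_bytes, "big") + value
```

Any other place that depends on the length width (readers, page statistics, and so on) could dispatch on the type in the same way instead of repeating a property check.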



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/