http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Parsing-values-during-data-load-should-adopt-a-strict-check-or-lenient-check-mechanism-tp3826p3899.html
+1 for strict check mechanism.
columns.
> Hi
>
> Thank you for starting a good discussion.
>
> I propose adopting a strict check mechanism to avoid the problems you
> mention below.
> The behavior should be the same for both dimensions and measures. In short,
> we need to process each value according to the actual data type the user
> specified.
>
> Regards
> Liang
>
>
> manishgupta88 wrote
> > Hi All,
> >
> > Currently in Carbon we treat Short and Int as Long. When the values are
> > stored in carbon data files, delta compression is applied, which
> > compresses the data based on the min and max values of the column.
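
As a rough illustration of why the column's min and max matter here, the sketch below assumes a simplified delta model in which each value is stored as an offset from the column minimum, so the width per value depends on max - min. The class DeltaWidthDemo and the method bitsForRange are hypothetical names used only for this example; CarbonData's actual encoder may work differently.

```java
class DeltaWidthDemo {
    // Bits needed to store any offset in [0, max - min].
    // Simplified model for illustration, not CarbonData's implementation.
    static int bitsForRange(long min, long max) {
        long range = max - min; // assumes the subtraction does not overflow
        return range == 0 ? 1 : 64 - Long.numberOfLeadingZeros(range);
    }

    public static void main(String[] args) {
        // Column whose values stay within the Short range.
        System.out.println(bitsForRange(Short.MIN_VALUE, Short.MAX_VALUE)); // 16
        // Same column after one out-of-range value was accepted at load time.
        System.out.println(bitsForRange(Short.MIN_VALUE, 3000000000L));     // 32
    }
}
```

Under this model, a single out-of-range value doubles the width needed for every value in the column.
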
> >
> > While parsing the values for these datatypes, we use the Double parser
> > and extract the long value from the result. The code snippet is below.
> > Double.valueOf(msrValue).longValue()
> >
> > This has the following problems.
> >
> > 1. Measure values beyond the range of Int and Short are parsed
> > successfully. This conflicts with the behavior when the same measure is
> > listed in dictionary_include and becomes a dimension. At query time, each
> > dimension value is parsed according to its datatype for result
> > conversion; a NumberFormatException is thrown and null is displayed in
> > the result, whereas for a measure the loaded values are displayed. This
> > also impacts aggregate queries. That is why a strict check mechanism is
> > adopted for parsing dimension values.
> >
> > 2. Data inconsistency for measures: for decimal input, only the part
> > before the decimal point is kept for Int and Short datatypes.
> >
> > 3. For measures, if values beyond the datatype range are allowed, the
> > compression becomes less effective.
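
To make problems 1 and 2 concrete, here is a minimal Java sketch comparing the lenient expression quoted above with one possible strict parser. The class ParseCheckDemo and the helper strictParseInt are hypothetical and written for illustration only; they are not CarbonData code, and returning null for a rejected value is just one possible policy.

```java
class ParseCheckDemo {
    // Lenient path, same expression as the snippet quoted above.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Hypothetical strict path for an Int column: anything that is not a
    // valid int (out of range, or with a fractional part) is rejected.
    static Integer strictParseInt(String value) {
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null; // treated as a bad value in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientParse("3000000000"));   // 3000000000 (beyond Int range, accepted)
        System.out.println(lenientParse("12.75"));        // 12 (fraction silently dropped)
        System.out.println(strictParseInt("3000000000")); // null
        System.out.println(strictParseInt("12.75"));      // null
        System.out.println(strictParseInt("42"));         // 42
    }
}
```

With a strict parser like this, a measure and a dimension of the same declared type would accept and reject the same inputs.
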
> >
> > Please comment on what the parsing behavior should be. Should Carbon
> > adopt a strict check mechanism or a lenient check mechanism, considering
> > that the behavior should be the same for both dimensions and measures,
> > as both are ultimately table columns?
> >
> > Regards
> > Manish Gupta
>