[DISCUSSION] Support DataLoad using Json for CarbonSession

Posted by Indhumathi on Dec 05, 2018; 10:24am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Support-DataLoad-using-Json-for-CarbonSession-tp69810.html

Hello All,

I am working on supporting data load from JSON files for CarbonSession.

1. JSON file loading will use JsonInputFormat. The JsonInputFormat will read
two types of JSON-formatted data.
i) Newline-delimited records (the default): each JSON record sits on its own
line. This method is generally faster and is backed by the familiar
LineRecordReader. It will use SimpleJsonRecordReader to read a line of JSON
and return it as a Text object.
ii) 'Pretty-printed' records: records span multiple lines and often have some
type of root identifier. This method is likely slower, but it respects record
boundaries much like the LineRecordReader. The user has to provide the
identifier by setting "json.input.format.record.identifier". It will use
JsonRecordReader to read JSON records from a file. JsonRecordReader respects
split boundaries to complete full JSON records, as specified by the root
identifier. JsonStreamReader handles byte-by-byte reading of the JSON stream,
creating records based on that base 'identifier'.

2. Implement JsonRecordReaderIterator, similar to CSVRecordReaderIterator.

3. Use JsonRowParser, which will convert each JSON record to a Carbon record
(jsonToCarbonRecord) and generate a Carbon Row.
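
To make the two read modes and the row conversion concrete, here is a rough
sketch of the idea. It is written in plain Python for brevity and is not the
proposed Java implementation: the function names, the list-of-column-names
schema, and the choice to return the object that follows the identifier are all
assumptions made for illustration.

```python
import json
from typing import Any, Iterator, List


def read_newline_delimited(text: str) -> Iterator[str]:
    """Mode (i): every non-empty line is one complete JSON record."""
    for line in text.splitlines():
        if line.strip():
            yield line


def read_by_identifier(text: str, identifier: str) -> Iterator[str]:
    """Mode (ii): records span several lines and are located by a root identifier.

    Simplified assumption: each record is the JSON object that follows the
    quoted identifier. The scan walks character by character, tracking brace
    depth and skipping braces that appear inside string values.
    """
    key = '"' + identifier + '"'
    pos = text.find(key)
    while pos != -1:
        start = text.find("{", pos + len(key))
        if start == -1:
            return
        depth, in_string, escaped, end = 0, False, False, -1
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end == -1:
            return  # incomplete record at end of input
        yield text[start:end + 1]
        pos = text.find(key, end + 1)


def json_to_row(record: str, schema: List[str]) -> List[Any]:
    """Map one JSON record onto an ordered row; missing fields become None."""
    obj = json.loads(record)
    return [obj.get(column) for column in schema]
```

For example, with a hypothetical schema ["id", "name"], each string yielded by
either reader can be passed to json_to_row to get one ordered row. A real
reader would additionally have to handle records that cross input split
boundaries, which this sketch ignores.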

Please feel free to provide your comments and suggestions. I am working on
the design document and will upload it soon to the JIRA below.
https://issues.apache.org/jira/browse/CARBONDATA-3146

Regards,
Indhumathi M