GitHub user QiangCai opened a pull request:
https://github.com/apache/carbondata/pull/1440
[WIP][CARBONDATA-1581][CARBONDATA-1582] Implement StreamSinkProvider and stream file writer
1. Change the default hadoop.version to 2.7.2
This is required in order to use the truncate operation of the filesystem.
2. CarbonSource extends StreamSinkProvider
It provides a stream sink to support streaming ingest.
3. Implement CarbonStreamOutputFormat and CarbonStreamRecordWriter
CarbonStreamRecordWriter writes input data to a CarbonData stream file.
4. Avoid the small file issue
New blocklets are appended to the existing file instead of being written to a new small file.
5. Support fault tolerance
Each stream segment has a CarbonIndex file, which records information about the CarbonData files in that segment.
With it, we can recover the data to the last successful commit.
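
Items 1, 4 and 5 fit together: blocklets are appended to an existing file, the index records what has been committed, and truncate discards anything written after the last commit. The sketch below illustrates that idea on a local file only. It is not the code in this pull request: the class name, method names, file names, and the choice to store a plain byte length in the index file are all assumptions made for illustration, and java.nio FileChannel.truncate stands in for the Hadoop filesystem truncate operation.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; none of these names come from CarbonData.
class StreamRecoverySketch {

    // Append one "blocklet" (here just raw bytes) to the end of the data file,
    // creating the file on first use, so that no new small file is produced.
    static void appendBlocklet(Path dataFile, byte[] blocklet) throws IOException {
        Files.write(dataFile, blocklet, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Commit: record the current length of the data file in the index file.
    // Assumption: the index holds a single length written as decimal text.
    static void commit(Path dataFile, Path indexFile) throws IOException {
        long length = Files.size(dataFile);
        Files.write(indexFile, Long.toString(length).getBytes(StandardCharsets.UTF_8));
    }

    // Recover: truncate the data file back to the length of the last commit.
    // If nothing was ever committed, the file is truncated to zero bytes.
    static void recover(Path dataFile, Path indexFile) throws IOException {
        long committed = 0L;
        if (Files.exists(indexFile)) {
            String text = new String(Files.readAllBytes(indexFile), StandardCharsets.UTF_8);
            committed = Long.parseLong(text.trim());
        }
        try (FileChannel channel = FileChannel.open(dataFile, StandardOpenOption.WRITE)) {
            channel.truncate(committed);
        }
    }

    // Two committed batches, then a write that "fails" before its commit.
    // Returns the content of the data file after recovery.
    static String demo() throws IOException {
        Path dir = Files.createTempDirectory("stream-segment");
        Path dataFile = dir.resolve("part-0.data");      // hypothetical file name
        Path indexFile = dir.resolve("segment.index");   // hypothetical file name

        appendBlocklet(dataFile, "batch-1;".getBytes(StandardCharsets.UTF_8));
        commit(dataFile, indexFile);
        appendBlocklet(dataFile, "batch-2;".getBytes(StandardCharsets.UTF_8));
        commit(dataFile, indexFile);

        // Simulated failure: bytes reach the file but are never committed.
        appendBlocklet(dataFile, "partial".getBytes(StandardCharsets.UTF_8));

        recover(dataFile, indexFile);
        return new String(Files.readAllBytes(dataFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "batch-1;batch-2;"
    }
}
```

In this sketch the uncommitted "partial" bytes are removed by the truncate, so a reader never sees data beyond the last commit. Real stream files would presumably need to track more than a byte length, which this example leaves out.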
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/QiangCai/carbondata streaming
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1440
----
commit 6c94c9311ea1b260e75bf576eec75aea17ce8984
Author: QiangCai <[hidden email]>
Date: 2017-10-18T03:13:00Z
support streaming ingest
----