[DISCUSSION] Support write Flink streaming data to Carbon

[DISCUSSION] Support write Flink streaming data to Carbon

niuge
The write process is:

1. Write the Flink streaming data to the local file system of the Flink task node, using the Flink StreamingFileSink and the Carbon SDK;
2. Copy the local carbon data files to the carbon data store system, such as HDFS or S3 (a sketch of steps 1 and 2 follows this list);
3. Generate and write a segment file to ${tablePath}/load_details;
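
For concreteness, here is a minimal sketch of steps 1 and 2 using the Carbon SDK CarbonWriter and the Hadoop FileSystem API. The schema, the local staging directory and the HDFS address are assumptions for illustration only; in the actual design this logic would run inside the Flink StreamingFileSink on each task node rather than in a standalone program.

import java.net.URI;

import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalWriteAndCopySketch {
  public static void main(String[] args) throws Exception {
    // Assumed paths: a staging directory on the task node's local file
    // system, and the table location in the carbon data store.
    String localDir = "/tmp/flink-carbon/stage";
    String storeDir = "hdfs://namenode:8020/carbon/store/db/tbl";

    // Step 1: write streaming records as carbon data files on the local FS.
    Schema schema = new Schema(new Field[] {
        new Field("id", DataTypes.LONG),
        new Field("event", DataTypes.STRING)
    });
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath(localDir)
        .withCsvInput(schema)
        .writtenBy("flink-streaming-sink")
        .build();
    writer.write(new String[] {"1", "click"});
    writer.write(new String[] {"2", "view"});
    writer.close();

    // Step 2: copy the finished local carbon data files to the store.
    FileSystem fs = FileSystem.get(URI.create(storeDir), new Configuration());
    fs.copyFromLocalFile(new Path(localDir), new Path(storeDir));
  }
}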

Run "alter table ${tableName} collect segments" command on server, to compact segment files in ${tablePath}/load_details, and then move the compacted segment file to ${tablePath}/Metadata/Segments/,update table status file finally.

I have raised a JIRA, https://issues.apache.org/jira/browse/CARBONDATA-3557, and attached the design document to it. Please take a look.

Your opinions and suggestions are welcome.

Re: [DISCUSSION] Support write Flink streaming data to Carbon

Jacky Li-3
+1 for this feature. In my opinion, flink-carbon is a good fit for near real-time analytics.

One doubt: in your design, the Collect Segment command and the Compaction command are two separate commands, right?

The Collect Segment command modifies the metadata files (the tablestatus file and the segment file), while the Compaction command merges small data files and builds indexes.

Is my understanding right?
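
To put my understanding into a sketch (COLLECT SEGMENTS is the syntax proposed in this thread, COMPACT is the existing CarbonData compaction command, and the table name is a placeholder):

import org.apache.spark.sql.SparkSession;

public class CollectVsCompactSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("collect-vs-compact")
        .enableHiveSupport()
        .getOrCreate();

    // Metadata only: merge the small segment files under load_details into
    // one segment file and update the tablestatus file; data files untouched.
    spark.sql("ALTER TABLE my_table COLLECT SEGMENTS");

    // Data operation: merge small carbon data files and rebuild indexes.
    spark.sql("ALTER TABLE my_table COMPACT 'MINOR'");
  }
}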

Regards,
Jacky


Re: [DISCUSSION] Support write Flink streaming data to Carbon

sraghunandan
+1
