Posted by akashnilugal@gmail.com on Sep 30, 2019; 6:16am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Support-Time-Series-for-MV-datamap-and-autodatamap-loading-of-timeseries-datamaps-tp84721p84926.html
Hi xuchuanyin,
Thanks for the comments and suggestions.
1. The preaggregate datamap is productized, but timeseries with preaggregate is not; I think that may be what caused the confusion.
2. Limitations such as auto sampling/rollup (which we will now support), retention policies, etc.
3. segmentTimestampMin: I will consider this in the design.
4. RP is added as a separate task. I thought that instead of maintaining two variables it would be better to maintain one and parse it (see the first sketch below). But I will consider your point based on feasibility during implementation.
5. We use an accumulator which takes a list: before writing the index files, each task computes the min/max of the timestamp column for that load and adds it to the accumulator, and we can then access accumulator.value in the driver after the load is finished (see the second sketch below).
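
To illustrate point 4, here is a minimal sketch of what "maintain one variable and parse it" could look like, compared with the two-property style you suggested. All names here (parseRp, RetentionPolicySketch) are illustrative assumptions, not the final design:

// Illustrative only: parse a single 'RP'='15_days' property into
// (amount, unit) instead of maintaining two separate variables.
object RetentionPolicySketch {
  def parseRp(rp: String): (Int, String) = rp.split("_", 2) match {
    case Array(amount, unit) => (amount.toInt, unit)
    case _ => throw new IllegalArgumentException(s"Invalid RP value: $rp")
  }

  def main(args: Array[String]): Unit = {
    val (amount, unit) = parseRp("15_days")
    println(s"retain data for $amount $unit") // retain data for 15 days

    // The alternative from your point 4: two separate properties.
    val props = Map("retentionTime" -> "15", "timeUnit" -> "day")
    println(s"retain data for ${props("retentionTime")} ${props("timeUnit")}s")
  }
}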
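
For point 5, here is a minimal, self-contained sketch of the accumulator idea, assuming a plain DataFrame load with a single event_time column; the object and column names are my own assumptions, not CarbonData's actual load path:

import scala.collection.JavaConverters._
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.util.CollectionAccumulator

object TimestampRangeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("ts-range").getOrCreate()
    import spark.implicits._

    // Accumulator collecting one (min, max) timestamp pair per task.
    val tsRanges: CollectionAccumulator[(Long, Long)] =
      spark.sparkContext.collectionAccumulator[(Long, Long)]("timestampRanges")

    // Stand-in for the data being loaded into the main table.
    val loadDf = Seq(1000L, 2500L, 1700L, 4200L).toDF("event_time")

    // During the load, each task records its local min/max of the timestamp
    // column (in our case, just before the index files are written).
    loadDf.foreachPartition { rows: Iterator[Row] =>
      val ts = rows.map(_.getLong(0)).toList
      if (ts.nonEmpty) tsRanges.add((ts.min, ts.max))
    }

    // Back in the driver, after the load finishes, reduce the per-task
    // entries in accumulator.value to a segment-level min/max.
    val ranges = tsRanges.value.asScala
    val segmentTimestampMin = ranges.map(_._1).min
    val segmentTimestampMax = ranges.map(_._2).max
    println(s"segmentTimestampMin=$segmentTimestampMin, " +
      s"segmentTimestampMax=$segmentTimestampMax")

    spark.stop()
  }
}

This also answers your question 5: accumulators are registered on the SparkContext and updated from any task, so this works whether the load runs as global-sort-by-spark or loading-by-dataframe, as long as the load runs as Spark tasks.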
Regards,
Akash R Nilugal
On 2019/09/28 10:46:31, xuchuanyin <[hidden email]> wrote:
> Hi akash, glad to see the feature proposed, and I have some questions about
> it. Please note that in the following, quotes from the design document
> attached to the corresponding JIRA appear first, and my comments follow
> the '===' marker.
>
> 1.
> "Currently carbondata supports timeseries on preaggregate datamap, but its
> an alpha feature"
> ===
> It has been some time since the preaggregate datamap was introduced, and it
> is still **alpha**. Why is it still not product-ready? Will the new feature
> end up in a similar situation?
>
> 2.
> "there are so many limitations when we compare and analyze the existing
> timeseries database or projects which supports time series like apache druid
> or influxdb"
> ===
> What are the actual limitations? Please also give an example.
>
> 3.
> "Segment_Timestamp_Min"
> ===
> Suggest using camel-case style like 'segmentTimestampMin'
>
> 4.
> "RP is way of telling the system, for how long the data should be kept"
> ===
> Since the function is simple, I'd suggest using 'retentionTime'=15 and
> 'timeUnit'='day' instead of 'RP'='15_days'
>
> 5.
> "When the data load is called for main table, use an spark accumulator to
> get the maximum value of timestamp in that load and return to the load."
> ===
> How can you get the spark accumulator? The load is launched using
> loading-by-dataframe, not global-sort-by-spark.
>
> 6.
> For the rest of the content, still reading.
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/