I have some suggestions and questions.
1. In DMPROPERTIES, instead of 'timestamp_column' I suggest using
'timeseries_column'. Also, please mention which datatypes this column
supports and update the document with all the supported datatypes (a rough
DDL sketch follows after point 3).
2. Querying directly on this datamap table is also supported, right?
3. If the user has not created a day granularity datamap, but only an hour
granularity datamap, and a query comes with day granularity, will the data
be fetched from the hour granularity datamap and rolled up?
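
To make points 1 and 3 concrete, below is a rough sketch (in Spark/Scala)
of the DDL and query I have in mind; the 'timeseries_column' and
'granularity' properties, the timeseries UDF usage and the table/column
names are only illustrative for this proposal, not a final syntax:

  import org.apache.spark.sql.SparkSession

  object TimeseriesMvSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("timeseries-mv-sketch").getOrCreate()

      // Hour-granularity MV datamap on the main table, using the proposed
      // 'timeseries_column' property name instead of 'timestamp_column'.
      spark.sql(
        """CREATE DATAMAP sales_hour_agg ON TABLE sales
          |USING 'mv'
          |DMPROPERTIES ('timeseries_column'='order_time',
          |              'granularity'='hour')
          |AS SELECT timeseries(order_time, 'hour') AS ts, SUM(amount) AS amt
          |   FROM sales GROUP BY timeseries(order_time, 'hour')""".stripMargin)

      // Point 3: a day-granularity query when only the hour datamap exists;
      // the expectation is that it is answered by rolling up the hour datamap.
      spark.sql(
        """SELECT timeseries(order_time, 'day') AS day, SUM(amount)
          |FROM sales GROUP BY timeseries(order_time, 'day')""".stripMargin).show()
    }
  }
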
> Hi xuchuanyin,
>
> Thanks for the comments/suggestions.
>
> 1. Preaggregate is productized, but not timeseries with preaggregate; I
> think you may have confused the two.
> 2. Limitations like auto sampling/rollup, which we will be supporting now,
> and retention policies, etc.
> 3. segmentTimestampMin: I will consider this in the design.
> 4. RP is added as a separate task; I thought that instead of maintaining
> two variables it is better to maintain one and parse it. But I will
> consider your point based on feasibility during implementation.
> 5. We use an accumulator which takes a list; before writing the index
> files we take the min/max of the timestamp column, fill the accumulator,
> and then access accumulator.value in the driver after the load is
> finished.
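>
> Roughly, the idea looks like the sketch below (just an illustration of the
> accumulator usage, not the final code; the accumulator name and the sample
> values are placeholders):
>
>   import org.apache.spark.sql.SparkSession
>   import scala.collection.JavaConverters._
>
>   object MinMaxAccumulatorSketch {
>     def main(args: Array[String]): Unit = {
>       val spark = SparkSession.builder()
>         .appName("minmax-sketch").getOrCreate()
>       val sc = spark.sparkContext
>
>       // One (min, max) pair per load task is collected here.
>       val minMax =
>         sc.collectionAccumulator[(Long, Long)]("segmentTimestampMinMax")
>
>       // Stand-in for the per-task write path: each partition computes the
>       // min/max of its timestamp values and adds the pair to the
>       // accumulator before its index files are written.
>       val timestamps = sc.parallelize(
>         Seq(1569628800000L, 1569632400000L, 1569636000000L), numSlices = 2)
>       timestamps.foreachPartition { iter =>
>         val values = iter.toSeq
>         if (values.nonEmpty) minMax.add((values.min, values.max))
>       }
>
>       // Back on the driver, after the load finishes, fold the per-task
>       // pairs into the segment-level min/max.
>       val pairs = minMax.value.asScala
>       println(s"min=${pairs.map(_._1).min}, max=${pairs.map(_._2).max}")
>     }
>   }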
>
> Regards,
> Akash R Nilugal
>
> On 2019/09/28 10:46:31, xuchuanyin <[hidden email]> wrote:
> > Hi Akash, glad to see the feature proposed, and I have some questions
> > about this. Please note that some of the following items quote the
> > design document attached in the corresponding JIRA, with my comments
> > following after '==='.
> >
> > 1.
> > "Currently carbondata supports timeseries on preaggregate datamap, but
> > its an alpha feature"
> > ===
> > It has been some time since the preaggregate datamap was introduced and
> > it is still **alpha**. Why is it still not product-ready? Will the new
> > feature also run into a similar situation?
> >
> > 2.
> > "there are so many limitations when we compare and analyze the existing
> > timeseries database or projects which supports time series like apache
> > druid or influxdb"
> > ===
> > What are the actual limitations? Also, please give an example.
> >
> > 3.
> > "Segment_Timestamp_Min"
> > ===
> > Suggest using camel-case style like 'segmentTimestampMin'
> >
> > 4.
> > "RP is way of telling the system, for how long the data should be kept"
> > ===
> > Since the function is simple, I'd suggest using 'retentionTime'=15 and
> > 'timeUnit'='day' instead of 'RP'='15_days' (a tiny sketch of the
> > difference follows after these points).
> >
> > 5.
> > "When the data load is called for main table, use an spark accumulator to
> > get the maximum value of timestamp in that load and return to the load."
> > ===
> > How can you get the spark accumulator? The load is launched using
> > loading-by-dataframe, not global-sort-by-spark.
> >
> > 6.
> > For the rest of the content, I am still reading.
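> >
> > To illustrate point 4, a tiny sketch of the difference (the key names
> > here are only the ones proposed in this mail, nothing final):
> >
> >   // With two explicit keys no string parsing is needed, while the
> >   // combined 'RP'='15_days' form has to be split and validated.
> >   object RetentionPropertySketch {
> >     def main(args: Array[String]): Unit = {
> >       // Suggested alternative: two explicit properties.
> >       val props = Map("retentionTime" -> "15", "timeUnit" -> "day")
> >       val retention = props("retentionTime").toInt
> >       val unit = props("timeUnit")
> >
> >       // Combined form from the design doc: one property to be parsed.
> >       val Array(valueStr, unitStr) = "15_days".split("_", 2)
> >
> >       println(s"two keys     : $retention $unit")
> >       println(s"combined key : ${valueStr.toInt} $unitStr")
> >     }
> >   }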
> >
> >
> >
> >
> > --
> > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/