[jira] [Commented] (CARBONDATA-2085) It's different between load twice and create datamap with load again after load data and create datamap

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343347#comment-16343347 ]

kumar vishal commented on CARBONDATA-2085:
------------------------------------------

[~xubo245]

+*First case:*+

*Create table*

*Load data*

*Create datamap*

*Load data*

*In this case the datamap will have two segments, so it will return 2 rows.*

+*Second case:*+

*Create table*

*Load data*

*Load data*

*Create datamap*

*In this case the datamap will have only 1 segment: the data from both main table segments is aggregated into a single datamap segment, so it will have 1 row containing the complete aggregated data.*

*Note: If the main table already contains data when the datamap is created, only one datamap segment is created, and it holds the complete aggregated data.*

*When data is loaded after the datamap has been created, a new datamap segment is created that contains only the data of that load.*

*To validate whether the datamap result is correct, please run the query on the main table.*
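
The segment behaviour described above can be sketched with a toy model in plain Scala. This is only an illustration under stated assumptions: the object, the function names and the sample rows are all hypothetical, nothing here is a CarbonData API, and real segments are files on disk rather than in-memory sequences.

```scala
// Toy model only: plain Scala collections stand in for CarbonData segments.
// None of these names are CarbonData APIs.
object DatamapSegmentSketch {
  type Row = (String, Int)   // (event time truncated to the minute, age)
  type Segment = Seq[Row]    // one segment = rows written by one load

  // Pre-aggregate a batch of rows: sum(age) grouped by time.
  def aggregate(rows: Seq[Row]): Segment =
    rows.groupBy(_._1).map { case (t, rs) => (t, rs.map(_._2).sum) }.toSeq.sortBy(_._1)

  // Creating the datamap over existing loads folds ALL of them into one segment.
  def createDatamap(mainSegments: Seq[Segment]): Seq[Segment] =
    if (mainSegments.isEmpty) Seq.empty else Seq(aggregate(mainSegments.flatten))

  // A load after datamap creation appends one segment holding only that load.
  def loadAfterCreate(datamap: Seq[Segment], load: Segment): Seq[Segment] =
    datamap :+ aggregate(load)

  // What a "select *" on the datamap table would list: every row of every segment.
  def rawRows(datamap: Seq[Segment]): Seq[Row] = datamap.flatten

  // What an aggregate query sees: partial sums merged across segments.
  def queryResult(datamap: Seq[Segment]): Seq[Row] = aggregate(datamap.flatten)

  // Hypothetical contents of one load (values invented for illustration).
  val load: Segment = Seq(("01:01", 10), ("01:01", 20), ("01:02", 30))

  // First case: load, create datamap, load again.
  val firstCase: Seq[Segment] = loadAfterCreate(createDatamap(Seq(load)), load)
  // Second case: load twice, then create datamap.
  val secondCase: Seq[Segment] = createDatamap(Seq(load, load))

  def main(args: Array[String]): Unit = {
    println(rawRows(firstCase))   // two rows per time value, one per segment
    println(rawRows(secondCase))  // one row per time value
    println(queryResult(firstCase) == queryResult(secondCase))
  }
}
```

In this model the first case lists two rows per time value in the datamap table and the second case lists one, while the aggregated query result is the same in both.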

> It's different between load twice and create datamap with load again after load data and create datamap
> -------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-2085
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2085
> Project: CarbonData
> Issue Type: Bug
> Components: core, spark-integration
> Affects Versions: 1.3.0
> Reporter: xubo245
> Priority: Major
> Fix For: 1.3.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The results differ between the two test cases:
> test case 1: load once, create datamap, load again, and then query
> test case 2: load twice, create datamap, and then query
> {code:java}
> + test("load data into mainTable after create timeseries datamap on table 1") {
> + sql("drop table if exists mainTable")
> + sql(
> + """
> + | CREATE TABLE mainTable(
> + | mytime timestamp,
> + | name string,
> + | age int)
> + | STORED BY 'org.apache.carbondata.format'
> + """.stripMargin)
> +
> + sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> +
> + sql(
> + """
> + | create datamap agg0 on table mainTable
> + | using 'preaggregate'
> + | DMPROPERTIES (
> + | 'timeseries.eventTime'='mytime',
> + | 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
> + | as select mytime, sum(age)
> + | from mainTable
> + | group by mytime""".stripMargin)
> +
> + sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> + val df = sql(
> + """
> + | select
> + | timeseries(mytime,'minute') as minuteLevel,
> + | sum(age) as sum
> + | from mainTable
> + | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
> + | group by
> + | timeseries(mytime,'minute')
> + | order by
> + | timeseries(mytime,'minute')
> + """.stripMargin)
> +
> + // for debugging only; remove before merge
> + df.show()
> + sql("select * from maintable_agg0_minute").show(100)
> +
> + checkAnswer(df,
> + Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
> + Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
> +
> + }
> +
> + test("load data into mainTable after create timeseries datamap on table 2") {
> + sql("drop table if exists mainTable")
> + sql(
> + """
> + | CREATE TABLE mainTable(
> + | mytime timestamp,
> + | name string,
> + | age int)
> + | STORED BY 'org.apache.carbondata.format'
> + """.stripMargin)
> +
> + sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> + sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> + sql(
> + """
> + | create datamap agg0 on table mainTable
> + | using 'preaggregate'
> + | DMPROPERTIES (
> + | 'timeseries.eventTime'='mytime',
> + | 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
> + | as select mytime, sum(age)
> + | from mainTable
> + | group by mytime""".stripMargin)
> +
> +
> + val df = sql(
> + """
> + | select
> + | timeseries(mytime,'minute') as minuteLevel,
> + | sum(age) as sum
> + | from mainTable
> + | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
> + | group by
> + | timeseries(mytime,'minute')
> + | order by
> + | timeseries(mytime,'minute')
> + """.stripMargin)
> +
> + // for debugging only; remove before merge
> + df.show()
> + sql("select * from maintable_agg0_minute").show(100)
> +
> +
> + checkAnswer(df,
> + Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
> + Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
> + }
> +
> {code}
> Test cases 1 and 2 should both succeed, but test case 1 fails.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)