[ https://issues.apache.org/jira/browse/CARBONDATA-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343347#comment-16343347 ]

kumar vishal commented on CARBONDATA-2085:
------------------------------------------

[~xubo245]

+*First case:*+
*Create table*
*Load data*
*Create data map*
*Load data*
*In this case the data map will have two segments, so it will return 2 rows (one per segment).*

+*Second case:*+
*Create table*
*Load data*
*Load data*
*Create data map*
*In this case the data map will have only 1 segment: the data of both main table segments is aggregated into a single data map segment, so it will return 1 row, as the complete data is aggregated.*

*Note: when a data map is created on a main table that already has data loaded, only one data map segment is created, and it contains the complete aggregated data.*
*When data is loaded after the data map is created, a new segment is created for the data map, and that segment contains the data of that load only.*
*To validate whether the data map result is correct, run the query on the main table.*

> It's different between load twice and create datamap with load again after load data and create datamap
> -------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-2085
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2085
> Project: CarbonData
> Issue Type: Bug
> Components: core, spark-integration
> Affects Versions: 1.3.0
> Reporter: xubo245
> Priority: Major
> Fix For: 1.3.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The results differ between two test cases:
> test case 1: load twice, create the datamap, and then query
> test case 2: load once, create the datamap, load again, and then query
> {code:java}
> +  test("load data into mainTable after create timeseries datamap on table 1") {
> +    sql("drop table if exists mainTable")
> +    sql(
> +      """
> +        | CREATE TABLE mainTable(
> +        | mytime timestamp,
> +        | name string,
> +        | age int)
> +        | STORED BY 'org.apache.carbondata.format'
> +      """.stripMargin)
> +
> +    sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> +
> +    sql(
> +      """
> +        | create datamap agg0 on table mainTable
> +        | using 'preaggregate'
> +        | DMPROPERTIES (
> +        | 'timeseries.eventTime'='mytime',
> +        | 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
> +        | as select mytime, sum(age)
> +        | from mainTable
> +        | group by mytime""".stripMargin)
> +
> +    sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> +    val df = sql(
> +      """
> +        | select
> +        | timeseries(mytime,'minute') as minuteLevel,
> +        | sum(age) as sum
> +        | from mainTable
> +        | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
> +        | group by
> +        | timeseries(mytime,'minute')
> +        | order by
> +        | timeseries(mytime,'minute')
> +      """.stripMargin)
> +
> +    // for debugging only; remove before merge
> +    df.show()
> +    sql("select * from maintable_agg0_minute").show(100)
> +
> +    checkAnswer(df,
> +      Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
> +        Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
> +
> +  }
> +
> +  test("load data into mainTable after create timeseries datamap on table 2") {
> +    sql("drop table if exists mainTable")
> +    sql(
> +      """
> +        | CREATE TABLE mainTable(
> +        | mytime timestamp,
> +        | name string,
> +        | age int)
> +        | STORED BY 'org.apache.carbondata.format'
> +      """.stripMargin)
> +
> +    sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> +    sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
> +    sql(
> +      """
> +        | create datamap agg0 on table mainTable
> +        | using 'preaggregate'
> +        | DMPROPERTIES (
> +        | 'timeseries.eventTime'='mytime',
> +        | 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
> +        | as select mytime, sum(age)
> +        | from mainTable
> +        | group by mytime""".stripMargin)
> +
> +
> +    val df = sql(
> +      """
> +        | select
> +        | timeseries(mytime,'minute') as minuteLevel,
> +        | sum(age) as sum
> +        | from mainTable
> +        | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
> +        | group by
> +        | timeseries(mytime,'minute')
> +        | order by
> +        | timeseries(mytime,'minute')
> +      """.stripMargin)
> +
> +    // for debugging only; remove before merge
> +    df.show()
> +    sql("select * from maintable_agg0_minute").show(100)
> +
> +
> +    checkAnswer(df,
> +      Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
> +        Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
> +  }
> +
> {code}
> Test cases 1 and 2 should both succeed, but test case 1 fails.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
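The segment behaviour described in the comment can be illustrated with a small, self-contained Scala model. This is a sketch under stated assumptions, not CarbonData code: `Event`, `AggRow` and every function below are hypothetical, the timeseries hierarchy is ignored, and a data map is modeled simply as a list of segments, each holding per-minute partial sums. It shows why `select * from maintable_agg0_minute` returns a different number of rows in the two cases, while a query on `mainTable` that re-aggregates the partial sums across segments gives the same answer either way.

```scala
// Hypothetical model of a pre-aggregate data map; NOT the CarbonData API.
object PreAggSegmentModel {
  // Hypothetical raw row of the main table: the minute it falls into and its age value.
  final case class Event(minute: String, age: Int)

  // Hypothetical row of a data map segment: group key plus the partial sum for that segment.
  final case class AggRow(minute: String, sumAge: Long)

  // Aggregates one batch of main-table rows into one data map segment (one row per minute).
  def aggregateSegment(rows: Seq[Event]): Seq[AggRow] =
    rows.groupBy(_.minute).toSeq
      .map { case (m, rs) => AggRow(m, rs.map(_.age.toLong).sum) }
      .sortBy(_.minute)

  // First case: data map created after the first load; that load and every later
  // load each end up in their own data map segment.
  def datamapBuiltIncrementally(loads: Seq[Seq[Event]]): Seq[Seq[AggRow]] =
    loads.map(aggregateSegment)

  // Second case: data map created after all loads; all existing data is
  // aggregated into a single data map segment.
  def datamapBuiltAfterLoads(loads: Seq[Seq[Event]]): Seq[Seq[AggRow]] =
    Seq(aggregateSegment(loads.flatten))

  // A query answered from the data map must re-aggregate partial sums across segments.
  def queryViaDatamap(segments: Seq[Seq[AggRow]]): Map[String, Long] =
    segments.flatten.groupBy(_.minute).map { case (m, rs) => m -> rs.map(_.sumAge).sum }

  // Reference answer computed directly on the main table rows.
  def queryMainTable(loads: Seq[Seq[Event]]): Map[String, Long] =
    loads.flatten.groupBy(_.minute).map { case (m, rs) => m -> rs.map(_.age.toLong).sum }

  def main(args: Array[String]): Unit = {
    // Illustrative values only; not the contents of timeseriestest.csv.
    val batch = Seq(Event("01:01", 10), Event("01:01", 20), Event("01:02", 5))
    val loads = Seq(batch, batch) // the same file loaded twice

    val incremental = datamapBuiltIncrementally(loads)
    val oneShot = datamapBuiltAfterLoads(loads)

    println(s"data map rows, first case: ${incremental.flatten.size}") // 4 (two per minute)
    println(s"data map rows, second case: ${oneShot.flatten.size}")    // 2 (one per minute)
    println(queryViaDatamap(incremental) == queryMainTable(loads))     // true
    println(queryViaDatamap(oneShot) == queryMainTable(loads))         // true
  }
}
```

In this model the raw data map tables differ (two partial rows per minute versus one fully aggregated row), but the re-aggregated answers match, which is why the comment suggests validating the result by querying the main table rather than by inspecting the data map table directly.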