Posted by
aaron on
Sep 26, 2018; 3:47pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Serious-Issue-Query-get-inconsistent-result-on-carbon1-5-0-tp63691.html
Hi Community,
I load records by date in batch with save mode append, but the same query get different result when I hit query. And I make sure I did not update data and the first time query result is right.
My env is carbondata 1.5.0 & spark2.3.
1. How I create table and datamap
def store_ddl(carbon: SparkSession, logger: SLOGGER, parallelism: String): Unit = {
val createStoreTableSql = s"""
| CREATE TABLE IF NOT EXISTS store(
| market_code STRING,
| device_code STRING,
| country_code STRING,
| category_id INTEGER,
| product_id LONG,
| date TIMESTAMP,
| est_free_app_download LONG,
| est_paid_app_download LONG,
| est_revenue LONG
| )
| STORED BY 'carbondata'
| TBLPROPERTIES(
| 'SORT_COLUMNS'='market_code, device_code, country_code, category_id, date, product_id',
| 'NO_INVERTED_INDEX'='est_free_app_download, est_paid_app_download, est_revenue',
| 'LOCAL_DICTIONARY_ENABLE'='true',
| 'LOCAL_DICTIONARY_THRESHOLD'='1000',
| 'LOCAL_DICTIONARY_INCLUDE'='market_code, device_code, country_code',
| 'SORT_SCOPE'='LOCAL_SORT',
| 'CACHE_LEVEL'='BLOCKLET',
| 'TABLE_BLOCKSIZE'='256',
| 'GLOBAL_SORT_PARTITIONS'='${parallelism}'
| )
""".stripMargin
carbon.sql(createStoreTableSql)
val createTimeSeriesDayNoProductTableSql = s"""
| CREATE DATAMAP IF NOT EXISTS agg_by_day ON TABLE store
| USING 'timeSeries'
| DMPROPERTIES (
| 'EVENT_TIME'='date',
| 'DAY_GRANULARITY'='1')
| AS SELECT date, market_code, device_code, country_code, category_id,
| COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
| SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
| SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
| SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
| FROM store
| GROUP BY date, market_code, device_code, country_code, category_id
""".stripMargin
carbon.sql(createTimeSeriesDayNoProductTableSql)
val createTimeSeriesMonthNoProductTableSql = s"""
| CREATE DATAMAP IF NOT EXISTS agg_by_month ON TABLE store
| USING 'timeSeries'
| DMPROPERTIES (
| 'EVENT_TIME'='date',
| 'MONTH_GRANULARITY'='1')
| AS SELECT date, market_code, device_code, country_code, category_id,
| COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
| SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
| SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
| SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
| FROM store
| GROUP BY date, market_code, device_code, country_code, category_id
""".stripMargin
carbon.sql(createTimeSeriesMonthNoProductTableSql)
val createTimeSeriesYearNoProductTableSql = s"""
| CREATE DATAMAP IF NOT EXISTS agg_by_year ON TABLE store
| USING 'timeSeries'
| DMPROPERTIES (
| 'EVENT_TIME'='date',
| 'YEAR_GRANULARITY'='1')
| AS SELECT date, market_code, device_code, country_code, category_id,
| COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
| SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
| SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
| SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
| FROM store
| GROUP BY date, market_code, device_code, country_code, category_id
""".stripMargin
carbon.sql(createTimeSeriesYearNoProductTableSql)
}
2. Query and result
carbon.time(carbon.sql(
s"""
|EXPLAIN SELECT date, market_code, device_code, country_code,
category_id, product_id, est_free_app_download, est_paid_app_download,
est_revenue
|FROM store
|WHERE date = '2016-09-01' AND device_code='ios-phone' AND
country_code='EE' AND product_id IN (590416158, 590437560)"""
.stripMargin).show(truncate=false)
)
2.1) First time query
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|date |market_code|device_code|country_code|category_id|product_id|est_free_app_download|est_paid_app_download|est_revenue|
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100000 |590416158 |1 |null |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100021 |590416158 |1 |null |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100021 |590437560 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100070 |590437560 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100076 |590416158 |1 |null |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100000 |590437560 |null |1 |null |
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
2.2) Second time query
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|date |market_code|device_code|country_code|category_id|product_id|est_free_app_download|est_paid_app_download|est_revenue|
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100000 |590416158 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100023 |590416158 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100023 |590437560 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100023 |590437560 |null |1 |null |
|2016-09-01 00:00:00|apple-store|ios-phone |EE |100000 |590437560 |null |1 |null |
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
category 100021, 100070, 100076 missed and 100023 should not be here because 2016-09-01 does not have this record in source data.
And I have execute tons of queries, but they can not return right result as first time query. I doubt that it has something with compaction and local dictionary.
Could anyone give any suggestions about this?
Thanks
Aaron
--
Sent from:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/