Login  Register

[Serious Issue] Query get inconsistent result on carbon1.5.0

Posted by aaron on Sep 26, 2018; 3:47pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Serious-Issue-Query-get-inconsistent-result-on-carbon1-5-0-tp63691.html

Hi Community,

I load records by date in batch with save mode append, but the same query get different result when I hit query. And I make sure I did not update data and the first time query result is right.

My env is carbondata 1.5.0 & spark2.3.

1. How I create table and datamap
def store_ddl(carbon: SparkSession, logger: SLOGGER, parallelism: String): Unit = {
    val createStoreTableSql = s"""
                                 | CREATE TABLE IF NOT EXISTS store(
                                 | market_code STRING,
                                 | device_code STRING,
                                 | country_code STRING,
                                 | category_id INTEGER,
                                 | product_id LONG,
                                 | date TIMESTAMP,
                                 | est_free_app_download LONG,
                                 | est_paid_app_download LONG,
                                 | est_revenue LONG
                                 | )
                                 | STORED BY 'carbondata'
                                 | TBLPROPERTIES(
                                 | 'SORT_COLUMNS'='market_code, device_code, country_code, category_id, date, product_id',
                                 | 'NO_INVERTED_INDEX'='est_free_app_download, est_paid_app_download, est_revenue',
                                 | 'LOCAL_DICTIONARY_ENABLE'='true',
                                 | 'LOCAL_DICTIONARY_THRESHOLD'='1000',
                                 | 'LOCAL_DICTIONARY_INCLUDE'='market_code, device_code, country_code',
                                 | 'SORT_SCOPE'='LOCAL_SORT',
                                 | 'CACHE_LEVEL'='BLOCKLET',
                                 | 'TABLE_BLOCKSIZE'='256',
                                 | 'GLOBAL_SORT_PARTITIONS'='${parallelism}'
                                 | )
      """.stripMargin
    carbon.sql(createStoreTableSql)

    val createTimeSeriesDayNoProductTableSql = s"""
                                                    | CREATE DATAMAP IF NOT EXISTS agg_by_day ON TABLE store
                                                    | USING 'timeSeries'
                                                    | DMPROPERTIES (
                                                    | 'EVENT_TIME'='date',
                                                    | 'DAY_GRANULARITY'='1')
                                                    | AS SELECT date, market_code, device_code, country_code, category_id,
                                                    | COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
                                                    | SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
                                                    | SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
                                                    | SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
                                                    | FROM store
                                                    | GROUP BY date, market_code, device_code, country_code, category_id
      """.stripMargin
    carbon.sql(createTimeSeriesDayNoProductTableSql)

    val createTimeSeriesMonthNoProductTableSql = s"""
                                                    | CREATE DATAMAP IF NOT EXISTS agg_by_month ON TABLE store
                                                    | USING 'timeSeries'
                                                    | DMPROPERTIES (
                                                    | 'EVENT_TIME'='date',
                                                    | 'MONTH_GRANULARITY'='1')
                                                    | AS SELECT date, market_code, device_code, country_code, category_id,
                                                    | COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
                                                    | SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
                                                    | SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
                                                    | SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
                                                    | FROM store
                                                    | GROUP BY date, market_code, device_code, country_code, category_id
      """.stripMargin
    carbon.sql(createTimeSeriesMonthNoProductTableSql)

    val createTimeSeriesYearNoProductTableSql = s"""
                                                   | CREATE DATAMAP IF NOT EXISTS agg_by_year ON TABLE store
                                                   | USING 'timeSeries'
                                                   | DMPROPERTIES (
                                                   | 'EVENT_TIME'='date',
                                                   | 'YEAR_GRANULARITY'='1')
                                                   | AS SELECT date, market_code, device_code, country_code, category_id,
                                                   | COUNT(product_id), COUNT(est_free_app_download), COUNT(est_free_app_download), COUNT(est_revenue),
                                                   | SUM(est_free_app_download), MIN(est_free_app_download), MAX(est_free_app_download),
                                                   | SUM(est_paid_app_download), MIN(est_paid_app_download), MAX(est_paid_app_download),
                                                   | SUM(est_revenue), MIN(est_revenue), MAX(est_revenue)
                                                   | FROM store
                                                   | GROUP BY date, market_code, device_code, country_code, category_id
      """.stripMargin
    carbon.sql(createTimeSeriesYearNoProductTableSql)
  }

2. Query and result
carbon.time(carbon.sql(
      s"""
         |EXPLAIN SELECT date, market_code, device_code, country_code,
category_id, product_id, est_free_app_download, est_paid_app_download,
est_revenue
         |FROM store
         |WHERE date = '2016-09-01' AND device_code='ios-phone' AND
country_code='EE' AND product_id IN (590416158, 590437560)"""
        .stripMargin).show(truncate=false)
    )

2.1) First time query
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|date               |market_code|device_code|country_code|category_id|product_id|est_free_app_download|est_paid_app_download|est_revenue|
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100000     |590416158 |1                    |null                 |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100021     |590416158 |1                    |null                 |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100021     |590437560 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100070     |590437560 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100076     |590416158 |1                    |null                 |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100000     |590437560 |null                 |1                    |null       |
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+


2.2) Second time query
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|date               |market_code|device_code|country_code|category_id|product_id|est_free_app_download|est_paid_app_download|est_revenue|
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100000     |590416158 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100023     |590416158 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100023     |590437560 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100023     |590437560 |null                 |1                    |null       |
|2016-09-01 00:00:00|apple-store|ios-phone  |EE          |100000     |590437560 |null                 |1                    |null       |
+-------------------+-----------+-----------+------------+-----------+----------+---------------------+---------------------+-----------+
category 100021, 100070, 100076 missed and 100023 should not be here because 2016-09-01 does not have this record in source data.

And I have execute tons of queries, but they can not return right result as first time query. I doubt that it has something with compaction and local dictionary.

Could anyone give any suggestions about this?


Thanks
Aaron

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/