Posted by
aaron on
Oct 01, 2018; 3:53pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Issue-Query-plan-issue-tp64371p64398.html
I think the query plan info is not right,
1. Total blocklet from carbondata cli is 233 + 86 = 319
2. But query plan tell me that I have 560 blocklet
I hope below info could help you to locate issue.
***********************************************************************************
I use carbondata cli print the blocklet summary like below:
java -cp "/home/hadoop/carbontool/*:/opt/spark/jars/*"
org.apache.carbondata.tool.CarbonCli -cmd summary -a -p
hdfs://ec2-dca-aa-p-sdn-16.appannie.org:9000/usr/carbon/data/default/storev3/
## Summary
total: 80 blocks, 9 shards, 233 blocklets, 62,698 pages, 2,006,205,228 rows,
12.40GB
avg: 158.72MB/block, 54.50MB/blocklet, 25,077,565 rows/block, 8,610,322
rows/blocklet
java -cp "/home/hadoop/carbontool/*:/opt/spark/jars/*"
org.apache.carbondata.tool.CarbonCli -cmd summary -a -p
hdfs://ec2-dca-aa-p-sdn-16.appannie.org:9000/usr/carbon/data/default/usage_basickpi/
## Summary
total: 30 blocks, 14 shards, 86 blocklets, 3,498 pages, 111,719,467 rows,
4.24GB
avg: 144.57MB/block, 50.43MB/blocklet, 3,723,982 rows/block, 1,299,063
rows/blocklet
************************************************************************************
But at the same time, I run a sql, carbon told me below info:
|== CarbonData Profiler ==
Table Scan on storev3
- total: 194 blocks, 560 blocklets
- filter: (((((((((granularity <> null and date <> null) and date >=
1472688000000000 between date <= 1475280000000000) and true) and granularity
= monthly) and country_code in
(LiteralExpression(US);LiteralExpression(CN);LiteralExpression(JP);)) and
device_code in (LiteralExpression(ios-phone);)) and product_id <> null) and
country_code <> null) and device_code <> null)
- pruned by Main DataMap
- skipped: 192 blocks, 537 blocklets
************************************************************************************
The select sql like is
SELECT f.country_code, f.date, f.product_id, f.category_id, f.arpu FROM (
SELECT a.country_code, a.date, a.product_id, a.category_id,
a.revenue/a.average_active_users as arpu
FROM(
SELECT r.device_code, r.category_id, r.country_code, r.date,
r.product_id, r.revenue, u.average_active_users
FROM
(
SELECT b.device_code, b.country_code, b.product_id, b.date,
b.category_id, sum(b.revenue) as revenue
FROM (
SELECT v.device_code, v.country_code, v.product_id,
v.revenue, v.date, p.category_id FROM
(
SELECT device_code, country_code, product_id,
est_revenue as revenue, timeseries(date, 'month') as date
FROM storev3
WHERE market_code='apple-store' AND date BETWEEN
'2016-09-01' AND '2016-10-01' and device_code in ('ios-phone') and
country_code in ('US', 'CN', 'JP')
) as v
JOIN(
SELECT DISTINCT product_id, category_id
FROM storev3
WHERE market_code='apple-store' AND date BETWEEN
'2016-09-01' AND '2016-10-01' and device_code in ('ios-phone') and
category_id in (100000, 100001, 100021) and country_code in ('US', 'CN',
'JP')
) as p
ON p.product_id = v.product_id
) as b
GROUP BY b.device_code, b.country_code, b.product_id, b.date,
b.category_id
) AS r
JOIN
(
SELECT country_code, date, product_id, (CASE WHEN
est_average_active_users is not NULL THEN est_average_active_users ELSE 0
END) as average_active_users, device_code
FROM usage_basickpi
WHERE date BETWEEN '2016-09-01' AND '2016-10-01'and granularity
='monthly' and country_code in ('US', 'CN', 'JP') AND device_code in
('ios-phone')
) AS u
ON r.country_code=u.country_code AND r.date=u.date AND
r.product_id=u.product_id AND r.device_code=u.device_code
) AS a
)AS f
ORDER BY f.arpu DESC
LIMIT 10
Thanks
Aaron
--
Sent from:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/