Posted by simafengyun on Sep 25, 2017; 5:56am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-optimization-of-OrderBy-sorted-columns-Limit-Query-tp23057p23073.html
Recently, I ran the following test using the latest code:
1. Create Table:
CREATE TABLE rx5_tbox_parquet_all(
carid STRING,
inputstime TIMESTAMP,
carsyspwrmod INT,
cardofrontpas INT,
cardofrontdrv INT,
cardorearleft INT,
cardorearright INT,
carbonnet INT,
carboot INT,
carwinfrontleft INT,
carwinrearleft INT,
carwinfrontright INT,
carwinrearright INT,
carsunroof INT,
carcsactive INT,
carcsenabled INT,
carseatbeltdrv INT
)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='carid',
'DICTIONARY_INCLUDE'='carid')
2. Load 0.1 billion (100 million) rows of data
3. Run the following SQL:
select
carid,
inputstime,
carsyspwrmod,
cardofrontpas,
cardofrontdrv,
cardorearleft,
cardorearright,
carbonnet,
carboot,
carwinfrontleft,
carwinrearleft,
carwinfrontright,
carwinrearright
from rx5_tbox_parquet_all2
order by carid
limit 10
Result with CarbonData 1.2 master code + Spark 2.1:
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
|carid            |inputstime|carsyspwrmod|cardofrontpas|cardofrontdrv|cardorearleft|cardorearright|carbonnet|carboot|carwinfrontleft|carwinrearleft|carwinfrontright|carwinrearright|
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
limit 10 query time: 28777 milliseconds
Result with the order by + limit optimized CarbonData 1.2 master code + Spark 1.6.3:
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
|carid            |inputstime|carsyspwrmod|cardofrontpas|cardofrontdrv|cardorearleft|cardorearright|carbonnet|carboot|carwinfrontleft|carwinrearleft|carwinfrontright|carwinrearright|
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
|LSJA24790HS020662|null      |2           |0            |0            |0            |0             |0        |0      |0              |0             |0               |0              |
+-----------------+----------+------------+-------------+-------------+-------------+--------------+---------+-------+---------------+--------------+----------------+---------------+
limit 10 query time: 1640 milliseconds
Evidently, after the optimization, query time dropped by more than 90%
(from 28777 ms to 1640 ms, roughly 17.5x faster), even though I used Spark 1.6.3.
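
For anyone who wants to check the arithmetic, here is a minimal Python sketch that derives the improvement from the two timings reported above (28777 ms and 1640 ms). The function name `summarize` is only an illustrative helper, not part of CarbonData or Spark.

```python
def summarize(before_ms, after_ms):
    """Return (percent reduction in query time, speedup factor)."""
    reduction_pct = (before_ms - after_ms) / before_ms * 100
    speedup = before_ms / after_ms
    return reduction_pct, speedup


# Timings taken from the two "limit 10 query time" lines in this post.
reduction_pct, speedup = summarize(28777, 1640)
print(f"query time reduced by {reduction_pct:.1f}%, about {speedup:.1f}x faster")
```

Note that the two runs also differ in Spark version (2.1 vs 1.6.3), so the figure compares the two setups as a whole rather than isolating the optimization.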
Thanks
马云