http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Presto-CarbonData-optimization-work-discussion-tp18509p18526.html
impact with lazy decoding in this scenario. Can you do one more test by
integrations. It will tell whether it is really a lazy decoding issue or
not.
> Hi
>
> For item 4) (lazy decoding of the dictionary), I just tested 18 million rows
> of data with this query:
> "select province,sum(age),count(*) from presto_carbondata group by province
> order by province"
>
> The Spark integration module has "dictionary lazy decode" and the Presto
> integration doesn't; the performance differs by about 4.5 times, so
> "dictionary lazy decode" might significantly improve aggregation
> performance.
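A minimal sketch of the idea, assuming the `province` column is dictionary-encoded as one Int surrogate key per row plus a separate key-to-string dictionary. The names `EncodedColumn`, `eagerGroupSum` and `lazyGroupSum` are hypothetical and are not CarbonData, Spark or Presto APIs. Eager decoding turns every row's key into its string before grouping; lazy decoding groups on the integer keys and decodes only once per resulting group.

```scala
import scala.collection.mutable

object LazyDecodeSketch {
  // Hypothetical layout, not CarbonData's real classes: `keys(i)` is the
  // surrogate key of row i, and `dictionary(k)` maps key k back to its string.
  // Keys are assumed dense, i.e. in 0 until dictionary.length.
  final case class EncodedColumn(dictionary: Array[String], keys: Array[Int])

  // Eager decoding: look up the string for every row, then group by String.
  // Cost: one dictionary lookup and one String hash per row.
  def eagerGroupSum(col: EncodedColumn, age: Array[Long]): Map[String, (Long, Long)] = {
    val acc = mutable.HashMap.empty[String, (Long, Long)]
    for (i <- col.keys.indices) {
      val name = col.dictionary(col.keys(i))
      val (sum, count) = acc.getOrElse(name, (0L, 0L))
      acc(name) = (sum + age(i), count + 1)
    }
    acc.toMap
  }

  // Lazy decoding: aggregate directly on the Int keys (plain arrays work
  // because keys are dense) and decode once per non-empty group at the end.
  def lazyGroupSum(col: EncodedColumn, age: Array[Long]): Map[String, (Long, Long)] = {
    val n = col.dictionary.length
    val sums = new Array[Long](n)
    val counts = new Array[Long](n)
    for (i <- col.keys.indices) {
      val k = col.keys(i)
      sums(k) += age(i)
      counts(k) += 1
    }
    (0 until n)
      .filter(k => counts(k) > 0)
      .map(k => (col.dictionary(k), (sums(k), counts(k))))
      .toMap
  }
}
```

With keys `Array(0, 1, 0, 2, 0)` over the dictionary `Array("AB", "BC", "MB")`, both functions return the same map, but `lazyGroupSum` decodes 3 keys instead of 5. For the query above, that would be 13 lookups (one per province) instead of one per row, and the hash work is done on Ints rather than Strings. Whether this accounts for the whole gap in the Presto numbers is what the extra test requested in the reply should show.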
>
> The detailed test results are below:
>
> *1. Presto+CarbonData: 9 seconds*
> presto:default> select province,sum(age),count(*) from presto_carbondata
> group by province order by province;
>  province |  _col1   |  _col2
> ----------+----------+---------
>  AB       | 57442740 | 1385010
>  BC       | 57488826 | 1385580
>  MB       | 57564702 | 1386510
>  NB       | 57599520 | 1386960
>  NL       | 57446592 | 1383774
>  NS       | 57448734 | 1384272
>  NT       | 57534228 | 1386936
>  NU       | 57506844 | 1385346
>  ON       | 57484956 | 1384470
>  PE       | 57325164 | 1379802
>  QC       | 57467886 | 1385076
>  SK       | 57385152 | 1382364
>  YT       | 57377556 | 1383900
> (13 rows)
>
> Query 20170720_022833_00004_c9ky2, FINISHED, 1 node
> Splits: 55 total, 55 done (100.00%)
> 0:09 [18M rows, 34.3MB] [1.92M rows/s, 3.65MB/s]
>
> *2. Spark+CarbonData: 2 seconds*
> scala> benchmark { carbon.sql("select province,sum(age),count(*) from presto_carbondata group by province order by province").show }
> +--------+--------+--------+
> |province|sum(age)|count(1)|
> +--------+--------+--------+
> |      AB|57442740| 1385010|
> |      BC|57488826| 1385580|
> |      MB|57564702| 1386510|
> |      NB|57599520| 1386960|
> |      NL|57446592| 1383774|
> |      NS|57448734| 1384272|
> |      NT|57534228| 1386936|
> |      NU|57506844| 1385346|
> |      ON|57484956| 1384470|
> |      PE|57325164| 1379802|
> |      QC|57467886| 1385076|
> |      SK|57385152| 1382364|
> |      YT|57377556| 1383900|
> +--------+--------+--------+
>
> 2109.346231ms
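The `benchmark` wrapper used in the REPL session is not defined in the message, so the following is only a guess at such a helper: it runs the block once, prints the elapsed wall-clock time in the same `<millis>ms` form as the output above, and returns it. `BenchmarkSketch` is a hypothetical name.

```scala
object BenchmarkSketch {
  // Hypothetical timing helper, assumed to behave like the `benchmark` used
  // above: evaluate the by-name block `f` once and measure it with nanoTime.
  def benchmark[T](f: => T): Double = {
    val start = System.nanoTime()
    f
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"${elapsedMs}ms")
    elapsedMs
  }
}
```

In the Spark shell it would be used as `BenchmarkSketch.benchmark { carbon.sql("...").show() }`. A single run like this includes warm-up and caching effects, so timing a few repeated runs on both engines would give a steadier comparison.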
>
>
>
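A quick check of the "about 4.5 times" figure, assuming Presto's reported throughput (1.92M rows/s) covers the whole query, and using the 18,000,000 rows that the per-province counts add up to:

$$
t_{\text{Presto}} \approx \frac{18{,}000{,}000\ \text{rows}}{1.92\times10^{6}\ \text{rows/s}} \approx 9.38\ \text{s},
\qquad
\frac{t_{\text{Presto}}}{t_{\text{Spark}}} \approx \frac{9.38\ \text{s}}{2.109\ \text{s}} \approx 4.4,
\qquad
\frac{18{,}000{,}000\ \text{rows}}{2.109\ \text{s}} \approx 8.5\times10^{6}\ \text{rows/s}
$$

Using the rounded `0:09` wall time instead gives about 4.3 times, so "about 4.5 times" is a fair summary of these two runs.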
> --
> View this message in context:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Presto-CarbonData-optimization-work-discussion-tp18509p18522.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>