
[DISCUSSION] CarbonData Integration with Presto

Posted by bhavya411 on Jun 29, 2017; 3:36pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-CarbonData-Integration-with-Presto-tp16793.html


We use Presto at our company for querying data, but query processing is slower than our client expects. We have been evaluating Apache CarbonData as a storage format, so to help make a decision we ran the TPC-H benchmark on Presto with CarbonData and compared it with Parquet. The CarbonData community asked us to share the test results on the mailing list for discussion, so please see below for the details of the benchmarking we did.

Question: We were able to run only 16 of the 22 TPC-H queries, because 2 queries involve temporary tables and 4 involve views, neither of which is supported by CarbonData. Is there a way to run all the queries?

Environment
Cluster: 3-node cluster (48 GB RAM, 8 CPU cores, and a 2 TB hard disk per node)

Data
Data Set: 50 GB of data was generated using TPC-H 2.17.2 (schema is attached)

Results: The results are attached in the xlsx file, along with the corresponding queries.

CarbonData is performing better than Parquet, but since I have to make a decision on it, we are going to benchmark on more data and also compare it with ORC.
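
For the wider comparison, per-query timings from two formats could be summarised along the lines below. This is only a minimal sketch, not the method used for the attached results: the function name summarize_speedup and the dictionary layout (query name mapped to elapsed seconds) are assumptions made for illustration, and all timings are assumed to be positive.

```python
import math


def summarize_speedup(baseline_secs, candidate_secs):
    """Return per-query speedups and their geometric mean.

    Both arguments are hypothetical dicts mapping a query name
    (e.g. "Q1") to elapsed seconds; all timings must be positive.
    A speedup above 1 means the candidate format was faster.
    """
    # Compare only the queries that ran against both formats.
    common = sorted(set(baseline_secs) & set(candidate_secs))
    if not common:
        raise ValueError("no queries in common")

    speedups = {q: baseline_secs[q] / candidate_secs[q] for q in common}

    # Geometric mean of the ratios, so a single long-running query
    # does not dominate the summary the way it would in a plain sum.
    log_sum = sum(math.log(s) for s in speedups.values())
    geo_mean = math.exp(log_sum / len(speedups))
    return speedups, geo_mean
```

Restricting the comparison to queries that completed on both formats matters here, since only 16 of the 22 queries could be run on CarbonData.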


Inline image 1


Attachment: Presto BenchMarking.xlsx (14K)