[jira] [Created] (CARBONDATA-4085) How to improve query execution time further

Akash R Nilugal (Jira)

suyash yadav created CARBONDATA-4085:
----------------------------------------

Summary: How to improve query execution time further
Key: CARBONDATA-4085
URL: https://issues.apache.org/jira/browse/CARBONDATA-4085
Project: CarbonData
Issue Type: Improvement
Components: sql
Affects Versions: 2.0.1
Reporter: suyash yadav
Fix For: 2.0.1

Hi Team,

We are doing a POC in which we would like our query execution to be faster, mostly in the range of 3 to 4 seconds.

We have read the CarbonData documents, which claim that CarbonData can scan petabytes of data and present results in 3 to 4 seconds, but that does not seem to be the case in our observation.

Our table holds 1.6 billion records and the query fetches only 4K records, yet it still takes around 22 to 25 seconds to execute.

Below is the query that we are running:

==============================

spark.sql("select ts,resource,metric,value from fact_timestamp_global left join tags_10_days_test on fact_timestamp_global.tags_id= tags_10_days_test.id where metric in ('Outbound Utilization (percent)','Inbound Utilization (percent)') and resource='10.212.7.98_if:<0001>' and ts>='2020-09-28 00:00:00' and ts<='2020-09-28 23:55:55'").show(false)

=================================

The definition of fact_timestamp_global is as follows:

========================

spark.sql("create table Fact_timestamp_GLOBAL(ts timestamp,metric string,tags_id string,value double) partitioned by (ts2 timestamp) stored as carbondata TBLPROPERTIES ('SORT_COLUMNS'='ts,metric','SORT_SCOPE'='GLOBAL_SORT')").show()

==========================

The definition of tags_10_days_test is as follows:

====================

spark.sql("create table tags_10_days_test(id string,resource string) stored as carbondata TBLPROPERTIES('SORT_COLUMNS'='id,resource')").show()

======================

Kindly go through the above points and help us improve the query performance further.
