[
https://issues.apache.org/jira/browse/CARBONDATA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-1335.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
> Duplicated & time-consuming method call found in query
> ------------------------------------------------------
>
> Key: CARBONDATA-1335
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1335
> Project: CarbonData
> Issue Type: Improvement
> Components: data-query
> Affects Versions: 1.1.1
> Reporter: xuchuanyin
> Priority: Minor
> Labels: performance
> Fix For: 1.2.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> # Scenario
> We ran 14 concurrent queries on CarbonData. The queries were identical, but each ran against a different table. We observed the following:
> + A single query took about 5s;
> + In the concurrent scenario, each query took about 15s.
> By adding checkpoints to the log, we found a large latency before the query jobs started in Spark.
> # Analysis
> When we fire a query, CarbonData first does some work on the client side: it parses and analyzes the plan, then prepares the filtered blocks and inputSplits. Only then does CarbonData submit the query job to Spark.
> We found that this first step took about 7s in the concurrent scenario, but less than 1s in the single-query scenario.
> By studying the related code, we found that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.
> # Solution
> CarbonData needs to call `super.lookupRelation` only once, so we should remove the redundant second call.
> I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (about 3s saved by the improvement).
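
The change described above can be illustrated with a minimal, self-contained Java sketch. It is not the actual CarbonData or Spark code: `BaseCatalog`, `DuplicatedLookupCatalog`, `SingleLookupCatalog`, and the `carbon:` prefix check are hypothetical stand-ins, and a call counter takes the place of the roughly 3s cost reported per `super.lookupRelation` call. It only shows the general pattern of calling the parent lookup once and reusing its result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parent catalog; no real Spark or CarbonData classes are used.
class BaseCatalog {
    int lookupCalls = 0; // counts how often the expensive parent lookup runs
    private final Map<String, String> tables = new HashMap<>();

    BaseCatalog() {
        tables.put("t1", "relation(t1)");
    }

    // Stands in for the expensive super.lookupRelation from the report.
    String lookupRelation(String table) {
        lookupCalls++;
        String relation = tables.get(table);
        if (relation == null) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        return relation;
    }
}

// Before: the parent lookup runs twice for one query.
class DuplicatedLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
        // First call, used only for a check (hypothetical condition).
        boolean wrap = super.lookupRelation(table).startsWith("relation");
        // Second call repeats the same work.
        String relation = super.lookupRelation(table);
        return wrap ? "carbon:" + relation : relation;
    }
}

// After: the parent lookup runs once and the result is reused.
class SingleLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
        String relation = super.lookupRelation(table);
        boolean wrap = relation.startsWith("relation");
        return wrap ? "carbon:" + relation : relation;
    }
}

class LookupDemo {
    public static void main(String[] args) {
        DuplicatedLookupCatalog before = new DuplicatedLookupCatalog();
        before.lookupRelation("t1");
        System.out.println("duplicated: " + before.lookupCalls + " parent calls"); // prints 2

        SingleLookupCatalog after = new SingleLookupCatalog();
        after.lookupRelation("t1");
        System.out.println("single: " + after.lookupCalls + " parent call"); // prints 1
    }
}
```

Both variants return the same value; only the number of parent lookups differs, which is where the per-query saving reported above would come from if each lookup is expensive.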