Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

Classic

List

32 messages Options

Options

12

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

GitHub user sujith71955 opened a pull request:

https://github.com/apache/carbondata/pull/2974

[CARBONDATA-2563][CATALYST] Explain query with Order by operator is fired Spark Job which is increase explain query time even though isQueryStatisticsEnabled is false

## What changes were proposed in this pull request?
Even though isQueryStatisticsEnabled is false which means user doesnt wants to see statistics for explain command, still the engine tries to fetch the paritions information
which causes a job execution in case of order by query, this is mainly because spark engine does sampling for defifning certain range within paritions for sorting process.

As part of solution the explain command process shall fetch the parition info only if isQueryStatisticsEnabled true.

## How was this patch tested?
Manual testing.

Before fix
![image](https://user-images.githubusercontent.com/12999161/49452212-c0075b80-f806-11e8-903b-d47369c6f0d7.png)

After fix
![explain](https://user-images.githubusercontent.com/12999161/49452265-de6d5700-f806-11e8-8154-bc86a04db073.PNG)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata master_explin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2974.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2974

----
commit b20005d3cfbfbadd316ef9d8f7749c9236fe8bc8
Author: s71955 <sujithchacko.2010@...>
Date: 2018-12-04T15:18:55Z

[CARBONDATA-2563][CATALYST] Explain query with Order by operator is fired Spark Job which is increase explain query time even though isQueryStatisticsEnabled is false

## What changes were proposed in this pull request?
Even though isQueryStatisticsEnabled is false which means user doesnt wants to see statistics for explain command, still the engine tries to fetch the paritions information
which causes a job execution in case of order by query, this is mainly because spark engine does sampling for defifning certain range within paritions for sorting process.

As part of solution the explain command process shall fetch the parition info only if isQueryStatisticsEnabled true.

## How was this patch tested?
Manual testing.

----

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1635/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on the issue:

https://github.com/apache/carbondata/pull/2974

@ravipesala @kumarvishal09 Please review and let me know for suggestions.

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1636/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9896/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1847/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1849/

---

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2974#discussion_r238906069

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonExplainCommand.scala ---
@@ -51,8 +51,8 @@ case class CarbonExplainCommand(
sparkSession.sessionState.executePlan(child.asInstanceOf[ExplainCommand].logicalPlan)
--- End diff --

Whether this code (line 50 and 51) is not executed , if isQueryStatisticsEnabled is false

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1638/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9898/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1639/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1850/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9899/

---

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2974#discussion_r239703828

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonExplainCommand.scala ---
@@ -51,8 +51,13 @@ case class CarbonExplainCommand(
sparkSession.sessionState.executePlan(child.asInstanceOf[ExplainCommand].logicalPlan)
try {
ExplainCollector.setup()
- queryExecution.toRdd.partitions
if (ExplainCollector.enabled()) {
+ queryExecution.toRdd.partitions
+ // For count(*) queries the explain collector will be disabled, so profiler
+ // informations not required in such scenarios.
+ if(null==ExplainCollector.getFormatedOutput) {
--- End diff --

formatting is not correct please format if condition code

---

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2974#discussion_r239998443

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonExplainCommand.scala ---
@@ -51,8 +51,8 @@ case class CarbonExplainCommand(
sparkSession.sessionState.executePlan(child.asInstanceOf[ExplainCommand].logicalPlan)
--- End diff --

Right, line 50 we can pull down within isQueryStatisticsEnabled condition,
but in line number 51 the system checks whether isQueryStatisticsEnabled is true then based on that it instantiates ExplainCollector and determines isQueryStatisticsEnabled or not. so i think 51 we cannot pull down. modified the code. thanks

---

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2974#discussion_r239998447

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonExplainCommand.scala ---
@@ -51,8 +51,13 @@ case class CarbonExplainCommand(
sparkSession.sessionState.executePlan(child.asInstanceOf[ExplainCommand].logicalPlan)
try {
ExplainCollector.setup()
- queryExecution.toRdd.partitions
if (ExplainCollector.enabled()) {
+ queryExecution.toRdd.partitions
+ // For count(*) queries the explain collector will be disabled, so profiler
+ // informations not required in such scenarios.
+ if(null==ExplainCollector.getFormatedOutput) {
--- End diff --

update. thanks

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1662/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9922/

---

[GitHub] carbondata issue #2974: [CARBONDATA-2563][CATALYST] Explain query with Order...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2974

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1874/

---

[GitHub] carbondata pull request #2974: [CARBONDATA-2563][CATALYST] Explain query wit...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2974#discussion_r240065228

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonExplainCommand.scala ---
@@ -51,8 +51,8 @@ case class CarbonExplainCommand(
sparkSession.sessionState.executePlan(child.asInstanceOf[ExplainCommand].logicalPlan)
--- End diff --

Now the lines have changed (at that time Line 50 an 51 are sparkSession.sessionState.executePlan)ï¼ and you have done that. Now it's OK

---

12