[GitHub] carbondata pull request #1730: [CARBONDATA-1937][PARTITION] Fix partition fe...

GitHub user ravipesala opened a pull request:

https://github.com/apache/carbondata/pull/1730

[CARBONDATA-1937][PARTITION] Fix partition fetch fail if null partition value present in integral columns

This appears to be a Hive issue: querying partitions from the metastore fails if any integral partition column contains a null value.

As an alternative, we now fetch the full list of partitions from Hive and then apply the filter to it.
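
The alternative path described above (list everything, then filter locally) can be illustrated with a minimal, dependency-free sketch. It is not the code from this pull request: the class `PartitionFilterSketch`, its methods, and the representation of a partition spec as a `Map<String, String>` with a Java `null` for the null partition value are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative only: none of these names come from CarbonData, Spark, or Hive.
class PartitionFilterSketch {

  // Client-side filtering: keep the partition specs that the predicate accepts.
  static List<Map<String, String>> filterPartitions(
      List<Map<String, String>> allPartitions,
      Predicate<Map<String, String>> filter) {
    List<Map<String, String>> matched = new ArrayList<>();
    for (Map<String, String> spec : allPartitions) {
      if (filter.test(spec)) {
        matched.add(spec);
      }
    }
    return matched;
  }

  // Equality filter on an integral partition column; a null value never matches.
  static Predicate<Map<String, String>> intEquals(String column, int expected) {
    return spec -> {
      String raw = spec.get(column);
      return raw != null && Integer.parseInt(raw) == expected;
    };
  }

  // Builds a single-column partition spec; HashMap permits a null value.
  static Map<String, String> spec(String column, String value) {
    Map<String, String> m = new HashMap<>();
    m.put(column, value);
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, String>> all = new ArrayList<>();
    all.add(spec("id", "1"));
    all.add(spec("id", "2"));
    all.add(spec("id", null)); // the null partition value
    System.out.println(filterPartitions(all, intEquals("id", 2)));
  }
}
```

Because the filter runs on the client, the null value is handled by an explicit check instead of being passed to the metastore query; the cost is that every partition has to be fetched first.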

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [X] Any interfaces changed?

- [X] Any backward compatibility impacted?

- [X] Document update required?

- [X] Testing done
Tests added

- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata partition-fail-string-allow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1730

----
commit 687250dfe01d0955dff649806e0d0151d942cbd7
Author: ravipesala <ravi.pesala@...>
Date: 2017-12-27T17:02:46Z

Fix partition fetch on null partition of integral columns

----

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1730

Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1150/

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1730

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2367/

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1730

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2581/

---

[GitHub] carbondata pull request #1730: [CARBONDATA-1937][PARTITION] Fix partition fe...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1730#discussion_r158956384

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1501,6 +1501,15 @@

public static final String TIMESERIES_HIERARCHY = "timeseries.hierarchy";

+ /**
+ * It allows queries on hive metastore directly along with filter information, otherwise first
+ * fetches all partitions from hive and apply filters on it.
--- End diff --

Can you mention how to decide whether to set it to false or true?

---

[GitHub] carbondata pull request #1730: [CARBONDATA-1937][PARTITION] Fix partition fe...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1730#discussion_r158956514

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -405,13 +407,60 @@ object CarbonFilters {
}
}

+ /**
+ * Fetches partition information from hive
+ * @param partitionFilters
+ * @param sparkSession
+ * @param identifier
+ * @return
+ */
def getPartitions(partitionFilters: Seq[Expression],
sparkSession: SparkSession,
identifier: TableIdentifier): Seq[String] = {
- val partitions =
- sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+ val partitions = {
+ try {
+ if (CarbonProperties.getInstance().
+ getProperty(CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT,
+ CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT_DEFAULT).toBoolean) {
+ sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+ } else {
+ getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+ }
+ } catch {
+ case e: Exception =>
+ // Get partition information alternatively.
+ getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+ }
+ }
partitions.toList.flatMap { partition =>
partition.spec.seq.map{case (column, value) => column + "=" + value}
}.toSet.toSeq
}
+
+ /**
+ * This is alternate way of getting partition information. It first fetches all partitions from
+ * hive and then apply filter instead of querying hive along with filters.
+ * @param partitionFilters
+ * @param sparkSession
+ * @param identifier
+ * @return
+ */
+ private def getPartitionsAlternate(partitionFilters: Seq[Expression],
--- End diff --

Move `partitionFilters` to the next line; please follow this style in future.
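
For readers following the diff, its control flow (query directly when the property allows it, fall back to the alternate path on failure, then flatten each partition spec into distinct `column=value` strings) can be sketched in plain Java. This is a hedged illustration, not the pull request's code: `PartitionFetchFallbackSketch`, `fetchWithFallback`, `toColumnValueStrings`, and the `readDirect` flag standing in for the property lookup are hypothetical names, and the sketch catches only `RuntimeException` where the Scala code catches `Exception`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Illustrative only: none of these names come from CarbonData or Spark.
class PartitionFetchFallbackSketch {

  // Uses the direct path when enabled and falls back to the alternate path
  // if the direct path throws; uses the alternate path outright when disabled.
  static <T> T fetchWithFallback(boolean readDirect, Supplier<T> direct, Supplier<T> alternate) {
    if (!readDirect) {
      return alternate.get();
    }
    try {
      return direct.get();
    } catch (RuntimeException e) {
      return alternate.get();
    }
  }

  // Flattens partition specs into distinct "column=value" strings.
  // LinkedHashSet keeps first-seen order; the Scala toSet.toSeq gives no such guarantee.
  static List<String> toColumnValueStrings(List<Map<String, String>> partitions) {
    Set<String> distinct = new LinkedHashSet<>();
    for (Map<String, String> spec : partitions) {
      for (Map.Entry<String, String> entry : spec.entrySet()) {
        distinct.add(entry.getKey() + "=" + entry.getValue());
      }
    }
    return new ArrayList<>(distinct);
  }

  public static void main(String[] args) {
    Map<String, String> p1 = new LinkedHashMap<>();
    p1.put("country", "in");
    p1.put("year", "2017");
    Map<String, String> p2 = new LinkedHashMap<>();
    p2.put("country", "in");
    p2.put("year", "2018");

    // The direct path is simulated as failing, so the alternate path supplies the result.
    List<Map<String, String>> fetched =
        PartitionFetchFallbackSketch.<List<Map<String, String>>>fetchWithFallback(
            true,
            () -> { throw new IllegalStateException("direct query failed"); },
            () -> Arrays.asList(p1, p2));
    System.out.println(toColumnValueStrings(fetched));
  }
}
```

One consequence of this shape is that a failure on the direct path is silently absorbed by the fallback, so the flag mainly controls whether the direct query is attempted at all.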

---

[GitHub] carbondata pull request #1730: [CARBONDATA-1937][PARTITION] Fix partition fe...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1730#discussion_r158956569

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -405,13 +407,60 @@ object CarbonFilters {
}
}

+ /**
+ * Fetches partition information from hive
+ * @param partitionFilters
+ * @param sparkSession
+ * @param identifier
+ * @return
+ */
def getPartitions(partitionFilters: Seq[Expression],
sparkSession: SparkSession,
identifier: TableIdentifier): Seq[String] = {
- val partitions =
- sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+ val partitions = {
+ try {
+ if (CarbonProperties.getInstance().
+ getProperty(CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT,
+ CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT_DEFAULT).toBoolean) {
+ sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+ } else {
+ getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+ }
+ } catch {
+ case e: Exception =>
+ // Get partition information alternatively.
+ getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+ }
+ }
partitions.toList.flatMap { partition =>
partition.spec.seq.map{case (column, value) => column + "=" + value}
}.toSet.toSeq
}
+
+ /**
+ * This is alternate way of getting partition information. It first fetches all partitions from
+ * hive and then apply filter instead of querying hive along with filters.
+ * @param partitionFilters
--- End diff --

Please add a description for each parameter.

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1730

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1238/

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1730

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2462/

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1730

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2641/

---

[GitHub] carbondata issue #1730: [CARBONDATA-1937][PARTITION] Fix partition fetch fai...

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1730

LGTM

---

[GitHub] carbondata pull request #1730: [CARBONDATA-1937][PARTITION] Fix partition fe...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1730

---