GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1730

[CARBONDATA-1937][PARTITION] Fix partition fetch fail if null partition value present in integral columns

There appears to be an issue in Hive when querying partitions from the metastore if any integral partition column contains a null value. As a workaround, we now fetch the full list of partitions from Hive and then apply the filter to it.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [X] Any interfaces changed?
- [X] Any backward compatibility impacted?
- [X] Document update required?
- [X] Testing done: tests added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata partition-fail-string-allow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1730.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1730

----
commit 687250dfe01d0955dff649806e0d0151d942cbd7
Author: ravipesala <ravi.pesala@...>
Date: 2017-12-27T17:02:46Z

Fix partition fetch on null partition of integral columns

---
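The fallback strategy described above (try the metastore's server-side partition filtering first; if that fails, fetch all partitions and filter client-side) can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the actual CarbonData code: `FakeCatalog`, `get_partitions`, and all other names here are hypothetical stand-ins for Spark's `SessionCatalog` and the Hive metastore.

```python
class FakeCatalog:
    """Illustrative stand-in for the session catalog; the real patch talks
    to the Hive metastore via Spark's SessionCatalog. (Hypothetical name.)"""
    def __init__(self, partitions, direct_fails=False):
        self.partitions = partitions
        self.direct_fails = direct_fails

    def list_partitions_by_filter(self, filters):
        if self.direct_fails:
            # Mimics Hive failing when an integral partition column is null.
            raise RuntimeError("metastore filter query failed")
        return [p for p in self.partitions if all(f(p) for f in filters)]

    def list_all_partitions(self):
        return list(self.partitions)


def get_partitions(filters, catalog, direct=True):
    # Try server-side filtering first; on any failure (or when direct
    # access is disabled), fall back to fetching all partitions and
    # filtering them client-side -- the "alternate" path in the patch.
    if direct:
        try:
            return catalog.list_partitions_by_filter(filters)
        except Exception:
            pass  # fall through to the client-side path
    return [p for p in catalog.list_all_partitions()
            if all(f(p) for f in filters)]


# Partitions keyed by an integral column "id"; one value is null (None).
parts = [{"id": 1}, {"id": 2}, {"id": None}]
flt = [lambda p: p["id"] is not None and p["id"] > 1]
broken = FakeCatalog(parts, direct_fails=True)
print(get_partitions(flt, broken))  # falls back, prints [{'id': 2}]
```

The design point is that the direct path is preferred because it pushes the filter down to the metastore, while the fallback trades extra metastore traffic for robustness against the null-value failure.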
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1730

Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1150/

---
In reply to this post by qiuchenjian-2
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1730

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2367/

---
GitHub user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1730

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2581/

---
GitHub user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1730#discussion_r158956384

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1501,6 +1501,15 @@
   public static final String TIMESERIES_HIERARCHY = "timeseries.hierarchy";

+  /**
+   * It allows queries on hive metastore directly along with filter information, otherwise first
+   * fetches all partitions from hive and apply filters on it.
--- End diff --

Can you mention how to decide whether to set it to false or true?

---
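To make the reviewer's question concrete: the flag is a trade-off between speed and robustness. The property key below is an assumption inferred from the constant name `CARBON_READ_PARTITION_HIVE_DIRECT`; the actual key string does not appear in this thread, so verify it against `CarbonCommonConstants` before use.

```properties
# ASSUMED key, inferred from CARBON_READ_PARTITION_HIVE_DIRECT -- verify
# against CarbonCommonConstants before relying on it.
#
# true  (direct): push partition filters down to the Hive metastore.
#                 Faster, but can fail when an integral partition column
#                 contains a null value.
# false (alternate): fetch all partitions from Hive first and filter them
#                 client-side. Slower on tables with many partitions, but
#                 robust to null partition values.
carbon.read.partition.hive.direct=false
```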
GitHub user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1730#discussion_r158956514

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -405,13 +407,60 @@ object CarbonFilters {
     }
   }

+  /**
+   * Fetches partition information from hive
+   * @param partitionFilters
+   * @param sparkSession
+   * @param identifier
+   * @return
+   */
   def getPartitions(partitionFilters: Seq[Expression],
       sparkSession: SparkSession,
       identifier: TableIdentifier): Seq[String] = {
-    val partitions =
-      sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+    val partitions = {
+      try {
+        if (CarbonProperties.getInstance().
+          getProperty(CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT,
+            CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT_DEFAULT).toBoolean) {
+          sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+        } else {
+          getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+        }
+      } catch {
+        case e: Exception =>
+          // Get partition information alternatively.
+          getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+      }
+    }
     partitions.toList.flatMap { partition =>
       partition.spec.seq.map { case (column, value) => column + "=" + value }
     }.toSet.toSeq
   }
+
+  /**
+   * This is alternate way of getting partition information. It first fetches all partitions from
+   * hive and then apply filter instead of querying hive along with filters.
+   * @param partitionFilters
+   * @param sparkSession
+   * @param identifier
+   * @return
+   */
+  private def getPartitionsAlternate(partitionFilters: Seq[Expression],
--- End diff --

move `partitionFilters` to next line, please follow this in future

---
GitHub user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1730#discussion_r158956569

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -405,13 +407,60 @@ object CarbonFilters {
     }
   }

+  /**
+   * Fetches partition information from hive
+   * @param partitionFilters
+   * @param sparkSession
+   * @param identifier
+   * @return
+   */
   def getPartitions(partitionFilters: Seq[Expression],
       sparkSession: SparkSession,
       identifier: TableIdentifier): Seq[String] = {
-    val partitions =
-      sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+    val partitions = {
+      try {
+        if (CarbonProperties.getInstance().
+          getProperty(CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT,
+            CarbonCommonConstants.CARBON_READ_PARTITION_HIVE_DIRECT_DEFAULT).toBoolean) {
+          sparkSession.sessionState.catalog.listPartitionsByFilter(identifier, partitionFilters)
+        } else {
+          getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+        }
+      } catch {
+        case e: Exception =>
+          // Get partition information alternatively.
+          getPartitionsAlternate(partitionFilters, sparkSession, identifier)
+      }
+    }
     partitions.toList.flatMap { partition =>
       partition.spec.seq.map { case (column, value) => column + "=" + value }
     }.toSet.toSeq
   }
+
+  /**
+   * This is alternate way of getting partition information. It first fetches all partitions from
+   * hive and then apply filter instead of querying hive along with filters.
+   * @param partitionFilters
--- End diff --

give comment for parameter

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1730

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1238/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1730

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2462/

---
GitHub user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1730

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2641/

---