
[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....


[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

GitHub user zzcclp opened a pull request:

https://github.com/apache/carbondata/pull/2779

[WIP] Upgrade spark integration version to 2.3.2

Upgrade spark integration version to 2.3.2

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?

- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zzcclp/carbondata wip_upgrade_to_spark2.3.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2779

----
commit 586cf7b6a23fa5b110f3490f8123d1b15b30e4bc
Author: Zhang Zhichao <441586683@...>
Date: 2018-09-27T17:30:34Z

[WIP] Upgrade spark integration version to 2.3.2

Upgrade spark integration version to 2.3.2

----

---

[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221013145

--- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala ---
@@ -296,7 +296,7 @@ object CarbonReflectionUtils {
classOf[LogicalPlan],
classOf[Seq[Attribute]],
classOf[SparkPlan])
- method.invoke(dataSourceObj, mode, query, query.output, physicalPlan)
+ method.invoke(dataSourceObj, mode, query, query.output.map(_.name), physicalPlan)
--- End diff --

The parameters of the 'writeAndRead' method have changed; please see [SPARK-PR#22346](https://github.com/apache/spark/pull/22346).
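A rough sketch of why the argument has to change even though the reflective lookup itself stays the same: generic type parameters are erased at runtime, so `classOf[Seq[Attribute]]` and a `Seq[String]` parameter resolve to the same runtime class, and a mismatch would only surface once the elements are used. The diff suggests the method now expects column names. The names `Column`, `NewStyleWriter`, and `ReflectionSketch` below are hypothetical stand-ins, not Spark or CarbonData code.

```scala
// Hypothetical stand-ins; none of these names come from Spark or CarbonData.
final case class Column(name: String)

class NewStyleWriter {
  // Mirrors the shape of the changed signature: column names, not attribute objects.
  def writeAndRead(mode: String, outputColumnNames: Seq[String]): String =
    s"$mode:" + outputColumnNames.map(_.toUpperCase).mkString(",")
}

object ReflectionSketch {
  def call(target: AnyRef, mode: String, columns: Seq[_]): AnyRef = {
    // Type arguments are erased, so classOf[Seq[Column]] denotes the same
    // runtime class as classOf[Seq[String]] and the lookup still succeeds.
    val method = target.getClass.getMethod(
      "writeAndRead", classOf[String], classOf[Seq[Column]])
    method.invoke(target, mode, columns)
  }
}
```

Calling `ReflectionSketch.call(new NewStyleWriter, "append", Seq(Column("a"), Column("b")).map(_.name))` returns `"append:A,B"`. Passing the unmapped `Seq(Column("a"))` is still accepted by `invoke`, but fails inside the method when an element is treated as a `String`.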

---

[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221014706

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.strategy
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+
+/**
+ * Physical plan node for scanning data. It is applied for both tables
+ * USING carbondata and STORED AS CARBONDATA.
+ */
+class CarbonDataSourceScan(
+ override val output: Seq[Attribute],
+ val rdd: RDD[InternalRow],
+ @transient override val relation: HadoopFsRelation,
+ val partitioning: Partitioning,
+ val md: Map[String, String],
+ identifier: Option[TableIdentifier],
+ @transient private val logicalRelation: LogicalRelation)
+ extends FileSourceScanExec(
+ relation,
+ output,
+ relation.dataSchema,
+ Seq.empty,
+ Seq.empty,
+ identifier) {
+
+ override lazy val supportsBatch: Boolean = true
+
+ override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) =
+ (partitioning, Nil)
+
+ override lazy val metadata: Map[String, String] = md
--- End diff --

The members (supportsBatch, outputPartitioning, outputOrdering, metadata) are now declared with the 'lazy' keyword; please see [SPARK-PR#21815](https://github.com/apache/spark/pull/21815).
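A minimal sketch of the language rule behind this change, assuming Scala 2 semantics: a concrete `lazy val` in a parent class can only be overridden by another `lazy val`, so subclasses must follow once the parent adopts `lazy`. `BaseScan` and `CustomScan` are hypothetical stand-ins for the Spark parent class and the CarbonData subclass.

```scala
// Hypothetical parent, standing in for a scan node whose members became lazy.
class BaseScan {
  lazy val supportsBatch: Boolean = false
  lazy val metadata: Map[String, String] = Map("format" -> "base")
}

// Hypothetical subclass, standing in for a custom scan node.
class CustomScan(md: Map[String, String]) extends BaseScan {
  // A plain `override val` here is rejected by the Scala 2 compiler,
  // because the overridden member is a concrete lazy val.
  override lazy val supportsBatch: Boolean = true
  override lazy val metadata: Map[String, String] = md
}
```

With these definitions, `new CustomScan(Map("k" -> "v")).supportsBatch` evaluates to `true`, and `metadata` returns the map passed to the constructor.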

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

@jackylk @chenliang613 @sujith71955 please review.

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/615/

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user sujith71955 commented on the issue:

https://github.com/apache/carbondata/pull/2779

Thanks for raising the PR. It would be better if you could add a description of the changes in this PR.

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8876/

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/2779

retest this please

---

[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221131678

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

Why do we need to move CarbonDataSourceScan.scala?

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/619/

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8880/

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/811/

---

[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221141998

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

The original class 'CarbonDataSourceScan' is moved to the source path 'commonTo2.1And2.2', and a new class 'CarbonDataSourceScan', which declares some members as lazy, is added in the source path 'spark2.3'.

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2779

@zzcclp Please check and fix the tests

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

@ravipesala can you help me check why these three test cases fail? The failures are related to decimal precision.

---

[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

@ravipesala I know how to fix this and will fix the tests ASAP.

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/837/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/643/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8905/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

@sujith71955 @chenliang613 @ravipesala @jackylk this PR is ready; please review. Thanks.

---
