Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

Classic

List

49 messages Options

Options

123

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221414692

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

My comment : only for 4 parameters , copy the whole file(CarbonDataSourceScan.scala) for spark 2.3 integration, may not require. see if can add the judgement for different spark version with different code/parameters.

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/2779

My comment : only for 4 parameters , copy the whole file(CarbonDataSourceScan.scala) for spark 2.3 integration, may not require. see if can add the judgement for different spark version with different code/parameters.

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221419564

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

@chenliang613 , it's not about the different code/parameters, it changes common val parameters to lazy val parameters. I think it's difficult to add lazy keyword on val parameters according to spark version.
Or does anyone know how to do and give me some help, thanks.

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221420261

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration.

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2779

@chenliang613 I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2779

I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

@sujith71955 any suggestion for this?

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221444503

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
--- End diff --

ok.

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/2779

LGTM

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2779

LGTM

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on the issue:

https://github.com/apache/carbondata/pull/2779

LGTM Even though SPARK-PR#21815 changes is a workaround solution and in future its subjected to change. Need a caution here. Can you please add this PR reference in the modified code of the carbon for future reference.
Thanks for your effort.

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221445524

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.strategy
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+
+/**
+ * Physical plan node for scanning data. It is applied for both tables
+ * USING carbondata and STORED AS CARBONDATA.
+ */
+class CarbonDataSourceScan(
+ override val output: Seq[Attribute],
+ val rdd: RDD[InternalRow],
+ @transient override val relation: HadoopFsRelation,
+ val partitioning: Partitioning,
+ val md: Map[String, String],
+ identifier: Option[TableIdentifier],
+ @transient private val logicalRelation: LogicalRelation)
+ extends FileSourceScanExec(
+ relation,
+ output,
+ relation.dataSchema,
+ Seq.empty,
+ Seq.empty,
+ identifier) {
+
+ override lazy val supportsBatch: Boolean = true
+
+ override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) =
+ (partitioning, Nil)
+
+ override lazy val metadata: Map[String, String] = md
--- End diff --

nit: made lazy since spark 2.3.2 version (SPARK-PR#21815)

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221446372

--- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.strategy
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+
+/**
+ * Physical plan node for scanning data. It is applied for both tables
+ * USING carbondata and STORED AS CARBONDATA.
+ */
+class CarbonDataSourceScan(
+ override val output: Seq[Attribute],
+ val rdd: RDD[InternalRow],
+ @transient override val relation: HadoopFsRelation,
+ val partitioning: Partitioning,
+ val md: Map[String, String],
+ identifier: Option[TableIdentifier],
+ @transient private val logicalRelation: LogicalRelation)
+ extends FileSourceScanExec(
+ relation,
+ output,
+ relation.dataSchema,
+ Seq.empty,
+ Seq.empty,
+ identifier) {
+
+ override lazy val supportsBatch: Boolean = true
+
+ override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) =
+ (partitioning, Nil)
+
+ override lazy val metadata: Map[String, String] = md
--- End diff --

fixed

---

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

In reply to this post by qiuchenjian-2

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2779#discussion_r221446374

--- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala ---
@@ -296,7 +296,7 @@ object CarbonReflectionUtils {
classOf[LogicalPlan],
classOf[Seq[Attribute]],
classOf[SparkPlan])
- method.invoke(dataSourceObj, mode, query, query.output, physicalPlan)
+ method.invoke(dataSourceObj, mode, query, query.output.map(_.name), physicalPlan)
--- End diff --

added some comments too

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/656/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on the issue:

https://github.com/apache/carbondata/pull/2779

LGTM

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8920/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2779

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/852/

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user sujith71955 commented on the issue:

https://github.com/apache/carbondata/pull/2779

LGTM

---

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

In reply to this post by qiuchenjian-2

Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/2779

the failed test case is not related to this pr, right?

---

123