[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221414692
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    --- End diff --
   
    My comment : only for 4 parameters , copy the whole file(CarbonDataSourceScan.scala) for spark 2.3 integration,  may not require. see if can add the judgement for different spark version with different code/parameters.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    My comment : only for 4 parameters , copy the whole file(CarbonDataSourceScan.scala) for spark 2.3 integration, may not require. see if can add the judgement for different spark version with different code/parameters.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user zzcclp commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221419564
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    --- End diff --
   
    @chenliang613 , it's not about the different code/parameters, it changes common val parameters to lazy val parameters. I think it's difficult to add lazy keyword on val parameters according to spark version.
    Or does anyone know how to do and give me some help, thanks.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221420261
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    --- End diff --
   
    I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    @chenliang613 I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    I think it is not possible to add lazy variable without breaking spark 2.2 integration. Since this class is small, I think it is ok to have a separate one for spark 2.3 integration


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    @sujith71955 any suggestion for this?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221444503
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    --- End diff --
   
    ok.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sujith71955 commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    LGTM  Even though SPARK-PR#21815 changes is a workaround solution and in future  its subjected to change. Need a caution here. Can you please add this PR reference in the modified code of the carbon  for future reference.
    Thanks for your effort.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221445524
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.strategy
    +
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
    +import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
    +import org.apache.spark.sql.catalyst.plans.physical.Partitioning
    +import org.apache.spark.sql.execution.FileSourceScanExec
    +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
    +
    +/**
    + *  Physical plan node for scanning data. It is applied for both tables
    + *  USING carbondata and STORED AS CARBONDATA.
    + */
    +class CarbonDataSourceScan(
    +    override val output: Seq[Attribute],
    +    val rdd: RDD[InternalRow],
    +    @transient override val relation: HadoopFsRelation,
    +    val partitioning: Partitioning,
    +    val md: Map[String, String],
    +    identifier: Option[TableIdentifier],
    +    @transient private val logicalRelation: LogicalRelation)
    +  extends FileSourceScanExec(
    +    relation,
    +    output,
    +    relation.dataSchema,
    +    Seq.empty,
    +    Seq.empty,
    +    identifier) {
    +
    +  override lazy val supportsBatch: Boolean = true
    +
    +  override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) =
    +    (partitioning, Nil)
    +
    +  override lazy val metadata: Map[String, String] = md
    --- End diff --
   
    nit:  made lazy since spark 2.3.2 version (SPARK-PR#21815)


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user zzcclp commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221446372
 
    --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.strategy
    +
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
    +import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
    +import org.apache.spark.sql.catalyst.plans.physical.Partitioning
    +import org.apache.spark.sql.execution.FileSourceScanExec
    +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
    +
    +/**
    + *  Physical plan node for scanning data. It is applied for both tables
    + *  USING carbondata and STORED AS CARBONDATA.
    + */
    +class CarbonDataSourceScan(
    +    override val output: Seq[Attribute],
    +    val rdd: RDD[InternalRow],
    +    @transient override val relation: HadoopFsRelation,
    +    val partitioning: Partitioning,
    +    val md: Map[String, String],
    +    identifier: Option[TableIdentifier],
    +    @transient private val logicalRelation: LogicalRelation)
    +  extends FileSourceScanExec(
    +    relation,
    +    output,
    +    relation.dataSchema,
    +    Seq.empty,
    +    Seq.empty,
    +    identifier) {
    +
    +  override lazy val supportsBatch: Boolean = true
    +
    +  override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) =
    +    (partitioning, Nil)
    +
    +  override lazy val metadata: Map[String, String] = md
    --- End diff --
   
    fixed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user zzcclp commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2779#discussion_r221446374
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala ---
    @@ -296,7 +296,7 @@ object CarbonReflectionUtils {
               classOf[LogicalPlan],
               classOf[Seq[Attribute]],
               classOf[SparkPlan])
    -      method.invoke(dataSourceObj, mode, query, query.output, physicalPlan)
    +      method.invoke(dataSourceObj, mode, query, query.output.map(_.name), physicalPlan)
    --- End diff --
   
    added some comments too


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/656/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sujith71955 commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8920/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/852/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sujith71955 commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2779
 
    the failed test case is not related to this pr, right?


---
123