GitHub user sujith71955 opened a pull request:
https://github.com/apache/carbondata/pull/2366 [CARBONDATA-2532][Integration] Carbon to support spark 2.3 version ```
## What changes were proposed in this pull request?
In this PR, in order to hide the compatibility issues of the columnar vector APIs from the existing common classes, I introduced a proxy vector reader interface; these proxy vector readers take care of the compatibility issues across the different Spark versions. The column vector and columnar batch interface compatibility issues have been addressed in this PR; the changes relate to the following modifications in the Spark interface.
Highlights:
a) This is a refactoring of the ColumnVector hierarchy and related classes.
b) Make ColumnVector read-only.
c) Introduce WritableColumnVector with a write interface.
d) Remove ReadOnlyColumnVector.
e) Fixed spark-carbon integration API compatibility issues - by Sandi.
f) Corrected the test cases based on the Spark 2.3.0 behaviour change.
g) Excluded the dependency <groupId>net.jpountz</groupId><artifactId>lz4</artifactId> from the pom.xml files; as Spark 2.3.0 changed it to org.lz4, it was removed from the test class path of spark2, spark-common-test and spark2-examples.
## How was this patch tested?
Manual testing and existing test-case execution
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sujith71955/incubator-carbondata spark-2.3_carbon_spark_2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2366.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2366
----
commit 9d68b4270c46b99d8d7985069ce633a60c04ba87
Author: sujith71955 <sujithchacko.2010@...>
Date: 2018-05-24T05:51:50Z
[CARBONDATA-2532][Integration] Carbon to support spark 2.3 version
## What changes were proposed in this pull request?
The column vector and columnar batch interface compatibility issues have been addressed in this PR; the changes relate to the following modifications in the Spark interface:
a) This is a refactoring of the ColumnVector hierarchy and related classes.
b) Make ColumnVector read-only.
c) Introduce WritableColumnVector with a write interface.
d) Remove ReadOnlyColumnVector.
In this PR, in order to hide the compatibility issues of the columnar vector APIs from the existing common classes, I introduced a proxy vector reader interface; these proxy vector readers take care of the compatibility issues across the different Spark versions.
## How was this patch tested?
Manual testing and existing test-case execution

commit 8e62ac800c39ee308466f70d574ce1892dde9436
Author: sujith71955 <sujithchacko.2010@...>
Date: 2018-06-08T07:27:39Z
```
## What changes were proposed in this pull request?
In this PR, in order to hide the compatibility issues of the columnar vector APIs from the existing common classes, I introduced a proxy vector reader interface; these proxy vector readers take care of the compatibility issues across the different Spark versions.
The column vector and columnar batch interface compatibility issues have been addressed in this PR; the changes relate to the following modifications in the Spark interface.
Highlights:
a) This is a refactoring of the ColumnVector hierarchy and related classes.
b) Make ColumnVector read-only.
c) Introduce WritableColumnVector with a write interface.
d) Remove ReadOnlyColumnVector.
e) Fixed spark-carbon integration API compatibility issues - by Sandi.
f) Corrected the test cases based on the Spark 2.3.0 behaviour change.
g) Excluded the dependency <groupId>net.jpountz</groupId><artifactId>lz4</artifactId> from the pom.xml files; as Spark 2.3.0 changed it to org.lz4, it was removed from the test class path of spark2, spark-common-test and spark2-examples.
## How was this patch tested?
Manual testing and existing test-case execution
```

commit 233f1ce3e667f5ca9c7d874bf3d4dc216cb484ef
Author: sandeep-katta <sandeep.katta2007@...>
Date: 2018-05-25T10:17:48Z
WIP changed to completed. Only one test case, related to projection, is failing. Carbon support for 2.3.1

commit b571400710df5fb0229870a2b0e7d73bfc9995af
Author: sandeep-katta <sandeep.katta2007@...>
Date: 2018-06-08T11:18:22Z
Carbon needs to support the spark 2.3.0 version
---- --- |
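The proxy-vector-reader idea described in the PR can be sketched as below. This is a minimal illustration under assumed names (ProxyVectorReader, Spark22VectorReader, Spark23VectorReader, VectorReaderFactory are hypothetical, not the actual CarbonData classes); the real readers would delegate to the version-specific Spark ColumnarBatch / WritableColumnVector APIs.

```java
// Sketch of the proxy vector reader pattern: common code talks only to a
// version-neutral interface, and a factory picks the Spark-version-specific
// implementation once. All names here are illustrative, not actual CarbonData
// or Spark classes.
interface ProxyVectorReader {
    // Version-neutral write API used by common reader code; real
    // implementations would delegate to ColumnarBatch columns (Spark 2.1/2.2)
    // or WritableColumnVector (Spark 2.3) internally.
    void putInt(int rowId, int ordinal, int value);
    String backend();
}

class Spark22VectorReader implements ProxyVectorReader {
    public void putInt(int rowId, int ordinal, int value) {
        // would write through the pre-2.3 ColumnVector API
    }
    public String backend() { return "spark-2.2"; }
}

class Spark23VectorReader implements ProxyVectorReader {
    public void putInt(int rowId, int ordinal, int value) {
        // would write through WritableColumnVector (Spark 2.3 API)
    }
    public String backend() { return "spark-2.3"; }
}

final class VectorReaderFactory {
    private VectorReaderFactory() { }

    // Selected once, so common classes such as CarbonStreamRecordReader never
    // reference version-specific Spark vector classes directly.
    static ProxyVectorReader forSparkVersion(String sparkVersion) {
        return sparkVersion.startsWith("2.3")
            ? new Spark23VectorReader()
            : new Spark22VectorReader();
    }
}
```

With this shape, adding support for a future Spark version means adding one implementation class and one factory branch, leaving the common reader code untouched.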
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2366 Please review and let us know of any suggestions. While squashing, please squash my commits and Sandeep's independently, as both of us have commits related to this upgrade activity. --- |
In reply to this post by qiuchenjian-2
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2366 @gvramana --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2366 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6285/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2366 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5248/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2366 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5249/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2366 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6286/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2366 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5124/ --- |
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2366 @ravipesala @gvramana @chenliang613 Please review and merge, as the builds have already passed. --- |
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2366 @sandeep-katta Please handle the comments if they are applicable to your commit. Thanks. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2366 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6330/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2366 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5168/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2366 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5282/ --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195968291 --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordReader.java --- @@ -115,7 +109,7 @@ // vectorized reader private StructType outputSchema; - private ColumnarBatch columnarBatch; + private CarbonSparkVectorReader vectorProxy; --- End diff -- The CarbonSparkVectorReader interface is not required; create an abstract class that keeps the common code. --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195968507 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -65,7 +66,7 @@ object CarbonReflectionUtils { className, tableIdentifier, tableAlias)._1.asInstanceOf[UnresolvedRelation] - } else if (SPARK_VERSION.startsWith("2.2")) { + } else if (SPARK_VERSION.startsWith("2.2") || SPARK_VERSION.startsWith("2.3")) { --- End diff -- Add a function such as SPARK_VERSION.above("2.2"), so that we need not change the code in every place for every new version. --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195968598 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -140,6 +142,13 @@ object CarbonReflectionUtils { relation, expectedOutputAttributes, catalogTable)._1.asInstanceOf[LogicalRelation] + } else if (SPARK_VERSION.startsWith("2.3")) { --- End diff -- Add a function SPARK_VERSION.above, so that SPARK_VERSION.above("2.3") can be used and we need not change the code in every place for every new version. --- |
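The helper the reviewer suggests could look like the following sketch. The class and method names (SparkVersionUtil, isAtLeast) are illustrative assumptions; real code would compare against org.apache.spark.SPARK_VERSION rather than a string passed in.

```java
// Sketch of a single version-comparison helper, so call sites don't need a
// new startsWith branch for every Spark release. Names are illustrative, not
// actual CarbonData code.
final class SparkVersionUtil {
    private SparkVersionUtil() { }

    // Returns true when `current` (e.g. "2.3.0") is at least the required
    // "major.minor" version (e.g. "2.2").
    static boolean isAtLeast(String current, String required) {
        int[] cur = majorMinor(current);
        int[] req = majorMinor(required);
        if (cur[0] != req[0]) {
            return cur[0] > req[0];
        }
        return cur[1] >= req[1];
    }

    // Extracts the numeric major and minor components of a version string.
    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }
}
```

A call site such as the startsWith chain in CarbonReflectionUtils would then reduce to a single check like `SparkVersionUtil.isAtLeast(SPARK_VERSION, "2.2")`, which stays correct when 2.4 or later arrives.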
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195969282 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala --- @@ -355,18 +362,19 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { } private def getDataSourceScan(relation: LogicalRelation, - output: Seq[Attribute], - partitions: Seq[PartitionSpec], - scanBuilder: (Seq[Attribute], Seq[Expression], Seq[Filter], - ArrayBuffer[AttributeReference], Seq[PartitionSpec]) => RDD[InternalRow], - candidatePredicates: Seq[Expression], - pushedFilters: Seq[Filter], - metadata: Map[String, String], - needDecoder: ArrayBuffer[AttributeReference], - updateRequestedColumns: Seq[Attribute]): DataSourceScanExec = { + output: Seq[Attribute], --- End diff -- Keep the previous indentation. --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195970266 --- Diff: examples/spark2/pom.xml --- @@ -204,5 +204,35 @@ <maven.test.skip>true</maven.test.skip> </properties> </profile> + <profile> + <id>spark-2.3</id> + <properties> + <spark.version>2.3.0</spark.version> + <scala.binary.version>2.11</scala.binary.version> + <scala.version>2.11.8</scala.version> + </properties> + <dependencies> + <!-- in spark 2.3 spark catalyst added dependency on spark-core--> + <dependency> + <groupId>org.apache.spark</groupId> + <artifactId>spark-core_${scala.binary.version}</artifactId> + <version>${spark.version}</version> + </dependency> + <dependency> + <groupId>org.apache.carbondata</groupId> + <artifactId>carbondata-core</artifactId> + <version>${project.version}</version> + <exclusions> + <!-- need to Exclude net.jpountz jar from this project. --- End diff -- Raise another PR to fix the lz4 jar dependency problem. --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195971259 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/bigdecimal/TestBigDecimal.scala --- @@ -149,8 +149,9 @@ class TestBigDecimal extends QueryTest with BeforeAndAfterAll { } test("test sum*10 aggregation on big decimal column with high precision") { - checkAnswer(sql("select sum(salary)*10 from carbonBigDecimal_2"), - sql("select sum(salary)*10 from hiveBigDecimal")) + val carbonSeq = sql("select sum(salary)*10 from carbonBigDecimal_2").collect --- End diff -- It is better to change the test case to "select cast(sum(salary)*10 as double) from carbonBigDecimal_2". --- |
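The reason an exact-equality comparison on a decimal aggregate is brittle can be seen from standard java.math.BigDecimal semantics: values that are numerically equal but carry different scales are not equals(), while their double values match, which is what casting the sum to double relies on. A small self-contained illustration (DecimalScaleDemo and its method names are hypothetical, for demonstration only):

```java
import java.math.BigDecimal;

// Illustrates why exact equality on decimal aggregates is brittle:
// BigDecimal.equals() takes scale into account, so 1234.50 is not equals()
// to 1234.5 even though they are numerically identical; their double values,
// however, compare equal.
final class DecimalScaleDemo {
    // Scale-sensitive comparison: "1234.50" (scale 2) != "1234.5" (scale 1).
    static boolean exactlyEqual(String a, String b) {
        return new BigDecimal(a).equals(new BigDecimal(b));
    }

    // Scale-insensitive numeric comparison via compareTo.
    static boolean numericallyEqual(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    // Comparison after converting to double, as the suggested cast does in SQL.
    static boolean equalAsDouble(String a, String b) {
        return new BigDecimal(a).doubleValue() == new BigDecimal(b).doubleValue();
    }
}
```

So when two engines produce sums with different decimal scales, casting both sides to double (or comparing collected values numerically) keeps the test stable across Spark versions.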
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2366#discussion_r195971582 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/sql/commands/StoredAsCarbondataSuite.scala --- @@ -87,7 +87,7 @@ class StoredAsCarbondataSuite extends QueryTest with BeforeAndAfterEach { sql("CREATE TABLE carbon_table(key INT, value STRING) STORED AS ") } catch { case e: Exception => - assert(e.getMessage.contains("no viable alternative at input")) + assert(true) --- End diff -- Add an or condition with the expected message, rather than asserting true unconditionally. --- |