GitHub user sujith71955 opened a pull request:
https://github.com/apache/carbondata/pull/2642

[CARBONDATA-2532][Integration] Carbon to support spark 2.3.1 version

In this PR, in order to hide the compatibility issues of the columnar vector APIs from the existing common classes, I introduced an interface for proxy vector readers. These proxy vector readers take care of the compatibility issues between different Spark versions.

The ColumnVector and ColumnarBatch interface compatibility issues have been addressed in this PR. The changes relate to the modifications below, made in the Spark interface.

Highlights:
a) This is a refactoring of the ColumnVector hierarchy and related classes. By Sujith
b) Make ColumnVector read-only. By Sujith
c) Introduce WritableColumnVector with a write interface. By Sujith
d) Remove ReadOnlyColumnVector. By Sujith
e) Fixed spark-carbon integration API compatibility issues. By sandeep katta
f) Corrected the test cases based on the Spark 2.3.0 behaviour change. By sandeep katta
g) Excluded the net.jpountz.lz4 dependency from the pom.xml files, as Spark 2.3.0 changed it to org.lz4; it is removed from the test classpath of spark2, spark-common-test and spark2-examples.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujith71955/incubator-carbondata mas_mig_spark2.3_carbon_latest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2642.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2642

----
commit 7359151b612d3403e53c4759c853e1ab681fae7f
Author: sujith71955 <sujithchacko.2010@...>
Date: 2018-05-24T05:51:50Z

    [CARBONDATA-2532][Integration] Carbon to support spark 2.3 version, ColumnVector Interface

    The ColumnVector and ColumnarBatch interface compatibility issues have been addressed in this PR. The changes relate to the modifications below, made in the Spark interface.
    a) This is a refactoring of the ColumnVector hierarchy and related classes.
    b) Make ColumnVector read-only.
    c) Introduce WritableColumnVector with a write interface.
    d) Remove ReadOnlyColumnVector.

    In this PR, in order to hide the compatibility issues of the columnar vector APIs from the existing common classes, I introduced an interface for proxy vector readers. These proxy vector readers take care of the compatibility issues between different Spark versions.

commit 5934d975b53276b2490c6c178ae5b71f539dac60
Author: sandeep-katta <sandeep.katta2007@...>
Date: 2018-07-06T04:31:29Z

    [CARBONDATA-2532][Integration] Carbon to support spark 2.3 version, compatibility issues

    All compatibility issues when supporting 2.3 are addressed.
    Supported pom profile -P"spark-2.3"

----
---
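The proxy idea described in the PR can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `VectorProxy`, `LegacyVector`, `ReadOnlyVector`, `WritableVector` and the two proxy classes are hypothetical stand-ins, not Spark or CarbonData classes, and the sketch picks an implementation at runtime from a version string purely to keep the example in one file. The PR itself may select the implementation differently (for example per build profile).

```java
// Hypothetical names throughout: none of these types are Spark or CarbonData classes.

/** Version-neutral interface that the common reader code would depend on. */
interface VectorProxy {
    void putInt(int rowId, int value);
    int getInt(int rowId);
    int numRows();
}

/** Stand-in for the older API shape: one vector class that is readable and writable. */
final class LegacyVector {
    private final int[] data;
    LegacyVector(int capacity) { this.data = new int[capacity]; }
    void putInt(int rowId, int value) { data[rowId] = value; }
    int getInt(int rowId) { return data[rowId]; }
}

/** Stand-in for the newer API shape: a read-only base class ... */
abstract class ReadOnlyVector {
    abstract int getInt(int rowId);
}

/** ... and a writable subclass that adds the write interface. */
final class WritableVector extends ReadOnlyVector {
    private final int[] data;
    WritableVector(int capacity) { this.data = new int[capacity]; }
    void putInt(int rowId, int value) { data[rowId] = value; }
    @Override int getInt(int rowId) { return data[rowId]; }
}

/** Adapts the older vector shape to the version-neutral interface. */
final class LegacyVectorProxy implements VectorProxy {
    private final LegacyVector vector;
    private final int numRows;
    LegacyVectorProxy(int numRows) {
        this.vector = new LegacyVector(numRows);
        this.numRows = numRows;
    }
    @Override public void putInt(int rowId, int value) { vector.putInt(rowId, value); }
    @Override public int getInt(int rowId) { return vector.getInt(rowId); }
    @Override public int numRows() { return numRows; }
}

/** Adapts the newer, writable vector shape to the same interface. */
final class WritableVectorProxy implements VectorProxy {
    private final WritableVector vector;
    private final int numRows;
    WritableVectorProxy(int numRows) {
        this.vector = new WritableVector(numRows);
        this.numRows = numRows;
    }
    @Override public void putInt(int rowId, int value) { vector.putInt(rowId, value); }
    @Override public int getInt(int rowId) { return vector.getInt(rowId); }
    @Override public int numRows() { return numRows; }
}

final class VectorProxyDemo {
    /** Assumption: runtime selection by version prefix, only for this sketch. */
    static VectorProxy create(String sparkVersion, int numRows) {
        return sparkVersion.startsWith("2.3")
            ? new WritableVectorProxy(numRows)
            : new LegacyVectorProxy(numRows);
    }

    /** Common code sees only VectorProxy, never a version-specific vector class. */
    static int sum(VectorProxy vector) {
        int total = 0;
        for (int i = 0; i < vector.numRows(); i++) {
            total += vector.getInt(i);
        }
        return total;
    }

    public static void main(String[] args) {
        VectorProxy vector = create("2.3.1", 3);
        vector.putInt(0, 1);
        vector.putInt(1, 2);
        vector.putInt(2, 3);
        System.out.println(sum(vector));
    }
}
```

The point of the design is that `sum` compiles against `VectorProxy` only, so a change in the underlying vector hierarchy touches the proxy classes and nothing else.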
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2642 @sandeep-katta @gvramana --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2642 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6284/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7933/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2642 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6285/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6656/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2642 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6297/ --- |
Github user sandeep-katta commented on the issue:
https://github.com/apache/carbondata/pull/2642

4 test cases are failing in the SDV build, and they are not related to the code changes in this PR. The same 4 test cases are also failing in another PR; refer to https://github.com/apache/carbondata/pull/2643
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2642 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7946/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2642 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6669/ --- |
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2642#discussion_r211463185

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/ColumnarVectorWrapper.java ---
@@ -25,198 +25,204 @@
 import org.apache.carbondata.spark.util.CarbonScalaUtil;
 import org.apache.parquet.column.Encoding;
-import org.apache.spark.sql.execution.vectorized.ColumnVector;
+import org.apache.spark.sql.CarbonVectorProxy;
 import org.apache.spark.sql.types.Decimal;

 class ColumnarVectorWrapper implements CarbonColumnVector {

-  private ColumnVector columnVector;
+  private CarbonVectorProxy writableColumnVector;
--- End diff --

It is better to give this member a general name rather than naming it after a class in Spark 2.3.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2642#discussion_r211841539

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/ColumnarVectorWrapper.java ---
@@ -25,198 +25,204 @@
 import org.apache.carbondata.spark.util.CarbonScalaUtil;
 import org.apache.parquet.column.Encoding;
-import org.apache.spark.sql.execution.vectorized.ColumnVector;
+import org.apache.spark.sql.CarbonVectorProxy;
 import org.apache.spark.sql.types.Decimal;

 class ColumnarVectorWrapper implements CarbonColumnVector {

-  private ColumnVector columnVector;
+  private CarbonVectorProxy writableColumnVector;
--- End diff --

agree
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2642#discussion_r211841980

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingSparkCarbonFileFormat.scala ---
@@ -111,10 +112,10 @@ class TestCreateTableUsingSparkCarbonFileFormat extends QueryTest with BeforeAnd
     sql("DROP TABLE IF EXISTS sdkOutputTable")

     //data source file format
-    if (sqlContext.sparkContext.version.startsWith("2.1")) {
+    if (SparkUtil.isSparkVersionEqualToX("2.1")) {
--- End diff --

rename to `isSparkVersionEqualTo`
---
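For illustration, a version check of the kind discussed in this diff could look like the sketch below. This is not the PR's `SparkUtil` code: the class `SparkVersionCheck` and the two-argument signature are hypothetical, chosen so the example runs without a Spark context. It also deviates from the plain `startsWith("2.1")` test in the removed line by requiring a dot after the prefix, so that a version such as "2.10.0" would not match "2.1".

```java
/** Hypothetical helper; the name and signature are assumptions for this sketch. */
final class SparkVersionCheck {
    private SparkVersionCheck() {}

    /**
     * Returns true when runningVersion is exactly the expected version line
     * (for example "2.1") or a release within it (for example "2.1.0").
     */
    static boolean isSparkVersionEqualTo(String runningVersion, String expected) {
        return runningVersion.equals(expected)
            || runningVersion.startsWith(expected + ".");
    }

    public static void main(String[] args) {
        // In a real test the running version would come from the Spark context.
        String running = "2.3.1";
        System.out.println(isSparkVersionEqualTo(running, "2.1"));
        System.out.println(isSparkVersionEqualTo(running, "2.3"));
    }
}
```

Keeping the comparison in one helper means test code no longer repeats string prefix checks against the Spark context at each call site.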
Github user aaron-aa commented on the issue:
https://github.com/apache/carbondata/pull/2642

Hi @sujith71955, it's great to see you doing the integration work for the latest Spark release. What is the schedule for merging this pull request into master? My company's production Spark version is 2.3.1, so we are looking forward to your progress. Thanks very much!
---
Github user zzcclp commented on the issue:
https://github.com/apache/carbondata/pull/2642

Hi @aaron-aa, I think it's better to use Spark 2.3.2. It fixes some big issues that were found in Spark 2.3.1, and it will be released soon. What do you think?
---
Github user sujith71955 commented on the issue:
https://github.com/apache/carbondata/pull/2642

@aaron-aa @zzcclp The rebase work is pending; I will probably finish it in a day or two. I will check with the committers regarding the merge plan for this feature. As @zzcclp said, it will be better to use Spark 2.3.2 because of some major defect fixes, but the release date for Spark 2.3.2 is currently unclear. In any case, once this feature is merged, rebasing onto Spark 2.3.2 will take very little effort.
---
Github user aaron-aa commented on the issue:
https://github.com/apache/carbondata/pull/2642

@sujith71955 @zzcclp Thanks a lot for the information, which helps me reschedule my plan in advance! I hope Spark 2.3.2 comes out soon; for now I will try to work on Spark 2.2.1.
---
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2642

Now that Spark 2.3.2 is about to be released, can this PR work with all releases on the Spark 2.3 branch, including 2.3.2?
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2642 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6455/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2642 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6456/ --- |