[GitHub] [carbondata] jackylk opened a new pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk opened a new pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576
 
 
    ### Why is this PR needed?
   CarbonData integration with Spark 2.4 is a long-awaited feature requested by the community.
   
    ### What changes were proposed in this PR?
   1. Support integration with Spark 2.4
   2. Remove support for Spark 2.1 and 2.2
       
    ### Does this PR introduce any user interface change?
    - Yes. The new APIs in Spark 2.4 can be used to access CarbonData.
   
    ### Is any new testcase added?
    - No
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-573410959
 
 
   Build succeeded with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1611/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-573412253
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1612/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-573414073
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1613/
   

[GitHub] [carbondata] zzcclp commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
zzcclp commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r365590114
 
 

 ##########
 File path: build/README.md
 ##########
 @@ -25,11 +25,9 @@
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
 Review comment:
   The build currently requires JDK 8; please update this as well.

[GitHub] [carbondata] zzcclp commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
zzcclp commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r365590190
 
 

 ##########
 File path: build/README.md
 ##########
 @@ -25,11 +25,9 @@
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
 ## Build command
-Build with different supported versions of Spark, by default using Spark 2.2.1 to build
+Build with different supported versions of Spark, by default using Spark 2.4.4
 
 Review comment:
   The default version is currently Spark 2.3.4. Will Spark 2.4 become the default version?

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r366356643
 
 

 ##########
 File path: build/README.md
 ##########
 @@ -25,11 +25,9 @@
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
 Review comment:
   fixed

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r366356965
 
 

 ##########
 File path: build/README.md
 ##########
 @@ -25,11 +25,9 @@
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
 ## Build command
-Build with different supported versions of Spark, by default using Spark 2.2.1 to build
+Build with different supported versions of Spark, by default using Spark 2.4.4
 
 Review comment:
   We can change the default to Spark 2.4 after this PR is merged.

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-574215774
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1640/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-574566036
 
 
   Build succeeded with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1642/
   

[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368280039
 
 

 ##########
 File path: README.md
 ##########
 @@ -28,8 +28,8 @@ Visit count: [![HitCount](http://hits.dwyl.io/jackylk/apache/carbondata.svg)](ht
 
 
 ## Status
-Spark2.2:
-[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.2)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
+Spark2.3:
+[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.3)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
 
 Review comment:
   The link URL still contains `carbondata-master-spark-2.2`. Is that a mistake?

[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368281363
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateDataSourceTableCommand.scala
 ##########
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.table
+
+import org.apache.spark.sql.{CarbonEnv, CarbonSource, Row, SparkSession}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.execution.command.{CreateDataSourceTableCommand, MetadataCommand}
+
+/**
+ * Command to create table in case of 'USING CARBONDATA' DDL
+ *
+ * @param catalogTable catalog table created by spark
+ * @param ignoreIfExists ignore if table exists
+ * @param sparkSession spark session
+ */
+case class CarbonCreateDataSourceTableCommand(
+    catalogTable: CatalogTable,
+    ignoreIfExists: Boolean,
+    sparkSession: SparkSession)
+  extends MetadataCommand {
+
+  override def processMetadata(session: SparkSession): Seq[Row] = {
+    // Run the spark command to create table in metastore before saving carbon schema
+    // in table path.
+    // This is required for spark 2.4, because spark 2.4 will fail to create table
+    // if table path is created before hand
 
 Review comment:
   'before hand' --> 'beforehand'

[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368281590
 
 

 ##########
 File path: pom.xml
 ##########
 @@ -575,12 +530,14 @@
                 <sourceDirectory>${basedir}/processing/src/main/java</sourceDirectory>
                 <sourceDirectory>${basedir}/hadoop/src/main/java</sourceDirectory>
                 <sourceDirectory>${basedir}/integration/spark2/src/main/scala</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/spark2.2</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.1And2.2</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.2And2.3</sourceDirectory>
+                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.2AndAbove</sourceDirectory>
 
 Review comment:
   `commonTo2.2AndAbove`? Does the code still keep Spark 2.2 compatibility, even though this PR removes 2.2 support?

[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368280213
 
 

 ##########
 File path: build/README.md
 ##########
 @@ -25,11 +25,9 @@
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
 Review comment:
   Since we are now using JDK 8, maybe we can make use of Java 8 features in the code later, such as streams and lambdas.
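
   For illustration only, a minimal sketch of the stream and lambda style mentioned above. This is not code from CarbonData: the class `StreamSketch`, the method `longNamesUpper`, and the sample column names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSketch {

    // Hypothetical helper: keep the names longer than minLength and upper-case them.
    // A lambda supplies the filter predicate; a method reference supplies the mapping.
    static List<String> longNamesUpper(List<String> names, int minLength) {
        return names.stream()
                .filter(n -> n.length() > minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data, made up for this sketch.
        List<String> names = Arrays.asList("id", "name", "city", "salary");
        System.out.println(longNamesUpper(names, 3)); // prints [NAME, CITY, SALARY]
    }
}
```

   The same logic written with an explicit loop needs a mutable result list and an `if` block; the stream version keeps the filter and the mapping as separate, composable steps.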

[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
xuchuanyin commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368281422
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDropTableCommand.scala
 ##########
 @@ -134,6 +134,7 @@ case class CarbonDropTableCommand(
       }
       val indexDatamapSchemas =
         DataMapStoreManager.getInstance().getDataMapSchemasOfTable(carbonTable)
+      LOGGER.info(s"Dropping DataMaps in table $tableName, size: " + indexDatamapSchemas.size())
 
 Review comment:
   Why not interpolate the size in the string as well?
   
   LOGGER.info(s"Dropping DataMaps in table $tableName, size: ${indexDatamapSchemas.size()}")

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368515360
 
 

 ##########
 File path: README.md
 ##########
 @@ -28,8 +28,8 @@ Visit count: [![HitCount](http://hits.dwyl.io/jackylk/apache/carbondata.svg)](ht
 
 
 ## Status
-Spark2.2:
-[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.2)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
+Spark2.3:
+[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.3)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
 
 Review comment:
   fixed

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368515412
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateDataSourceTableCommand.scala
 ##########
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.table
+
+import org.apache.spark.sql.{CarbonEnv, CarbonSource, Row, SparkSession}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.execution.command.{CreateDataSourceTableCommand, MetadataCommand}
+
+/**
+ * Command to create table in case of 'USING CARBONDATA' DDL
+ *
+ * @param catalogTable catalog table created by spark
+ * @param ignoreIfExists ignore if table exists
+ * @param sparkSession spark session
+ */
+case class CarbonCreateDataSourceTableCommand(
+    catalogTable: CatalogTable,
+    ignoreIfExists: Boolean,
+    sparkSession: SparkSession)
+  extends MetadataCommand {
+
+  override def processMetadata(session: SparkSession): Seq[Row] = {
+    // Run the spark command to create table in metastore before saving carbon schema
+    // in table path.
+    // This is required for spark 2.4, because spark 2.4 will fail to create table
+    // if table path is created before hand
 
 Review comment:
   fixed

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368515979
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDropTableCommand.scala
 ##########
 @@ -134,6 +134,7 @@ case class CarbonDropTableCommand(
       }
       val indexDatamapSchemas =
         DataMapStoreManager.getInstance().getDataMapSchemasOfTable(carbonTable)
+      LOGGER.info(s"Dropping DataMaps in table $tableName, size: " + indexDatamapSchemas.size())
 
 Review comment:
   fixed

[GitHub] [carbondata] jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
jackylk commented on a change in pull request #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#discussion_r368516506
 
 

 ##########
 File path: pom.xml
 ##########
 @@ -575,12 +530,14 @@
                 <sourceDirectory>${basedir}/processing/src/main/java</sourceDirectory>
                 <sourceDirectory>${basedir}/hadoop/src/main/java</sourceDirectory>
                 <sourceDirectory>${basedir}/integration/spark2/src/main/scala</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/spark2.2</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.1And2.2</sourceDirectory>
-                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.2And2.3</sourceDirectory>
+                <sourceDirectory>${basedir}/integration/spark2/src/main/commonTo2.2AndAbove</sourceDirectory>
 
 Review comment:
   fixed

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration

GitBox
CarbonDataQA1 commented on issue #3576: [CARBONDATA-3514] Support spark 2.4 integration
URL: https://github.com/apache/carbondata/pull/3576#issuecomment-576257262
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1706/
   
