[jira] [Created] (CARBONDATA-3549) How to build carbondata-1.6.0 with spark-2.1.1

Akash R Nilugal (Jira)

wupeng created CARBONDATA-3549:
----------------------------------

Summary: How to build carbondata-1.6.0 with spark-2.1.1
Key: CARBONDATA-3549
URL: https://issues.apache.org/jira/browse/CARBONDATA-3549
Project: CarbonData
Issue Type: Improvement
Reporter: wupeng

I'm building carbondata-1.6.0-rc3 with spark-2.1.1, and I got the following errors:
{code:java}
[ERROR] /carbondata-root-1.6.0/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CreateCarbonSourceTableAsSelectCommand.scala:153: error: scrutinee is incompatible with pattern type;
[INFO] found : org.apache.spark.sql.execution.datasources.HadoopFsRelation
[INFO] required: Unit
[INFO] case fs:HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
[INFO] ^
[WARNING] three warnings found
{code}
I eventually found the cause: in spark-2.1.1, org.apache.spark.sql.execution.datasources.DataSource#write returns Unit, whereas in spark-2.1.0 it returns a BaseRelation.

spark-2.1.0:

{code:java}
/** Writes the given [[DataFrame]] out to this [[DataSource]]. */
def write(
    mode: SaveMode,
    data: DataFrame): BaseRelation = {
  if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
    throw new AnalysisException("Cannot save interval data type into external storage.")
  }
{code}
spark-2.1.1:

{code:java}
/**
 * Writes the given [[DataFrame]] out to this [[DataSource]].
 */
def write(mode: SaveMode, data: DataFrame): Unit = {
  if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
    throw new AnalysisException("Cannot save interval data type into external storage.")
  }
{code}

So when CarbonData is built against spark-2.1.1, the pattern match in the code below fails to compile if dataSource.write is called, because result then has type Unit and cannot be matched against HadoopFsRelation.

{code:java}
val result = try {
  // dataSource.write(mode, df)
  dataSource.writeAndRead(mode, df)
} catch {
  case ex: AnalysisException =>
    logError(s"Failed to write to table $tableName in $mode mode", ex)
    throw ex
}
result match {
  case fs: HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
      sparkSession.sqlContext.conf.manageFilesourcePartitions =>
    // Need to recover partitions into the metastore so our saved data is visible.
    sparkSession.sessionState.executePlan(
      AlterTableRecoverPartitionsCommand(table.identifier)).toRdd
  case _ =>
}
{code}
I checked DataSource#write in spark-2.1.1 and found that writeAndRead now takes its place for callers that need the resulting relation.
So I modified org.apache.spark.sql.hive.CreateCarbonSourceTableAsSelectCommand at line 146, changing dataSource.write(mode, df) to dataSource.writeAndRead(mode, df).
After that, the problem was resolved.
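
For anyone who wants to see the type mismatch without a Spark checkout, here is a minimal sketch that should reproduce the same situation in plain Scala 2. The names FakeDataSource, MatchDemo and needsPartitionRecovery are hypothetical, and BaseRelation and HadoopFsRelation here are simplified stand-ins, not the real Spark classes. Only the two return types (Unit versus a relation) mirror what is described above.

```scala
// Stand-in types for illustration only; these are NOT the real Spark classes.
trait BaseRelation
final case class HadoopFsRelation(partitionColumns: Seq[String]) extends BaseRelation

// Hypothetical stand-in for DataSource, mirroring only the two return types.
object FakeDataSource {
  // Like DataSource#write in spark-2.1.1: returns nothing.
  def write(rows: Seq[String]): Unit = ()

  // Like DataSource#writeAndRead: returns a relation the caller can inspect.
  def writeAndRead(rows: Seq[String]): BaseRelation = HadoopFsRelation(Seq("dt"))
}

object MatchDemo {
  // Returns true when the written relation is partitioned, i.e. the case in
  // which the CarbonData command would go on to recover partitions.
  def needsPartitionRecovery(rows: Seq[String]): Boolean = {
    // val result = FakeDataSource.write(rows)
    // With the line above instead of the one below, result has type Unit and
    // the Scala 2 compiler is expected to reject the typed pattern with
    // "scrutinee is incompatible with pattern type".
    val result = FakeDataSource.writeAndRead(rows)
    result match {
      case fs: HadoopFsRelation if fs.partitionColumns.nonEmpty => true
      case _ => false
    }
  }

  def main(args: Array[String]): Unit = {
    println(needsPartitionRecovery(Seq("a", "b"))) // prints "true"
  }
}
```

The sketch only demonstrates why the scrutinee's static type matters; it does not write any data or touch the metastore.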

--
This message was sent by Atlassian Jira
(v8.3.4#803005)