[jira] [Created] (CARBONDATA-1569) Spark 2.1.0 + Carbondata 1.1.1 integrate failed when we init carbonSession but not provide dfs.nameservices in storePath


wyp created CARBONDATA-1569:
-------------------------------

             Summary: Spark 2.1.0 + Carbondata 1.1.1 integrate failed when we init carbonSession but not provide dfs.nameservices in storePath
                 Key: CARBONDATA-1569
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1569
             Project: CarbonData
          Issue Type: Bug
          Components: sql
    Affects Versions: 1.1.1
            Reporter: wyp
            Priority: Minor


When I initialize a CarbonSession without providing {{dfs.nameservices}} in the storePath, querying a table throws an exception. The following is the code snippet:
{code}
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs:///user/wyp/carbon")
carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@3fe33c59

scala> carbon.sql("select * from temp.test_table").show(10)
java.io.IOException: java.lang.Exception: Invalid tuple id _table/Fact/0/0/0-0_batchno0-0-1507707171354
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:332)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:262)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:81)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:311)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:636)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:595)
  ... 50 elided
Caused by: java.lang.Exception: Invalid tuple id _table/Fact/0/0/0-0_batchno0-0-1507707171354
  at org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager.getDeltaFiles(SegmentUpdateStatusManager.java:318)
  at org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager.getDeleteDeltaFilePath(SegmentUpdateStatusManager.java:281)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:330)
  ... 81 more
{code}

The same code works properly with Spark 2.1.0 + CarbonData 1.1.0.
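
As the summary implies, a workaround appears to be to include the {{dfs.nameservices}} authority in the storePath, so the path has no empty authority component when the tuple id is derived from it. A minimal sketch, where {{nameservice1}} is a placeholder for the cluster's actual {{dfs.nameservices}} value:

{code}
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

// "nameservice1" is a placeholder: substitute the dfs.nameservices value
// (or the namenode host:port) configured for your cluster.
scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://nameservice1/user/wyp/carbon")

scala> carbon.sql("select * from temp.test_table").show(10)
{code}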


