Why metadata path didn't show up on my local disk

Kuai Yu
Hi CarbonData experts,

I'm new to both Spark and CarbonData.

I'm trying to use CarbonData to store some key-value pairs on HDFS. To
start with, I ran a few commands in the Spark shell to help me better
understand its behavior.

Here is how I launched spark shell:
=========================
spark-shell --spark-version 2.3.0 spark.hive.support=true --driver-memory
2G --num-executors 50 --executor-cores 2 --executor-memory 2G --jars
apache-carbondata-1.5.2-bin-spark2.3.2-hadoop2.7.2.jar

Here are the commands I issued:
===================================
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
// Needed for StructType, StructField and the column types used below
import org.apache.spark.sql.types._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
  "hdfs://<hostname>:9000/user/kuyu/carbondata3", "/export/home/kuyu/wenye")

val schema = StructType(Array(
      StructField("keyCol", StringType, false),
      StructField("deltaCol", LongType, false),
      StructField("__opalSegmentId", IntegerType, false),
      StructField("__opalSegmentOffset", IntegerType, false)))

val keyStoreDF = carbon.read.format("csv").option("header",
"true").schema(schema).load("hdfs://<hostname>:9000/user/kuyu/keystore.csv")

val carbonDFWriter = new CarbonDataFrameWriter(carbon.sqlContext,
keyStoreDF)
val options = Map("tableName" -> "wenye_xyz")
carbonDFWriter.saveAsCarbonFile(options)

What I found:
====================
The 'Fact', 'LockFiles', and 'Metadata' directories were created under
hdfs://<hostname>:9000/user/kuyu/carbondata3/wenye_xyz. However, I couldn't
find /export/home/kuyu/wenye created anywhere. As I understand it, Carbon
uses Derby DB by default, which should create /export/home/kuyu/wenye on the
local disk. Is my understanding correct?

Thanks,
KY

Re: Why metadata path didn't show up on my local disk

David CaiQiang
Maybe it used the javax.jdo.option.ConnectionURL configuration instead.
Only when Hive, Hadoop, and Spark do not set this configuration will it use
the metastore path parameter of getOrCreateCarbonSession.
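
One way to check this might be to print the setting from the same spark-shell
session. This is only a rough sketch: it assumes `sc` and `carbon` are the
values from the original post, and it looks in just three places (the
SparkConf, the Hadoop configuration, and the session's runtime configuration),
so a value defined elsewhere may not show up here.

```scala
// Sketch only: run inside the same spark-shell session as the original post.
// `sc` is the shell's SparkContext and `carbon` is the session created above.
val key = "javax.jdo.option.ConnectionURL"

// Value set directly on the SparkConf, if any
println(s"SparkConf:    ${sc.getConf.getOption(key)}")

// Value visible in the Hadoop configuration, if any (get returns null when unset)
println(s"Hadoop conf:  ${Option(sc.hadoopConfiguration.get(key))}")

// Value seen by the session's runtime configuration, if any
println(s"Session conf: ${carbon.conf.getOption(key)}")

// Whether the path passed to getOrCreateCarbonSession exists on the local
// disk of the machine where spark-shell is running
println(new java.io.File("/export/home/kuyu/wenye").exists)
```

If any of the first three lines prints a value other than None, that would be
consistent with the path given to getOrCreateCarbonSession not being used.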



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/