Hi guys,
I am using CarbonData 1.3 and Spark 2.2.1 in standalone mode, and I start the CarbonThriftServer like this:

  bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar hdfs://nameservice1/hive/carbon/store

I get this log:

  Downloading hdfs://nameservice1/hive/carbon/store to /tmp/tmp6465512979544197326/hive/carbon/store.

This downloads the whole carbon store to the tmp directory. If my carbon store is very large, this adds a lot of boot time and fills up my temporary directory; each start also creates a new temporary directory. Was it designed this way, or is my configuration wrong?
Hi dylan
I have verified your scenario in my setup, and it works fine without downloading the store to the local /tmp location. The command below was used to start the Thriftserver, and the carbon store is NOT copied to the /tmp location:

  bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar hdfs://master:9000/carbonstore

Can you please provide the details below to analyze the issue further?

1. spark-default.conf under <SPARK_HOME>/conf
2. driver logs (the console log when starting the Thriftserver)

Thanks
Babu
Hello babulal,
Thanks for your reply.

1. My spark-default.conf is:

  spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties
  spark.driver.extraJavaOptions -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties

2. Console log:

  18/03/13 19:12:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  18/03/13 19:12:51 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
  Downloading hdfs://nameservice1/hive/carbon/store to /tmp/tmp3188425816613265318/hive/carbon/store.

The download then runs for a long time, until all the data has been copied to the tmp directory.
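A note on the /tmp/tmp3188425816613265318 path in the log above: Spark creates it with java.nio.file.Files.createTempDirectory (see the downloadFile snippet in the next reply), which returns a fresh, uniquely named directory on every call, so each start lands in a new temporary directory. A minimal standalone Scala sketch of that JDK behavior:

  import java.nio.file.Files

  object TempDirSketch {
    def main(args: Array[String]): Unit = {
      // Each call creates a new, uniquely named directory under
      // java.io.tmpdir (usually /tmp), e.g. /tmp/tmp3188425816613265318.
      println(Files.createTempDirectory("tmp"))
      // A second call yields a different directory, never the same one.
      println(Files.createTempDirectory("tmp"))
    }
  }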
Hi dylan
As per your console log, this error appears when the spark-submit command is wrong in how it provides resources (jars/files). I tried the command below and got the same error as you (I passed the jar with the --jars option and gave the store location at the end, after a space):

  root@master /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7 # bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer --jars $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar hdfs://master:9000/carbonstore
  log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
  log4j:WARN Please initialize the log4j system properly.
  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
  Downloading hdfs://master:9000/carbonstore to /tmp/tmp1358150251291982356/carbonstore.
  Exception in thread "main" java.io.IOException: Filesystem closed
          at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
          at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)

Spark has the code below to do this resource localization, in core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (downloadFile):

  private[deploy] def downloadFile(path: String, hadoopConf: HadoopConfiguration): String = {
    require(path != null, "path cannot be null.")
    val uri = Utils.resolveURI(path)
    uri.getScheme match {
      case "file" | "local" => path
      case _ =>
        val fs = FileSystem.get(uri, hadoopConf)
        val tmpFile = new File(Files.createTempDirectory("tmp").toFile, uri.getPath)
        // scalastyle:off println
        printStream.println(s"Downloading ${uri.toString} to ${tmpFile.getAbsolutePath}.")
        // scalastyle:on println
        fs.copyToLocalFile(new Path(uri), new Path(tmpFile.getAbsolutePath))
        Utils.resolveURI(tmpFile.getAbsolutePath).toString
    }
  }

And this method is called only in the case below (client deploy mode):

  if (deployMode == CLIENT) {
    val hadoopConf = conf.getOrElse(new HadoopConfiguration())
    localPrimaryResource = Option(args.primaryResource).map(downloadFile(_, hadoopConf)).orNull
    localJars = Option(args.jars).map(downloadFileList(_, hadoopConf)).orNull
    localPyFiles = Option(args.pyFiles).map(downloadFileList(_, hadoopConf)).orNull
    localFiles = Option(args.files).map(downloadFileList(_, hadoopConf)).orNull
  }

So please check your command used to start the CarbonThriftServer, or send me the exact command.

Thanks
Babu
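In the wrong form above, --jars consumes the carbon jar, so spark-submit treats the HDFS store path as the primary resource and, in client mode, localizes it through downloadFile. A minimal side-by-side sketch of the two invocations, reusing the jar name and store path from this thread (adjust the paths for your own setup):

  # Wrong: the jar goes to --jars, so hdfs://.../carbonstore becomes the
  # primary resource and is downloaded to a fresh /tmp directory at startup.
  bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
    --jars $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
    hdfs://master:9000/carbonstore

  # Right: the jar itself is the primary resource and the store path is just an
  # application argument to CarbonThriftServer, so nothing is copied to /tmp.
  bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
    $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
    hdfs://master:9000/carbonstore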
Hello babulal,
I found the problem: I used the wrong command to start spark-submit, with --jars. Thank you very much for your answer; it solved my problem. Thanks!