http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/FileNotFoundExceptions-while-running-CarbonData-tp18340.html
Hello
I am new to CarbonData, and we are trying to use it in production. I built and installed it on the Spark edge nodes as per the given instructions -
Build - No major issues.
Infrastructure - Spark 2.1.0 on MapR cluster.
carbon.properties changes -
carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
carbon.badRecords.location=/opt/Carbon/Spark/badrecords
carbon.lock.type=HDFSLOCK
spark-defaults.conf changes -
spark.yarn.dist.files /opt/mapr/spark/spark-2.1.0/conf/carbon.properties
spark.yarn.dist.archives /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata.tar.gz
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
spark.driver.extraJavaOptions -Dcarbon.properties.filepath=/opt/mapr/spark/spark-2.1.0/conf/carbon.properties
Command line -
/opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master yarn --jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar \
--driver-memory 1g \
--executor-cores 2 \
--executor-memory 2G
Code snippet -
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
carbon.sql("""CREATE TABLE
IF NOT EXISTS test_table(
id string,
name string,
city string,
age Int)
STORED BY 'carbondata'""")
carbon.sql("""LOAD DATA INPATH '...'
INTO TABLE test_table""")
First error -
The initial error was "Dictionary file is locked for updation". Further debugging showed that it was caused by missing MapR-FS file system support (HDFSFileLock.java, line 52) -
String hdfsPath = conf.get(CarbonCommonConstants.FS_DEFAULT_FS);
I added some code to work around this for paths like maprfs:///* (for example, adding a MAPRFS FileType), and that seemed to work fine.
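To make the workaround concrete, here is a minimal sketch of the kind of scheme check I mean. The class, enum, and method names are hypothetical and are not CarbonData's actual classes; it only shows classifying a path by its prefix and qualifying a scheme-less store path with the default file system URI (passed in as a plain string here rather than read from the Hadoop configuration).

```java
import java.util.Locale;

/** Hypothetical illustration only; these names are not CarbonData classes. */
final class StorePathSchemeSketch {

    /** Storage back ends this sketch distinguishes. */
    enum FileType { LOCAL, HDFS, MAPRFS }

    /** Classifies a path by its scheme prefix; anything unrecognized is treated as LOCAL. */
    static FileType detect(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.startsWith("maprfs://")) {
            return FileType.MAPRFS;
        }
        if (p.startsWith("hdfs://")) {
            return FileType.HDFS;
        }
        return FileType.LOCAL;
    }

    /**
     * Prefixes a scheme-less path with the default file system URI
     * (the value one would expect in fs.defaultFS, e.g. "maprfs:///").
     * Paths that already carry any scheme are returned unchanged.
     */
    static String qualify(String defaultFs, String path) {
        if (path.contains("://")) {
            return path; // already qualified
        }
        // Drop one trailing slash from the base so the join yields a single separator.
        String base = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        String rel = path.startsWith("/") ? path : "/" + path;
        return base + rel;
    }
}
```

With a default file system of maprfs:///, qualify("maprfs:///", "/tmp/hacluster/Opt/CarbonStore") gives maprfs:///tmp/hacluster/Opt/CarbonStore, which detect() then classifies as MAPRFS.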
Second error -
The first error went away after the MapR-FS refactoring, but it then fails with the error below. It seems the *.dict and *.dictmeta files are not being created. Could you please help me resolve this error?
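For reference, this is roughly how the missing files can be checked. It is only a sketch: it assumes the table directory under the store location is reachable as an ordinary path (for example through an NFS mount of the cluster), it uses plain java.nio rather than the Hadoop FileSystem API, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; not part of CarbonData. */
final class DictionaryFileCheckSketch {

    /** Returns all regular files under root whose names end in .dict or .dictmeta, sorted by path. */
    static List<Path> findDictionaryFiles(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".dict") || name.endsWith(".dictmeta");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            // Surface I/O problems (missing directory, permissions) as unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty list for the table's directory after a load would match what I am seeing, that is, no dictionary files written at all.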
Thanks
Swapnil