Hi,
I cloned from git (branch master), compiled with mvn for hadoop 2.6.3 / spark 1.6.1 following the quick start, then ran spark-shell:

    $SPARK_HOME/bin/spark-shell --verbose --master local[4] \
      --jars /usr/local/spark/lib/carbondata_2.10-0.3.0-incubating-SNAPSHOT-shade-hadoop2.6.3.jar,/usr/local/spark/lib/mysql-connector-java-5.1.38-bin.jar

More detail here: http://pastebin.com/Myp6aubs

Then, in :paste mode:

    import java.io._
    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.spark.sql.CarbonContext

    val storePath = "hdfs://test.namenode02.bi.com:8020/usr/carbondata/store"
    val cc = new CarbonContext(sc, storePath)
    cc.setConf(HiveConf.ConfVars.HIVECHECKFILEFORMAT.varname, "false")
    cc.setConf("carbon.kettle.home", "/usr/local/spark/carbondata/carbonplugins")

    cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
    cc.sql(s"load data inpath 'hdfs://test.namenode02.bi.com:8020/tmp/sample.csv' into table test_table")
    cc.sql("select * from test_table").show

Problem 1: I could create the table, but the load failed. After the data load the log showed "Table MetaData Unlocked Successfully", followed by:

    java.lang.RuntimeException: Table is locked for updation. Please try after some time

This looks like http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/load-data-fail-td100.html#a164. I ran chmod 777 on the store path, and the load then succeeded.

Problem 2: but when I run cc.sql("select * from test_table").show, the query fails:

    INFO 30-11 18:24:01,072 - Parse Completed
    INFO 30-11 18:24:01,196 - main Starting to optimize plan
    INFO 30-11 18:24:01,347 - main ************************Total Number Rows In BTREE: 1
    INFO 30-11 18:24:01,361 - main ************************Total Number Rows In BTREE: 1
    INFO 30-11 18:24:01,369 - main ************************Total Number Rows In BTREE: 1
    INFO 30-11 18:24:01,376 - main ************************Total Number Rows In BTREE: 1
    INFO 30-11 18:24:01,385 - main ************************Total Number Rows In BTREE: 1
    INFO 30-11 18:24:01,386 - main Total Time taken to ensure the required executors: 0
    INFO 30-11 18:24:01,386 - main Time elapsed to allocate the required executors: 0
    INFO 30-11 18:24:01,391 - Identified no.of.blocks: 5, no.of.tasks: 4, no.of.nodes: 1, parallelism: 4
    INFO 30-11 18:24:01,396 - Starting job: show at <console>:37
    INFO 30-11 18:24:01,396 - Got job 3 (show at <console>:37) with 1 output partitions
    INFO 30-11 18:24:01,396 - Final stage: ResultStage 4 (show at <console>:37)
    INFO 30-11 18:24:01,396 - Parents of final stage: List()
    INFO 30-11 18:24:01,397 - Missing parents: List()
    INFO 30-11 18:24:01,397 - Submitting ResultStage 4 (MapPartitionsRDD[20] at show at <console>:37), which has no missing parents
    INFO 30-11 18:24:01,401 - Block broadcast_6 stored as values in memory (estimated size 13.3 KB, free 285.6 KB)
    INFO 30-11 18:24:01,403 - Block broadcast_6_piece0 stored as bytes in memory (estimated size 6.7 KB, free 292.2 KB)
    INFO 30-11 18:24:01,403 - Added broadcast_6_piece0 in memory on localhost:15792 (size: 6.7 KB, free: 511.1 MB)
    INFO 30-11 18:24:01,404 - Created broadcast 6 from broadcast at DAGScheduler.scala:1006
    INFO 30-11 18:24:01,404 - Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[20] at show at <console>:37)
    INFO 30-11 18:24:01,404 - Adding task set 4.0 with 1 tasks
    INFO 30-11 18:24:01,405 - Starting task 0.0 in stage 4.0 (TID 6, localhost, partition 0,PROCESS_LOCAL, 2709 bytes)
    INFO 30-11 18:24:01,406 - Running task 0.0 in stage 4.0 (TID 6)
    INFO 30-11 18:24:01,436 - [Executor task launch worker-1][partitionID:table;queryID:10219962900397098] Query will be executed on table: test_table
    ERROR 30-11 18:24:01,444 - Exception in task 0.0 in stage 4.0 (TID 6)
    java.lang.InterruptedException
        at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:83)
        at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:171)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
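For reference, the permission workaround for problem 1 can also be done programmatically rather than with a blanket chmod 777. This is only a sketch against the store path above, using the standard Hadoop FileSystem API; note that setPermission is not recursive, so segment directories created during the load may need the same change:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.fs.permission.FsPermission

    val fs = FileSystem.get(new URI("hdfs://test.namenode02.bi.com:8020"), new Configuration())
    val store = new Path("/usr/carbondata/store")

    // Show who owns the store and what the current mode is, to confirm
    // the "Table is locked for updation" failure is a permission problem.
    val status = fs.getFileStatus(store)
    println(s"owner=${status.getOwner} group=${status.getGroup} perms=${status.getPermission}")

    // Loosen the mode (775 here rather than 777); applies to this path only.
    fs.setPermission(store, new FsPermission(Integer.parseInt("775", 8).toShort))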
I have integrated CarbonData into the StreamingPro project (http://www.jianshu.com/p/7733da82a9ce). StreamingPro ships with carbondata-0.3.0 without the kettle dependency and supports Spark Streaming. Maybe you can give it a try; I hope this saves you some time.