Hi,
I am trying to run the example from the CarbonData quick start guide (https://carbondata.apache.org/quick-start-guide.html) through spark-shell in local mode.

Start command:

/opt/spark2.3.2/bin/spark-shell --jars apache-carbondata-1.5.1-bin-spark2.3.2-hadoop2.7.2.jar --master local

Code:

val store = "hdfs:///user/spark2/data/carbondata/store"
val meta = "hdfs:///user/spar2/data/carbondata"
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().appName("CarbonSessionExample").getOrCreateCarbonSession(store, meta)
val df = carbon.read.option("header", true).csv("hdfs:///user/spark2/carbon_test.csv")
df.write.format("carbondata").option("tableName", "carbon_test_t0").option("compress", "true").mode(SaveMode.Overwrite).save()

carbon_test.csv looks like this:

id,name,city,age
a,xx,cc,1
b,xxx,ccc,2
c,xxxxx,ccc,3
d,xxx,fd,4

REPL output from the save:

2018-12-07 15:58:40 AUDIT audit:72 - {"time":"December 7, 2018 3:58:40 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"START"}
2018-12-07 15:58:41 WARN HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbon_test_t0` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
2018-12-07 15:58:41 AUDIT audit:93 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"SUCCESS","opTime":"359 ms","table":"default.carbon_test_t0","extraInfo":{"bad_record_path":"","streaming":"false","local_dictionary_enable":"true","external":"false","sort_columns":"id,name,city,age","comment":""}}
2018-12-07 15:58:41 AUDIT audit:72 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"START"}
2018-12-07 15:58:41 WARN UnsafeIntermediateMerger:88 - the configure spill size is 0 less than the page size 67108864,so no merge and spill in-memory pages to disk
2018-12-07 15:58:42 WARN CarbonDataProcessorUtil:93 - dir already exists, skip dir creation: /home/spark2/app/tmp/carbon254102981358020_0/Fact/Part0/Segment_0/0
2018-12-07 15:58:43 AUDIT audit:93 - {"time":"December 7, 2018 3:58:43 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"SUCCESS","opTime":"1972 ms","table":"default.carbon_test_t0","extraInfo":{"SegmentId":"0","DataSize":"1.19KB","IndexSize":"674.0B"}}

Result of SHOW SEGMENTS FOR TABLE carbon_test_t0:

2018-12-07 16:01:18 AUDIT audit:72 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"START"}
2018-12-07 16:01:18 AUDIT audit:93 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"SUCCESS","opTime":"57 ms","table":"default.carbon_test_t0","extraInfo":{}}
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|SegmentSequenceId| Status|     Load Start Time|       Load End Time|Merged To|File Format|Data Size|Index Size|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|                0|Success|2018-12-07 15:58:...|2018-12-07 15:58:...|       NA|COLUMNAR_V3|   1.19KB|    674.0B|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+

But when I run the query "select * from carbon_test_t0", the result is empty:

+---+----+----+---+
| id|name|city|age|
+---+----+----+---+
+---+----+----+---+

I am sure the data was inserted successfully, because a new segment shows up every time I run an "insert into" or an overwrite save. Why does the select return no rows?

Thanks,
Odone

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
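P.S. For clarity, the query and the SHOW SEGMENTS above were both issued through the same carbon session in the same spark-shell. A minimal sketch of what I ran (assuming the `carbon` session created by the code above is still live; this needs the running spark-shell, it is not standalone):

```scala
// Sketch only — both statements go through the CarbonSession ("carbon")
// created earlier in this spark-shell session.
carbon.sql("SHOW SEGMENTS FOR TABLE carbon_test_t0").show()
carbon.sql("SELECT * FROM carbon_test_t0").show()
```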