Apache CarbonData Dev Mailing List archive

[Help] Carbondata use spark sql "select" query , but return empty dataset

odone

Hi,

I am trying to run the example from the CarbonData quick start guide:
https://carbondata.apache.org/quick-start-guide.html

I run it through spark-shell in local mode.
Start command:
/opt/spark2.3.2/bin/spark-shell --jars apache-carbondata-1.5.1-bin-spark2.3.2-hadoop2.7.2.jar --master local

Code:
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.CarbonSession._

val store = "hdfs:///user/spark2/data/carbondata/store"
val meta = "hdfs:///user/spar2/data/carbondata"

// Create the CarbonSession using the two paths above
val carbon = SparkSession.builder().appName("CarbonSessionExample").getOrCreateCarbonSession(store, meta)

// Read the CSV (first line is the header) and save it as a CarbonData table
val df = carbon.read.option("header", true).csv("hdfs:///user/spark2/carbon_test.csv")
df.write.format("carbondata").option("tableName", "carbon_test_t0").option("compress", "true").mode(SaveMode.Overwrite).save()

The data in carbon_test.csv looks like this:

id,name,city,age
a,xx,cc,1
b,xxx,ccc,2
c,xxxxx,ccc,3
d,xxx,fd,4

The REPL prints the following when saving:
2018-12-07 15:58:40 AUDIT audit:72 - {"time":"December 7, 2018 3:58:40 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"START"}
2018-12-07 15:58:41 WARN HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbon_test_t0` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
2018-12-07 15:58:41 AUDIT audit:93 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"SUCCESS","opTime":"359 ms","table":"default.carbon_test_t0","extraInfo":{"bad_record_path":"","streaming":"false","local_dictionary_enable":"true","external":"false","sort_columns":"id,name,city,age","comment":""}}
2018-12-07 15:58:41 AUDIT audit:72 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"START"}
2018-12-07 15:58:41 WARN UnsafeIntermediateMerger:88 - the configure spill size is 0 less than the page size 67108864,so no merge and spill in-memory pages to disk
2018-12-07 15:58:42 WARN CarbonDataProcessorUtil:93 - dir already exists, skip dir creation: /home/spark2/app/tmp/carbon254102981358020_0/Fact/Part0/Segment_0/0
2018-12-07 15:58:43 AUDIT audit:93 - {"time":"December 7, 2018 3:58:43 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"SUCCESS","opTime":"1972 ms","table":"default.carbon_test_t0","extraInfo":{"SegmentId":"0","DataSize":"1.19KB","IndexSize":"674.0B"}}

Result of SHOW SEGMENTS FOR TABLE carbon_test_t0:
2018-12-07 16:01:18 AUDIT audit:72 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"START"}
2018-12-07 16:01:18 AUDIT audit:93 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"SUCCESS","opTime":"57 ms","table":"default.carbon_test_t0","extraInfo":{}}
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|SegmentSequenceId| Status|     Load Start Time|       Load End Time|Merged To|File Format|Data Size|Index Size|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|                0|Success|2018-12-07 15:58:...|2018-12-07 15:58:...|       NA|COLUMNAR_V3|   1.19KB|    674.0B|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+

But when I run the query "select * from carbon_test_t0", the result is empty:
+---+----+----+---+
| id|name|city|age|
+---+----+----+---+
+---+----+----+---+
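
For reference, the two statements above can be issued from the same spark-shell session roughly as follows. This is only a sketch: inspectTable is a hypothetical helper name, not a CarbonData API, and it assumes the carbon session created earlier is still in scope. The row count is an extra check that is not part of the output shown in this post.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not part of CarbonData): runs the two statements
// from this post against the given session and returns the row count.
def inspectTable(session: SparkSession, table: String): Long = {
  // List the segments recorded for the table
  session.sql(s"SHOW SEGMENTS FOR TABLE $table").show(false)

  // Run the query that comes back empty, then count its rows
  val result = session.sql(s"select * from $table")
  result.show()
  result.count()
}

// Usage in the spark-shell session, where `carbon` is the session created above:
// val rows = inspectTable(carbon, "carbon_test_t0")
```

A count of 0 here, next to a segment with status Success, would match the behaviour described in this post.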

I am sure the data was inserted successfully, because a new segment appears
whenever I run an "insert into" or "save overwrite" command.

Thanks,
Odone

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/