[lucao]$ spark-shell --master local[*] --total-executor-cores 2 --executor-memory 1g --num-executors 2 --jars ~/MyDev/hive-1.1.1/lib/mysql-connector-java-5.1.40-bin.jar
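(For reference, `cc` below is a CarbonContext created in the shell. A minimal sketch of the setup I used, where the store path is only a placeholder for my actual location:)

scala> import org.apache.spark.sql.CarbonContext
scala> val cc = new CarbonContext(sc, "/home/lucao/carbon.store")  // placeholder store path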
In 0.2.0, I can successfully create a table and load data into a CarbonData table:
scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")
scala> cc.sql("load data inpath'test2.csv' into table default.mycarbon_00001")
I can successfully run the query below:
scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show
INFO 13-12 17:13:42,215 - Job 5 finished: show at <console>:42, took 0.732793 s
+-----------------+---+
| vin|_c1|
+-----------------+---+
|LSJW26760ES065247|464|
|LSJW26760GS018559|135|
|LSJW26761ES064611|104|
|LSJW26761FS090787| 45|
|LSJW26762ES051513| 40|
|LSJW26762FS075036|434|
|LSJW26763ES052363| 32|
|LSJW26763FS088491|305|
|LSJW26764ES064859|186|
|LSJW26764FS078696| 40|
|LSJW26765ES058651|171|
|LSJW26765FS072633|191|
|LSJW26765GS056837|467|
|LSJW26766FS070308| 79|
|LSJW26766GS050853|300|
|LSJW26767FS069913| 8|
|LSJW26767GS053454|286|
|LSJW26768FS062811| 16|
|LSJW26768GS051146| 97|
|LSJW26769FS062722|424|
+-----------------+---+
only showing top 20 rows
The error occurs when I add the "vin" column to the WHERE clause:
scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26760ES065247' group by vin")
+-----------------+---+
| vin|_c1|
+-----------------+---+
|LSJW26760ES065247|464|
+-----------------+---+
>>> This one is OK... Actually, as I tested, queries on the first two values in the top 20 rows usually succeed, but most of the others return an error. For example:
scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show
>>> The log is attached:
<carbontest_lucao_20161213.log>
It is the same error I encountered on Dec. 6th. As I said in the WeChat group before:
When the data set is 1,000 rows, the error above did not occur.
When the data set is 1M rows, some queries returned the error and some did not.
When the data set is 1.9 billion rows, all tests returned the error.
A small probe loop to reproduce this per-value behavior is sketched below.
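To reproduce the per-value behavior, the sketch below probes each distinct vin with the same filtered query and reports which values fail (it assumes the `cc` context from above and has not been tested at the 1.9-billion-row scale):

scala> :paste
// Probe every distinct vin with the failing filter query and
// collect the values that throw, printing each error message.
val vins = cc.sql("select distinct vin from default.mycarbon_00001")
  .collect().map(_.getString(0))
val failed = vins.filter { v =>
  try {
    cc.sql(s"select vin, count(*) from default.mycarbon_00001 " +
           s"where vin = '$v' group by vin").collect()
    false
  } catch {
    case e: Exception =>
      println(s"vin=$v failed: ${e.getMessage}")
      true
  }
}
println(s"${failed.length} of ${vins.length} vin values failed")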
### Attached is the sample data set (1M rows) for your reference.
<<........ I sent this email yesterday afternoon, but it was rejected by the Apache mail server for exceeding 1,000,000 bytes, so I have removed the sample data file from the attachments. If you need it, please reply with your personal email address. ........>>
Looking forward to your response.
Thanks & Best Regards,
Lionel