Apache CarbonData Dev Mailing List archive - Re: [Carbondata-0.2.0-incubating][Issue Report] -- Select statement returns error when a String column is added to the where clause

Posted by lionel061201 on Dec 14, 2016; 2:18am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Carbondata-0-2-0-incubating-Issue-Report-Select-statement-return-error-when-add-String-column-in-whee-tp4379p4380.html

Hi,
I have just uploaded the data file to Baidu:
Link: https://pan.baidu.com/s/1slERWL3
Password: m7kj

Thanks,
Lionel

On Wed, Dec 14, 2016 at 10:12 AM, Lu Cao <[hidden email]> wrote:

> Hi Dev team,
> As discussed this afternoon, I've switched back to version 0.2.0 for
> testing. Please ignore my earlier email about "error when save DF to
> carbondata file"; that issue is on the master branch.
>
> Spark version: 1.6.0
> System: Mac OS X El Capitan (10.11.6)
>
> [lucao]$ spark-shell --master local[*] --total-executor-cores 2 \
> --executor-memory 1g --num-executors 2 \
> --jars ~/MyDev/hive-1.1.1/lib/mysql-connector-java-5.1.40-bin.jar
>
> In 0.2.0, I can successfully create a table and load data into it:
>
> scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")
>
> scala> cc.sql("load data inpath 'test2.csv' into table default.mycarbon_00001")
>
> The following query runs successfully:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show
>
> INFO 13-12 17:13:42,215 - Job 5 finished: show at <console>:42, took 0.732793 s
>
> +-----------------+---+
> |              vin|_c1|
> +-----------------+---+
> |LSJW26760ES065247|464|
> |LSJW26760GS018559|135|
> |LSJW26761ES064611|104|
> |LSJW26761FS090787| 45|
> |LSJW26762ES051513| 40|
> |LSJW26762FS075036|434|
> |LSJW26763ES052363| 32|
> |LSJW26763FS088491|305|
> |LSJW26764ES064859|186|
> |LSJW26764FS078696| 40|
> |LSJW26765ES058651|171|
> |LSJW26765FS072633|191|
> |LSJW26765GS056837|467|
> |LSJW26766FS070308| 79|
> |LSJW26766GS050853|300|
> |LSJW26767FS069913|  8|
> |LSJW26767GS053454|286|
> |LSJW26768FS062811| 16|
> |LSJW26768GS051146| 97|
> |LSJW26769FS062722|424|
> +-----------------+---+
> only showing top 20 rows
>
> The error occurs when I add the "vin" column to the where clause:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26760ES065247' group by vin").show
>
> +-----------------+---+
> |              vin|_c1|
> +-----------------+---+
> |LSJW26760ES065247|464|
> +-----------------+---+
>
> >>> This one is OK. In my testing, queries on the *first two values* in the
> top 20 rows usually succeed, but most of the others return an error.
>
> For example:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show
>
> >>> The log is attached:
>
> <carbontest_lucao_20161213.log>
>
>
> It is the same error I encountered on Dec. 6. As I mentioned in the WeChat
> group before:
>
> With a 1,000-row data set, the above error did not occur.
>
> With 1M rows, some queries returned the error and some did not.
>
> With 1.9 billion rows, every test returned the error.
>
>
> *### Attached is the sample data set (1M rows) for your reference.*
>
> <<I sent this email yesterday afternoon, but it was rejected by the
> Apache mail server because it was larger than 1,000,000 bytes, so I removed
> the sample data file from the attachments. If you need it, please reply
> with your personal email address.>>
>
> Looking forward to your response.
>
>
> Thanks & Best Regards,
>
> Lionel
>