carbondata loading


lionel061201
Hi dev team,
I'm loading data from a Parquet file into a CarbonData file (a DataFrame
reads the Parquet file, saves it as CSV, and the CSV is then loaded into
CarbonData). The job is stuck at "collect at CarbonDataRDDFactory.scala:963":



Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
-------|-------------|-----------|----------|-------------------------|-----------------------------------------
6 | collect at CarbonDataRDDFactory.scala:963 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=6> | 2016/12/01 13:56:43 | 3.1 h | 0/1 | 0/2

Completed Jobs (6)

Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
-------|-------------|-----------|----------|-------------------------|-----------------------------------------
5 | collect at GlobalDictionaryUtil.scala:800 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=5> | 2016/12/01 13:34:25 | 22 min | 2/2 | 422/422
4 | take at CarbonCsvRelation.scala:181 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=4> | 2016/12/01 13:34:25 | 0.1 s | 1/1 | 1/1
3 | saveAsTextFile at package.scala:169 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=3> | 2016/12/01 13:11:02 | 23 min | 1/1 | 50/50
2 | count at SaicSparkConvert.scala:40 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=2> | 2016/12/01 13:10:31 | 31 s | 2/2 | 51/51
1 | parquet at SaicSparkConvert.scala:35 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=1> | 2016/12/01 13:10:28 | 1 s | 1/1 | 2/2
0 | parquet at SaicSparkConvert.scala:35 <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=0> | 2016/12/01 13:10:26 | 2 s | 1/1 | 2/2


I looked at the stdout; the log contains nothing but the same warning,
repeated:


WARN  01-12 13:56:46,096 - [pool-25-thread-5][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-4][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-1][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-2][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-6][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-2][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-1][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.
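
To check whether these warnings simply reflect null or non-numeric values
in the intermediate CSV, a small check like the one below could be run on a
sample of it. This is only a sketch: the column names, the sample rows, and
the set of strings treated as null markers are assumptions made up for
illustration, not taken from the actual table or from CarbonData itself.

```python
import csv
import io

# Assumption: strings that would count as "null" in the CSV export.
NULL_MARKERS = {"", "null", "NULL", "\\N"}


def count_unparseable(csv_text, numeric_columns, delimiter=","):
    """Count, per column, the values that are null markers or not numeric.

    csv_text is expected to start with a header row naming the columns.
    """
    counts = {name: 0 for name in numeric_columns}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        for name in numeric_columns:
            value = (row.get(name) or "").strip()
            if value in NULL_MARKERS:
                counts[name] += 1
                continue
            try:
                float(value)  # accepts integers and decimals alike
            except ValueError:
                counts[name] += 1
    return counts


# Hypothetical sample: "price" and "qty" stand in for the numeric columns.
sample = "id,price,qty\n1,9.5,3\n2,null,4\n3,abc,\n"
print(count_unparseable(sample, ["price", "qty"]))
```

If the counts are roughly what the source data is expected to contain, the
warnings are probably just nulls being reported and not necessarily the
reason the job hangs.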


My configuration is:

--master yarn-cluster
--driver-memory 8g
--executor-memory 120g
--num-executors 3


Any idea what is causing this? Could it be a data type problem?


Thanks,

Lionel