CarbonData data loading questions


CarbonData data loading questions

刘feng
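Hello,

   I have recently been evaluating CarbonData and ran into a few problems when loading data:

1. When loading more than 10 GB of data, the job fails at the stage "collect at GlobalDictionaryUtil.scala:746"
<http://namenode1:8088/proxy/application_1505443499883_0001/stages/stage?id=4&attempt=0>,
so the load cannot proceed.

2. With less than 5 GB of data, an insert into a newly created table succeeds in one or two minutes,
but an incremental insert is very slow and takes about thirty minutes.

Is there any way to optimize the cases above? Thank you!

Configuration: a cluster of three data nodes, with 128 GB of memory, 8 CPU cores, and 10 disks.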

 


刘峰


Re: CarbonData data loading questions

cenyuhai
Hi, fengliu:
  Please use English. Describe your steps in as much detail as possible; the error message is also needed.




------------------ Original ------------------
From:  "刘feng";<[hidden email]>;
Date:  Fri, Sep 15, 2017 11:42 AM
To:  "dev"<[hidden email]>;

Subject:  CarbonData data loading questions




Re: CarbonData data loading questions

Liang Chen
Administrator
Hi,

I have the same comments as cenyuhai: please provide more detailed information. Which
version are you using?

Please refer to
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md.
For high-cardinality columns, you can use a table property such as TBLPROPERTIES
('DICTIONARY_EXCLUDE'='MSISDN') so that no dictionary is created for those columns.
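
To illustrate where the DICTIONARY_EXCLUDE property goes, here is a minimal sketch of a table definition. The table name and every column other than MSISDN are made up for illustration, and the exact syntax may differ between CarbonData versions, so please check it against the documentation for the version you run.

```sql
-- Hypothetical table; only the MSISDN column name comes from this thread.
CREATE TABLE IF NOT EXISTS subscriber_events (
  msisdn     STRING,     -- high-cardinality column, excluded from dictionary encoding
  event_time TIMESTAMP,
  bytes_used BIGINT
)
STORED BY 'carbondata'
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN');
```

Excluding such a column should reduce the work done in the global dictionary generation step during load, which is where the reported failure occurs, though whether it resolves that error depends on details not given in the thread.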

Regards
Liang


