Apache CarbonData Dev Mailing List archive

Questions about loading data into CarbonData

刘feng

Questions about loading data into CarbonData

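Hello,

I have recently been evaluating CarbonData and ran into a few problems when
loading data:

1. When the data to load exceeds 10 GB, the job fails at the stage
"collect at GlobalDictionaryUtil.scala:746"
<http://namenode1:8088/proxy/application_1505443499883_0001/stages/stage?id=4&attempt=0>
and the load cannot proceed.

2. With less than 5 GB of data, an insert into a newly created table succeeds
in one or two minutes, but an incremental insert is very slow and takes about
thirty minutes.

Is there any way to optimize these cases? Thank you!

Configuration: the cluster has three data nodes, configured with 128 GB of
memory, an 8-core CPU, and 10 disks.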

---------------------------------------------------------------------------------------------------

刘峰

Mobile：13889865456

---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s)
is intended only for the use of the intended recipient and may be confidential and/or privileged of
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is
not the intended recipient, unauthorized use, forwarding, printing, storing, disclosure or copying
is strictly prohibited, and may be unlawful.If you have received this communication in error,please
immediately notify the sender by return e-mail, and delete the original message and all copies from
your system. Thank you.
---------------------------------------------------------------------------------------------------

cenyuhai

Re: Questions about loading data into CarbonData

Hi, fengliu:
Please use English. Describe your steps in as much detail as possible; the error message is also needed.

------------------ Original ------------------
From: "刘feng";<[hidden email]>;
Date: Fri, Sep 15, 2017 11:42 AM
To: "dev"<[hidden email]>;

Subject: Questions about loading data into CarbonData

Liang Chen

Re: Questions about loading data into CarbonData

Administrator

Hi

I have the same comment as cenyuhai: please provide more detailed
information. Which version did you use?

Please refer to
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md.
For high-cardinality columns, you can use a table property such as
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN') so that no dictionary is
created for them.

Regards
Liang
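
To illustrate the DICTIONARY_EXCLUDE suggestion above, here is a minimal sketch that assembles a CREATE TABLE statement carrying that table property. The helper name, the table name `cdr_records`, and every column except MSISDN are hypothetical and do not come from this thread. The `STORED BY 'carbondata'` clause assumes CarbonData 1.x DDL syntax, and the version actually in use was not stated. The sketch only builds the statement text; running it would require a CarbonData-enabled Spark session.

```python
def carbon_create_table_ddl(table, columns, dictionary_exclude=()):
    """Build a CREATE TABLE statement (assumed CarbonData 1.x syntax) as a string.

    columns: list of (name, type) pairs.
    dictionary_exclude: column names for which no dictionary should be built.
    """
    names = [name for name, _ in columns]
    missing = [c for c in dictionary_exclude if c not in names]
    if missing:
        raise ValueError(f"unknown columns in dictionary_exclude: {missing}")

    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) STORED BY 'carbondata'"
    if dictionary_exclude:
        excluded = ",".join(dictionary_exclude)
        ddl += f" TBLPROPERTIES ('DICTIONARY_EXCLUDE'='{excluded}')"
    return ddl


# Hypothetical table: only the MSISDN column name appears in the thread.
ddl = carbon_create_table_ddl(
    "cdr_records",
    [("MSISDN", "STRING"), ("call_time", "TIMESTAMP"), ("duration", "INT")],
    dictionary_exclude=["MSISDN"],
)
print(ddl)
```

The resulting string could then be passed to the `sql()` method of a CarbonData-enabled Spark session before loading data. Whether this resolves the failure reported at GlobalDictionaryUtil.scala:746 depends on details that were not provided in the thread.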
