Hello,

I have recently been studying carbondata and ran into a few problems when loading data:

1. When the data to load exceeds 10 GB, the job fails at collect at GlobalDictionaryUtil.scala:746 <http://namenode1:8088/proxy/application_1505443499883_0001/stages/stage?id=4&attempt=0>, and the load cannot proceed.
2. For data under 5 GB, an insert into a newly created table succeeds in one or two minutes, but an incremental insert into an existing table is very slow and takes about thirty minutes.

Is there any way to optimize these cases? Thank you!

Configuration: the cluster has three data nodes, each with 128 GB of memory, an 8-core CPU, and 10 disks.

刘峰
Hi, fengliu:
Please write in English. Describe your steps in as much detail as possible, and include the error message.

------------------ Original ------------------
From: "刘feng" <[hidden email]>
Date: Fri, Sep 15, 2017 11:42 AM
To: "dev" <[hidden email]>
Subject: carbondata 加载数据问题咨询 (Question about loading data into carbondata)
Hi
I have the same comments as cenyuhai: please provide more detail. Which version are you using?

Please refer to https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md. For high-cardinality columns, you can use a table property such as TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN') so that no dictionary is created for that column.

Regards
Liang
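To illustrate the tip above, here is a minimal sketch of a CREATE TABLE statement that excludes a high-cardinality column from dictionary encoding. Only the MSISDN column name and the DICTIONARY_EXCLUDE property come from this thread; the table name and the other columns are hypothetical, and the STORED BY 'carbondata' clause assumes a CarbonData 1.x release, so check it against the version you run.

```sql
-- Hypothetical table: call_records, event_time and bytes_used are made up
-- for illustration. MSISDN is assumed to be a high-cardinality column.
CREATE TABLE IF NOT EXISTS call_records (
  msisdn     STRING,
  event_time TIMESTAMP,
  bytes_used BIGINT
)
STORED BY 'carbondata'  -- assumption: CarbonData 1.x DDL syntax
-- Skip dictionary generation for MSISDN, as suggested in the reply.
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISDN');
```

The property is set when the table is created, so an existing table would likely need to be recreated and reloaded for it to take effect. Whether this resolves the failure at GlobalDictionaryUtil.scala:746 depends on the data and cannot be confirmed from the information given in the thread.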