loading data from parquet table always


loading data from parquet table always

喜之郎
hi dev,
  I have a parquet table and a carbon table. The table has 1 billion rows.
parquet table :
============
CREATE TABLE mc_idx3(
COL_1 integer,
COL_2  integer,
COL_3  string,
COL_4  integer,
COL_5  string,
COL_6  string,
COL_7   string,
COL_8   string,
COL_9   integer,
COL_10 long,
COL_11 string,
COL_12 string,
COL_13 string,
COL_14 string,
COL_15 integer,
COL_16 string,
COL_17 Timestamp )
STORED AS PARQUET;

==============
carbon table:
===============
CREATE TABLE mc_idxok_cd1(
COL_1 integer,
COL_2  integer,
COL_3  string,
COL_4  integer,
COL_5  string,
COL_6  string,
COL_7   string,
COL_8   string,
COL_9   integer,
COL_10 long,
COL_11 string,
COL_12 string,
COL_13 string,
COL_14 string,
COL_15 integer,
COL_16 string,
COL_17 Timestamp )
STORED BY 'carbondata'
TBLPROPERTIES (
'SORT_COLUMNS'='COL_17,COL_1');

=============
When I run "insert into table mc_idxok_cd1 select * from mc_idx3",
it always fails.
ERROR LOG:
org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: There is an unexpected error: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
        at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:123)
        at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:390)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.internalCompute(NewCarbonDataLoadRDD.scala:353)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
        at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.processingComplete(DataWriterProcessorStepImpl.java:162)
        at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.finish(DataWriterProcessorStepImpl.java:148)
        at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:112)





-------------------
Can anybody give me some advice? Any advice is appreciated!

Re: loading data from parquet table always

喜之郎
component versions:

carbondata: 1.3.1
spark: 2.2.1





Re: loading data from parquet table always

akashrn5
In reply to this post by 喜之郎
Hi,

The exception says there is a problem while copying files from the local
store to the carbon store (HDFS). That means the write to the local temp
folder has already finished; the files are copied to HDFS afterwards, and
the load fails during that copy.

With this exception trace alone it is difficult to know the root cause of
the failure; the failure can also be caused by HDFS itself. So you can
check two things (see the commands sketched below):

1. Check whether enough space is available in HDFS.
2. When this exception occurs, check what exception appears in the HDFS logs.

Maybe with that you can get some idea.
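
For the first check, standard HDFS shell commands such as the following
should show the remaining capacity (just a sketch; the warehouse path is a
placeholder, replace it with your actual carbon store location):

# overall HDFS capacity and usage per datanode
hdfs dfsadmin -report

# free space for the directory holding the carbon store
# (the path below is only an example)
hdfs dfs -df -h /user/hive/warehouse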


There is one property called

*carbon.load.directWriteHdfs.enabled*

By default this property is false. If you set it to true, the files are
written directly to the carbon store (HDFS) instead of being written to the
local store first and then copied.
You can set this property and check whether the load succeeds.
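
As a rough sketch of how you could try it (assuming your deployment reads
Carbon settings from a carbon.properties file on the driver and executors;
the file location is just an example, adjust it to your setup):

# add to carbon.properties (e.g. under $SPARK_HOME/conf):
carbon.load.directWriteHdfs.enabled=true

Then restart the Spark session / thrift server and retry the load:

insert into table mc_idxok_cd1 select * from mc_idx3;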


Regards,
Akash R Nilugal



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/