
Re: error occur when I load data to s3

Posted by kunalkapoor on Sep 05, 2018; 10:14am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/error-occur-when-I-load-data-to-s3-tp60717p61440.html

Hi Aaron,
I tried running similar commands in my environment, and the load data command
was successful.

From analysing the logs, the exception appears to occur during lock file
creation. Can you try the same scenario after configuring the
`carbon.lock.path` property in carbon.properties to point to an HDFS location?

*example:*
carbon.lock.path=hdfs://hacluster/mylockFiles
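
If editing carbon.properties is inconvenient, the same property can also be set programmatically before running the load. This is only a minimal sketch: it assumes the `CarbonProperties` API available in your CarbonData version, and the HDFS path is the same placeholder as above, so substitute your own cluster path.

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Redirect CarbonData lock-file creation to HDFS instead of S3,
// so the lock is not created via the S3A output stream.
// "hdfs://hacluster/mylockFiles" is a placeholder path.
CarbonProperties.getInstance()
  .addProperty("carbon.lock.path", "hdfs://hacluster/mylockFiles")
```

Set this before the SparkSession/CarbonSession is created and before any LOAD DATA command runs, so the lock provider picks it up.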

Thanks,
Kunal Kapoor

On Tue, Sep 4, 2018 at 12:17 PM aaron <[hidden email]> wrote:

> Hi kunalkapoor, I'd like give you more debug log as below.
>
>
> application/x-www-form-urlencoded; charset=utf-8
> Tue, 04 Sep 2018 06:45:10 GMT
> /aa-sdk-test2/carbon-data/example/LockFiles/concurrentload.lock"
> 18/09/04 14:45:10 DEBUG request: Sending Request: GET
> https://aa-sdk-test2.s3.us-east-1.amazonaws.com
> /carbon-data/example/LockFiles/concurrentload.lock Headers: (Authorization:
> AWS AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=, User-Agent:
> aws-sdk-java/1.7.4 Mac_OS_X/10.13.6
> Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144, Range: bytes=0--1,
> Date: Tue, 04 Sep 2018 06:45:10 GMT, Content-Type:
> application/x-www-form-urlencoded; charset=utf-8, )
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection request:
> [route: {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443][total
> kept
> alive: 1; route allocated: 1 of 15; total allocated: 1 of 15]
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection leased:
> [id: 1][route:
> {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443][total kept
> alive:
> 0; route allocated: 1 of 15; total allocated: 1 of 15]
> 18/09/04 14:45:10 DEBUG SdkHttpClient: Stale connection check
> 18/09/04 14:45:10 DEBUG RequestAddCookies: CookieSpec selected: default
> 18/09/04 14:45:10 DEBUG RequestAuthCache: Auth cache not set in the context
> 18/09/04 14:45:10 DEBUG RequestProxyAuthentication: Proxy auth state:
> UNCHALLENGED
> 18/09/04 14:45:10 DEBUG SdkHttpClient: Attempt 1 to execute request
> 18/09/04 14:45:10 DEBUG DefaultClientConnection: Sending request: GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1
> 18/09/04 14:45:10 DEBUG wire:  >> "GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Host:
> aa-sdk-test2.s3.us-east-1.amazonaws.com[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Authorization: AWS
> AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "User-Agent: aws-sdk-java/1.7.4
> Mac_OS_X/10.13.6
> Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Range: bytes=0--1[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Date: Tue, 04 Sep 2018 06:45:10
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Content-Type:
> application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Connection: Keep-Alive[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "[\r][\n]"
> 18/09/04 14:45:10 DEBUG headers: >> GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1
> 18/09/04 14:45:10 DEBUG headers: >> Host:
> aa-sdk-test2.s3.us-east-1.amazonaws.com
> 18/09/04 14:45:10 DEBUG headers: >> Authorization: AWS
> AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=
> 18/09/04 14:45:10 DEBUG headers: >> User-Agent: aws-sdk-java/1.7.4
> Mac_OS_X/10.13.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144
> 18/09/04 14:45:10 DEBUG headers: >> Range: bytes=0--1
> 18/09/04 14:45:10 DEBUG headers: >> Date: Tue, 04 Sep 2018 06:45:10 GMT
> 18/09/04 14:45:10 DEBUG headers: >> Content-Type:
> application/x-www-form-urlencoded; charset=utf-8
> 18/09/04 14:45:10 DEBUG headers: >> Connection: Keep-Alive
> 18/09/04 14:45:10 DEBUG wire:  << "HTTP/1.1 200 OK[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "x-amz-id-2:
>
> ooaOvIUsvupOOYOCVRY7y4TUanV9xJbcAqfd+w31xAkGRptm1blE5E5yMobmKsmRyGj9crhGCao=[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "x-amz-request-id:
> A1AD0240EBDD2234[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Date: Tue, 04 Sep 2018 06:45:11
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Last-Modified: Tue, 04 Sep 2018 06:45:05
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "ETag:
> "d41d8cd98f00b204e9800998ecf8427e"[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Accept-Ranges: bytes[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Content-Type:
> application/octet-stream[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Content-Length: 0[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Server: AmazonS3[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "[\r][\n]"
> 18/09/04 14:45:10 DEBUG DefaultClientConnection: Receiving response:
> HTTP/1.1 200 OK
> 18/09/04 14:45:10 DEBUG headers: << HTTP/1.1 200 OK
> 18/09/04 14:45:10 DEBUG headers: << x-amz-id-2:
>
> ooaOvIUsvupOOYOCVRY7y4TUanV9xJbcAqfd+w31xAkGRptm1blE5E5yMobmKsmRyGj9crhGCao=
> 18/09/04 14:45:10 DEBUG headers: << x-amz-request-id: A1AD0240EBDD2234
> 18/09/04 14:45:10 DEBUG headers: << Date: Tue, 04 Sep 2018 06:45:11 GMT
> 18/09/04 14:45:10 DEBUG headers: << Last-Modified: Tue, 04 Sep 2018
> 06:45:05
> GMT
> 18/09/04 14:45:10 DEBUG headers: << ETag:
> "d41d8cd98f00b204e9800998ecf8427e"
> 18/09/04 14:45:10 DEBUG headers: << Accept-Ranges: bytes
> 18/09/04 14:45:10 DEBUG headers: << Content-Type: application/octet-stream
> 18/09/04 14:45:10 DEBUG headers: << Content-Length: 0
> 18/09/04 14:45:10 DEBUG headers: << Server: AmazonS3
> 18/09/04 14:45:10 DEBUG SdkHttpClient: Connection can be kept alive
> indefinitely
> 18/09/04 14:45:10 DEBUG request: Received successful response: 200, AWS
> Request ID: A1AD0240EBDD2234
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection [id:
> 1][route: {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443] can be
> kept alive indefinitely
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection
> released:
> [id: 1][route:
> {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443][total kept
> alive:
> 1; route allocated: 1 of 15; total allocated: 1 of 15]
> 18/09/04 14:45:10 DEBUG S3AFileSystem: OutputStream for key
> 'carbon-data/example/LockFiles/concurrentload.lock' writing to tempfile:
> /tmp/hadoop-aaron/s3a/output-8508205130207286174.tmp
> 18/09/04 14:45:10 ERROR CarbonLoadDataCommand: main
> java.lang.ArrayIndexOutOfBoundsException
>         at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
>         at
> org.apache.hadoop.fs.s3a.S3AOutputStream.write(S3AOutputStream.java:164)
>         at
>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:111)
>         at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStreamUsingAppend(S3CarbonFile.java:93)
>         at
>
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStreamUsingAppend(FileFactory.java:289)
>         at
> org.apache.carbondata.core.locks.S3FileLock.lock(S3FileLock.java:96)
>         at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:41)
>         at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:59)
>         at
>
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.acquireConcurrentLoadLock(CarbonLoadDataCommand.scala:399)
>         at
>
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:259)
>         at
>
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:92)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>         at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>         at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:106)
>         at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:95)
>         at
> org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:153)
>         at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:93)
>         at
> org.apache.carbondata.examples.S3Example$.main(S3Example.scala:91)
>         at org.apache.carbondata.examples.S3Example.main(S3Example.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at
>
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
>         at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 18/09/04 14:45:10 AUDIT CarbonLoadDataCommand:
> [aaron.lan.appannie.com][aaron][Thread-1]Dataload failure for
> default.carbon_table. Please check the logs
> 18/09/04 14:45:10 DEBUG Client: The ping interval is 60000 ms.
> 18/09/04 14:45:10 DEBUG Client: Connecting to localhost/127.0.0.1:9000
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron: starting, having connections 1
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron sending #3
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron got value #3
> 18/09/04 14:45:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took 6ms
> 18/09/04 14:45:10 DEBUG AbstractDFSCarbonFile: main Exception occurred:File
> does not exist:
> hdfs://localhost:9000/usr/carbon-meta/partition/default/carbon_table
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron sending #4
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron got value #4
> 18/09/04 14:45:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took 3ms
> 18/09/04 14:45:10 ERROR CarbonLoadDataCommand: main Got exception
> java.lang.ArrayIndexOutOfBoundsException when processing data. But this
> command does not support undo yet, skipping the undo part.
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>         at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
>         at
> org.apache.hadoop.fs.s3a.S3AOutputStream.write(S3AOutputStream.java:164)
>         at
>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:111)
>         at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStreamUsingAppend(S3CarbonFile.java:93)
>         at
>
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStreamUsingAppend(FileFactory.java:289)
>         at
> org.apache.carbondata.core.locks.S3FileLock.lock(S3FileLock.java:96)
>         at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:41)
>         at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:59)
>         at
>
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.acquireConcurrentLoadLock(CarbonLoadDataCommand.scala:399)
>         at
>
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:259)
>         at
>
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:92)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>         at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>         at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>         at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:106)
>         at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:95)
>         at
> org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:153)
>         at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:93)
>         at
> org.apache.carbondata.examples.S3Example$.main(S3Example.scala:91)
>         at org.apache.carbondata.examples.S3Example.main(S3Example.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at
>
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
>         at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 18/09/04 14:45:10 INFO SparkContext: Invoking stop() from shutdown hook
> 18/09/04 14:45:10 INFO SparkUI: Stopped Spark web UI at
> http://localhost:4040
> 18/09/04 14:45:10 INFO MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
> 18/09/04 14:45:10 INFO MemoryStore: MemoryStore cleared
> 18/09/04 14:45:10 INFO BlockManager: BlockManager stopped
> 18/09/04 14:45:10 INFO BlockManagerMaster: BlockManagerMaster stopped
> 18/09/04 14:45:10 INFO
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 18/09/04 14:45:10 INFO SparkContext: Successfully stopped SparkContext
> 18/09/04 14:45:10 INFO ShutdownHookManager: Shutdown hook called
> 18/09/04 14:45:10 INFO ShutdownHookManager: Deleting directory
>
> /private/var/folders/dd/n9pmb1nj0dncx5rd_s2rm9_40000gn/T/spark-f1e5dab8-a7db-4107-a3bf-c7253ba7ac06
> 18/09/04 14:45:10 DEBUG IdleConnectionReaper: Reaper thread:
> java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at
> com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:112)
> 18/09/04 14:45:10 DEBUG IdleConnectionReaper: Shutting down reaper thread.
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection manager
> is shutting down
> 18/09/04 14:45:10 DEBUG DefaultClientConnection: Connection
> 0.0.0.0:59398<->54.231.82.12:443 closed
> 18/09/04 14:45:10 DEBUG DefaultClientConnection: Connection
> 0.0.0.0:59398<->54.231.82.12:443 closed
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection manager
> shut down
> 18/09/04 14:45:10 DEBUG Client: stopping client from cache:
> org.apache.hadoop.ipc.Client@18ab86a2
> 18/09/04 14:45:10 DEBUG Client: removing client from cache:
> org.apache.hadoop.ipc.Client@18ab86a2
> 18/09/04 14:45:10 DEBUG Client: stopping actual client because no more
> references remain: org.apache.hadoop.ipc.Client@18ab86a2
> 18/09/04 14:45:10 DEBUG Client: Stopping client
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron: closed
> 18/09/04 14:45:10 DEBUG Client: IPC Client (777046609) connection to
> localhost/127.0.0.1:9000 from aaron: stopped, remaining connections 0