Hi:
I encountered a 'java.lang.NegativeArraySizeException' error with CarbonData 1.3.1 + Spark 2.2. When I ran the compact command to compact 8 level-1 segments into one level-2 segment, the following error occurred:

java.lang.NegativeArraySizeException
    at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimesionDataChunkStore.getRow(UnsafeVariableLengthDimesionDataChunkStore.java:172)
    at org.apache.carbondata.core.datastore.chunk.impl.AbstractDimensionDataChunk.getChunkData(AbstractDimensionDataChunk.java:46)
    at org.apache.carbondata.core.scan.result.AbstractScannedResult.getNoDictionaryKeyArray(AbstractScannedResult.java:431)
    at org.apache.carbondata.core.scan.result.impl.NonFilterQueryScannedResult.getNoDictionaryKeyArray(NonFilterQueryScannedResult.java:67)
    at org.apache.carbondata.core.scan.collector.impl.RawBasedResultCollector.scanResultAndGetData(RawBasedResultCollector.java:83)
    at org.apache.carbondata.core.scan.collector.impl.RawBasedResultCollector.collectData(RawBasedResultCollector.java:58)
    at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:51)
    at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:49)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
    at org.apache.carbondata.core.scan.result.iterator.RawResultIterator.hasNext(RawResultIterator.java:72)
    at org.apache.carbondata.processing.merger.RowResultMergerProcessor.execute(RowResultMergerProcessor.java:131)
    at org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:228)
    at org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:84)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I traced the code of 'UnsafeVariableLengthDimesionDataChunkStore.getRow' and found that the root cause is a negative length when the byte array is created: 'byte[] data = new byte[length];'. The values of some parameters when the error occurred were:

when 'rowId < numberOfRows - 1':
    this.dataLength=192000
    currentDataOffset=2
    rowId=0
    OffsetOfNextdata=-12173 (why?)
    length=-12177

otherwise:
    this.dataLength=320000
    currentDataOffset=263702
    rowId=31999
    length=-9238

The value of (320000 - 263702) exceeds the range of short (see the small sketch after this message).

I applied the patch from PR#2796 (https://github.com/apache/carbondata/pull/2796), but the error still occurred.

Finally, my test steps were as follows. For example, with 4 level-1 compacted segments 1.1, 2.1, 3.1 and 4.1:
1. run the compact command, it failed;
2. delete segment 1.1 and run the compact command again, it failed;
3. delete segment 2.1 and run the compact command again, it failed;
4. delete segment 3.1 and run the compact command again, it succeeded.

So I think one of the 8 level-1 compacted segments may have some problem, but I don't know how to find out which one.
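To make the short-range point above concrete, here is a minimal sketch (plain arithmetic, not the actual CarbonData code path) showing how the offset difference wraps to exactly the negative length seen in the log:

    public class ShortOverflowSketch {
      public static void main(String[] args) {
        int dataLength = 320000;        // this.dataLength from the error values above
        int currentDataOffset = 263702; // currentDataOffset from the error values above

        int correct = dataLength - currentDataOffset;                // 56298, the real length
        short truncated = (short) (dataLength - currentDataOffset);  // wraps to -9238

        System.out.println("as int:   " + correct);     // 56298
        System.out.println("as short: " + truncated);   // -9238, matches length=-9238 in the log

        // Allocating an array with the wrapped value throws the exception from the trace:
        // byte[] data = new byte[truncated]; // java.lang.NegativeArraySizeException
      }
    }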
Hi:
It seems that the MemoryBlock is cleaned by some other thread. I will investigate this; meanwhile you can continue by setting the parameters below in carbon.properties:

enable.unsafe.in.query.processing=false
enable.unsafe.columnpage=false

Thanks
Babu
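If editing carbon.properties is not convenient, the same two switches can, as far as I know, also be set programmatically through the CarbonProperties API before the job starts; a minimal sketch (this snippet is an illustration, not part of the original mail, with the property keys exactly as given above):

    import org.apache.carbondata.core.util.CarbonProperties;

    public class DisableUnsafeSketch {
      public static void main(String[] args) {
        // Same effect as the carbon.properties entries above; set before any query/compaction runs.
        CarbonProperties.getInstance().addProperty("enable.unsafe.in.query.processing", "false");
        CarbonProperties.getInstance().addProperty("enable.unsafe.columnpage", "false");
      }
    }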
Hi Babu:
Thanks for your reply.
I set enable.unsafe.in.query.processing=false and enable.unsafe.columnpage=false, and the test still failed. Following the test steps I mentioned above, I copied the bad segment and used the SDK reader to read its data; that failed too, with the following error message:

java.lang.RuntimeException: java.lang.IllegalArgumentException
    at org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:120)
    at org.apache.carbondata.core.scan.result.BlockletScannedResult.fillDataChunks(BlockletScannedResult.java:355)
    at org.apache.carbondata.core.scan.result.BlockletScannedResult.hasNext(BlockletScannedResult.java:559)
    at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectResultInRow(DictionaryBasedResultCollector.java:137)
    at org.apache.carbondata.core.scan.processor.DataBlockIterator.next(DataBlockIterator.java:109)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:49)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
    at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:1)
    at org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.hasNext(ChunkRowIterator.java:58)
    at org.apache.carbondata.hadoop.CarbonRecordReader.nextKeyValue(CarbonRecordReader.java:104)
    at org.apache.carbondata.sdk.file.CarbonReader.hasNext(CarbonReader.java:71)
    at cn.xm.zzc.carbonsdktest.CarbonSDKTest.main(CarbonSDKTest.java:68)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:244)
    at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimensionDataChunkStore.putArray(UnsafeVariableLengthDimensionDataChunkStore.java:97)
    at org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.<init>(VariableLengthDimensionColumnPage.java:58)
    at org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimensionLegacy(CompressedDimensionChunkFileBasedReaderV3.java:325)
    at org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimension(CompressedDimensionChunkFileBasedReaderV3.java:266)
    at org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeColumnPage(CompressedDimensionChunkFileBasedReaderV3.java:224)
    at org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:118)
    ... 11 more

There are many error records. When the error occurred, the values of some parameters in UnsafeVariableLengthDimensionDataChunkStore.putArray were as follows (the 'lastLength' column is the last length read from the data before the error occurred):

buffer.limit  buffer.cap  startOffset  lastLength  numberOfRows  this.dataPointersOffsets
288000        288000      300289       24433       32000         288000
448000        448000      464551       24927       32000         448000
384000        384000      -32566       -32568      32000         384000
480000        480000      -20257       -20259      32000         480000
96000         96000       96166        304         32000         96000
515278        515278      -12047       -12049      32000         515278
305953        305953      -8148        -8150       32000         305953

startOffset is bigger than buffer.limit (or is even negative), so the error occurred.
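For reference, java.nio.Buffer.position(int) rejects a position larger than the buffer's limit (and a negative one), which is exactly what the values above trigger; a minimal sketch with the numbers from the first row of the table (illustration only, not CarbonData code):

    import java.nio.ByteBuffer;

    public class BufferPositionSketch {
      public static void main(String[] args) {
        // buffer.limit = buffer.cap = 288000 and startOffset = 300289 (first row above)
        ByteBuffer buffer = ByteBuffer.allocate(288000);
        int startOffset = 300289;

        // Buffer.position(int) throws IllegalArgumentException when the new position
        // exceeds the limit or is negative; both cases appear in the table.
        buffer.position(startOffset); // java.lang.IllegalArgumentException
      }
    }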
I think the data may have been written wrongly when this segment was generated, because of the issue you fixed (the MemoryBlock is cleaned by some other thread, resulting in wrong data being written). Is that possible?
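Roughly, reading a copied segment with the SDK looks like the sketch below. This is only a minimal illustration assuming the CarbonReader builder API from the stack trace above; the path and projected columns are placeholders, not the actual CarbonSDKTest code:

    import org.apache.carbondata.sdk.file.CarbonReader;

    public class ReadSegmentSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder path of the copied segment directory.
        String path = "/tmp/copied_segment";

        CarbonReader reader = CarbonReader.builder(path, "_temp")
            .projection(new String[]{"rt", "host", "cip"}) // example columns from the table schema
            .build();

        long count = 0;
        while (reader.hasNext()) {                 // the IllegalArgumentException surfaces here
          Object[] row = (Object[]) reader.readNextRow();
          count++;
        }
        reader.close();
        System.out.println("rows read: " + count);
      }
    }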
Hi,
I have a few questions regarding this exception:

1. Does the table have a string column for which the length of the data exceeds 32k characters?
2. Are you able to query (select *) the table successfully?
3. Can you share the schema of the table?

Meanwhile I am looking into the possibility of any other thread clearing the MemoryBlock.

Regards
Kunal Kapoor
Hi Kunal Kapoor:
1. No.
2. The query is unsuccessful; I used the Carbon SDK reader to read that bad segment and it failed too.
3. The schema of the table:

rt                  string
timestamp_1min      bigint
timestamp_5min      bigint
timestamp_1hour     bigint
customer_id         bigint
transport_id        bigint
transport_code      string
tcp_udp             int
pre_hdt_id          string
hdt_id              string
status              int
is_end_user         int
transport_type      string
transport_type_nam  string
fcip                string
host                string
cip                 string
code                int
conn_status         int
recv                bigint
send                bigint
msec                bigint
dst_prefix          string
next_type           int
next                string
hdt_sid             string
from_endpoint_type  int
to_endpoint_type    int
fcip_view           string
fcip_country        string
fcip_province       string
fcip_city           string
fcip_longitude      string
fcip_latitude       string
fcip_node_name      string
fcip_node_name_cn   string
host_view           string
host_country        string
host_province       string
host_city           string
host_longitude      string
host_latitude       string
cip_view            string
cip_country         string
cip_province        string
cip_city            string
cip_longitude       string
cip_latitude        string
cip_node_name       string
cip_node_name_cn    string
dtp_send            string
client_port         int
server_ip           string
server_port         int
state               string
response_code       int
access_domain       string
valid               int
min_batch_time      bigint
update_time         bigint

##Detailed Table Information
Database Name               hdt_sys
Table Name                  transport_access_log
CARBON Store Path           hdfs://hdtcluster/carbon_store
Comment
Table Block Size            512 MB
Table Data Size             777031634135
Table Index Size            72894232
Last Update Time            1539769299990
SORT_SCOPE                  local_sort
Streaming                   true
MAJOR_COMPACTION_SIZE       4096
AUTO_LOAD_MERGE             true
COMPACTION_LEVEL_THRESHOLD  2,8

##Detailed Column property
ADAPTIVE
SORT_COLUMNS                is_end_user,status,customer_id,access_domain,transport_id,timestamp_1hour,timestamp_1min,conn_status,pre_hdt_id,tcp_udp,transport_code,fcip,cip

I think the data may have been written wrongly when the segment was generated, because of the issue mentioned above: the MemoryBlock is cleaned by some other thread, resulting in wrong data being written.

I have now deleted the bad segment, patched PR#2796, and will continue to run the streaming app. If the exception no longer occurs, that proves PR#2796 works. Right?
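For reference, with AUTO_LOAD_MERGE=true and COMPACTION_LEVEL_THRESHOLD=2,8 above, 2 loads are auto-merged into a level-1 segment and 8 level-1 segments into a level-2 segment. The same minor compaction can also be triggered manually, and individual segments can be dropped by id, using the standard CarbonData DDL; a short example against this table (the segment id is illustrative):

    -- Trigger minor compaction according to COMPACTION_LEVEL_THRESHOLD (2,8)
    ALTER TABLE hdt_sys.transport_access_log COMPACT 'MINOR';

    -- Drop a specific level-1 segment by id, as in the test steps earlier in the thread
    DELETE FROM TABLE hdt_sys.transport_access_log WHERE SEGMENT.ID IN (1.1);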
Right.
Can you try to write the segments after cherry-picking #2796?
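For anyone following along, cherry-picking the PR onto a 1.3.1-based build could look roughly like the sketch below. The local branch name and Maven profile are assumptions; adjust them to your own source tree and Spark/Hadoop versions, and conflicts may still need manual resolution:

    # Fetch the PR head from GitHub into a local branch (GitHub's standard pull ref).
    git fetch https://github.com/apache/carbondata.git pull/2796/head:pr-2796

    # Apply it on top of a local 1.3.1-based branch (branch name is just an example).
    git checkout my-carbondata-1.3.1
    git cherry-pick pr-2796

    # Rebuild; the profile depends on your environment.
    mvn -DskipTests -Pspark-2.2 clean package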
Hi Kunal Kapoor:
I have patched PR#2796 into 1.3.1 and run the streaming app again. This issue does not happen often, so I will run it for a few days to check whether the fix works.
Okay, sure. Please let us know the result.
Hi Kunal Kapoor, Babu:
My streaming app has run for a few days and the issue no longer occurs, so PR#2796 works. Thanks.