xubo245 created CARBONDATA-2385:
-----------------------------------

             Summary: The result is incorrect when read data from carbonfile generated by SDK
                 Key: CARBONDATA-2385
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2385
             Project: CarbonData
          Issue Type: Bug
            Reporter: xubo245
            Assignee: xubo245


The result is incorrect when reading data from a carbonfile generated by the SDK.

When 10 million rows of data are generated by org.apache.carbondata.spark.testsuite.createTable.TestCreateTableUsingSparkCarbonFileFormat, the count is 5888000 rather than 10000000:

{code:java}
18/04/23 01:43:12 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebdb24c-8b92-45c3-b7c0-639da93c2984/_tmp_space.db
18/04/23 01:43:12 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /huawei/xubo/git/carbondata1/integration/spark-common/target/warehouse
18/04/23 01:43:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:19 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524472999] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+

+-------+---+------+
|name   |age|height|
+-------+---+------+
|robot0 |0  |0.0   |
|robot1 |1  |0.5   |
|robot2 |2  |1.0   |
|robot3 |3  |1.5   |
|robot4 |4  |2.0   |
|robot5 |5  |2.5   |
|robot6 |6  |3.0   |
|robot7 |7  |3.5   |
|robot8 |8  |4.0   |
|robot9 |9  |4.5   |
|robot10|10 |5.0   |
|robot11|11 |5.5   |
|robot12|12 |6.0   |
|robot13|13 |6.5   |
|robot14|14 |7.0   |
|robot15|15 |7.5   |
|robot16|16 |8.0   |
|robot17|17 |8.5   |
|robot18|18 |9.0   |
|robot19|19 |9.5   |
+-------+---+------+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
+------+---+------+

+-------+
|name   |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows

+---+
|age|
+---+
|0  |
|1  |
|2  |
|3  |
|4  |
|5  |
|6  |
|7  |
|8  |
|9  |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
|robot4|4  |2.0   |
|robot5|5  |2.5   |
|robot6|6  |3.0   |
|robot7|7  |3.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
|robot3|3  |1.5   |
|robot4|4  |2.0   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
+------+---+------+

+-------------+
|sum(age)     |
+-------------+
|1515150959596|
+-------------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:47 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524473027] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+

+-------+---+------+
|name   |age|height|
+-------+---+------+
|robot0 |0  |0.0   |
|robot1 |1  |0.5   |
|robot2 |2  |1.0   |
|robot3 |3  |1.5   |
|robot4 |4  |2.0   |
|robot5 |5  |2.5   |
|robot6 |6  |3.0   |
|robot7 |7  |3.5   |
|robot8 |8  |4.0   |
|robot9 |9  |4.5   |
|robot10|10 |5.0   |
|robot11|11 |5.5   |
|robot12|12 |6.0   |
|robot13|13 |6.5   |
|robot14|14 |7.0   |
|robot15|15 |7.5   |
|robot16|16 |8.0   |
|robot17|17 |8.5   |
|robot18|18 |9.0   |
|robot19|19 |9.5   |
+-------+---+------+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
+------+---+------+

+-------+
|name   |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows

+---+
|age|
+---+
|0  |
|1  |
|2  |
|3  |
|4  |
|5  |
|6  |
|7  |
|8  |
|9  |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
|robot4|4  |2.0   |
|robot5|5  |2.5   |
|robot6|6  |3.0   |
|robot7|7  |3.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
|robot3|3  |1.5   |
|robot4|4  |2.0   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
+------+---+------+

+-------------+
|sum(age)     |
+-------------+
|1515150959596|
+-------------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

18/04/23 01:43:56 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 38)
org.apache.spark.SparkException: Index file not present to read the carbondata file
	at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:231)
	at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:188)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
18/04/23 01:43:56 ERROR TaskSetManager: Task 0 in stage 32.0 failed 1 times; aborting job

Process finished with exit code 0
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)