
Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Posted by yixu2001 on Oct 20, 2017; 3:43am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Update-statement-failed-with-Multiple-input-rows-matched-for-same-row-in-version-1-2-0-tp24222p24397.html

dev
 Sorry, I have made a mistake.

When I execute the below query, it returns multiple rows:

select a.remark from c_indextest1 a, c_indextest2 b where a.id=b.id
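Note that the join returning multiple rows in total (one per matching id) is different from one target row matching multiple input rows, which is what the update error complains about. A grouping check can separate the two cases (a sketch, not part of the original report, assuming the table names above):

cc.sql("select a.id, count(*) cnt from c_indextest1 a, c_indextest2 b where a.id=b.id group by a.id having count(*) > 1").show(100,false);

If this returns no rows, each id matches at most once and the failure is not explained by genuine duplicates.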


yixu2001
 
From: Liang Chen
Date: 2017-10-19 22:26
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
Hi
 
When you execute the below query, does it return one row or multiple rows?
---------------------------------
select a.remark from c_indextest1 a where a.id=b.id
 
Regards
Liang
 
 
yixu2001 wrote

> dev
>  You can follow the steps below to reproduce the problem.
> Table c_indextest2 has about 17 million (1700w) records and table c_indextest1
> has about 300,000 (30w) records.
>
> step 1:
> cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark string)
> STORED BY 'carbondata'").show;
>
> step 2: origin_data is an existing table with 17 million records; the fields in
> the table do not matter.
> cc.sql("insert into c_indextest2 select row_number() over(partition by
> a.PKid order by a.pkid) id,a.remark from  (SELECT '1' PKID,'dfsdd' remark
> from origin_data limit 17000000) a").show;
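>
> For reference (this check is not part of the original steps): the row_number()
> window above runs over a single partition (PKID is the constant '1'), so it
> generates the sequence 1..17000000 and every id in c_indextest2 is unique. A
> quick confirmation:
>
> cc.sql("select count(*), count(distinct id) from c_indextest2").show;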
>
> step 3:
> cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark string)
> STORED BY 'carbondata'").show;
>
> step 4:
> cc.sql("insert into c_indextest1 select * from  c_indextest2 where
> pmod(cast(ID as int),50)=43").show;
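>
> The pmod filter keeps the ids congruent to 43 modulo 50, i.e. 1/50 of the 17
> million rows, about 340,000, which is the "about 30w" mentioned above. A count
> check (again, not part of the original steps):
>
> cc.sql("select count(*) from c_indextest1").show;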
>
> step 5:
> cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
> c_indextest1 a where a.id=b.id)").show;
>
>
> yixu2001
>  
> From: sounak
> Date: 2017-10-19 18:26
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> Hi Yixu,
>  
> Thanks for the DDL, but I am particularly interested in the data in order
> to reproduce the problem. Would it be feasible to share the data or, if it is
> huge, the scripts that generate it? I cannot access the data location
> from the public network.
>  
>  
> On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <yixu2001@...> wrote:
>  
>> dev
>>
>> Step 1: I create a Hive table qqdata2.h_indextest1:
>> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
>> `id` INT,
>> `CUST_ORDER_ID` STRING,
>> `ORDER_ITEM_IDATTR_ID` STRING,
>> `ATTR_VALUE_IDATTR_VALUE` STRING,
>> `CREATE_DATE` STRING,
>> `UPDATE_DATE` STRING,
>> `STATUS_CD` STRING,
>> `STATUS_DATE` STRING,
>> `AREA_ID` STRING,
>> `REGION_CD` STRING,
>> `UPDATE_STAFF` STRING,
>> `CREATE_STAFF` STRING,
>> `SHARDING_ID` STRING,
>> `ORDER_ATTR_ID` STRING )
>> ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY '\u0001'
>>   LINES TERMINATED BY '\n'
>>   NULL DEFINED AS ''
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.TextInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION
>>   'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';
>>
>> There are CSV files in the HDFS directory user/bigdata/streamcql/dist1; the
>> data format in the CSV files is as follows:
>>
>> 1939505130,171483932,287305502,813463930,20160709134396669,201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,9999999
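>>
>> Since the data location is not reachable from the public network, a minimal
>> generator sketch along these lines could reproduce it (my own sketch, assuming
>> spark-shell's sc and the HDFS path from the DDL; note the DDL declares '\u0001'
>> as the field delimiter while the sample above reads as comma-separated, so use
>> whichever delimiter your files actually have):
>>
>> val delim = "\u0001" // or "," to match the sample shown above
>> val rows = sc.parallelize(1 to 300000).map { i =>
>>   // one unique id per row, the remaining 13 columns copied from the sample
>>   Seq(i.toString, "171483932", "287305502", "813463930", "20160709134396669",
>>       "201607101469099594", "1299", "20160711996575390", "10", "73",
>>       "302063", "302064", "127859875", "9999999").mkString(delim)
>> }
>> rows.saveAsTextFile("hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1")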
>>
>> Step 2: I create a Carbon table qqdata2.c_indextest1:
>> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING,
>> CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID STRING,ATTR_VALUE_IDATTR_VALUE
>> STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD STRING,STATUS_DATE
>> STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF
>> STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata' ")
>>
>> Step 3: Insert data:
>> cc.sql("insert into qqdata2.c_indextest1 select * from
>> qqdata2.h_indextest1").show(100,false);
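>>
>> As a sanity check (not part of the original mail), the row counts of the Hive
>> source and the Carbon target can be compared after the insert:
>>
>> cc.sql("select count(*) from qqdata2.h_indextest1").show;
>> cc.sql("select count(*) from qqdata2.c_indextest1").show;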
>>
>> Step 4: Repeating steps 1 to 3, I create another Carbon table
>> qqdata2.c_indextest2.
>> Table qqdata2.c_indextest1 has about 300,000 records and qqdata2.c_indextest2
>> has about 17 million records.
>>
>>
>>
>> yixu2001
>>
>> From: sounak
>> Date: 2017-10-19 14:13
>> To: dev
>> Subject: Re: Update statement failed with "Multiple input rows matched for
>> same row" in version 1.2.0,
>> Hi Yixu,
>>
>> Can you please share the DDLs and the data for the above problem with us?
>>
>> Thanks
>> Sounak
>>
>> On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <yixu2001@...> wrote:
>>
>> > dev
>> >
>> >
>> > In carbondata version 1.2.0, I executed an "update" statement with a
>> > sub-query, and it failed.
>> > There are no duplicate rows in either table, and the same statement
>> > succeeds in carbondata version 1.1.1.
>> >
>> > The test log is as follows:
>> > scala> cc.sql("select count(*), count(distinct id) from
>> > qqdata2.c_indextest1").show(100,false);
>> > +--------+------------------+
>> > |count(1)|count(DISTINCT id)|
>> > +--------+------------------+
>> > |300000  |300000            |
>> > +--------+------------------+
>> >
>> > scala> cc.sql("select count(*), count(distinct id) from
>> > qqdata2.c_indextest2").show(100,false);
>> > +--------+------------------+
>> > |count(1)|count(DISTINCT id)|
>> > +--------+------------------+
>> > |71223220|71223220          |
>> > +--------+------------------+
>> >
>> > scala> cc.sql("update qqdata2.c_indextest2 a
>> set(a.CUST_ORDER_ID,a.ORDER_
>> > ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.
>> > UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.
>> > REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDER_ATTR_ID)
>> =
>> > (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_
>> > VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_
>> > DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.
>> SHARDING_ID,b.ORDER_ATTR_ID
>> > from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
>> > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of a
>> > plan since it was too large. This behavior can be adjusted by setting
>> > 'spark.debug.maxToStringFields' in SparkEnv.conf.
>> > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]
>> > Delete data operation is failed for qqdata2.c_indextest2
>> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is
>> > failed due to failure in creating delete delta file for segment : null
>> > block : null
>> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update
>> > operationjava.lang.Exception: Multiple input rows matched for same row.
>> > java.lang.RuntimeException: Update operation failed. Multiple input rows
>> > matched for same row.
>> >   at scala.sys.package$.error(package.scala:27)
>> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
>> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
>> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
>> >   at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>> >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
>> >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>> >   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
>> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
>> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
>> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
>> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
>> >   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
>> >   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
>> >   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
>> >   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
>> >   at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
>> >   ... 50 elided
>> >
>> >
>> >
>> > yixu2001
>> >
>>
>>
>>
>> --
>> Thanks
>> Sounak
>>
>  
>  
>  
> --
> Thanks
> Sounak
 
 
 
 
 