Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
Will the patch be released?


yixu2001
 
From: sounak
Date: 2017-11-07 20:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
Hi,
 
This fix is still under development. Hopefully we will be able to give a
proper solution by this weekend.
 
 
 
On Tue, Nov 7, 2017 at 1:24 PM, yixu2001 <[hidden email]> wrote:
 

> dev
> When is the patch expected to be released, so that we can test it?
>
>
> yixu2001
>
> From: yixu2001
> Date: 2017-11-07 15:47
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> dev
>  Hello, there is still no solution; can the next version, carbondata 1.2.1, be
> released?
>
>
> yixu2001
>
> From: yixu2001
> Date: 2017-10-27 16:55
> To: dev
> CC: sounak; chenliang6136
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> dev
>
> thank you
>  We hope it will be resolved in a new release, carbondata 1.2.1.
> In addition, the mailing-list question "Is the query slowing down due to the
> fragmentation produced by the many update operations?" is also a very
> important issue in our selection process; we would be grateful if it could
> also be resolved in carbondata 1.2.1 (a compaction sketch follows below).
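>
> As a hedged sketch of what we have in mind (assuming cc is the same spark-shell
> CarbonSession used elsewhere in this thread, and that standard CarbonData
> compaction and clean-up are the intended way to merge the many small segments
> left behind by repeated updates):
>
> cc.sql("ALTER TABLE qqdata2.c_indextest2 COMPACT 'MINOR'")
> cc.sql("ALTER TABLE qqdata2.c_indextest2 COMPACT 'MAJOR'")
> cc.sql("CLEAN FILES FOR TABLE qqdata2.c_indextest2")
>
> Whether this recovers query performance after heavy updates is exactly the
> open question above.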
>
>
> yixu2001
>
> From: yixu2001
> Date: 2017-10-21 11:12
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> dev
>
> testdata
> Link: http://pan.baidu.com/s/1boVAqeF  Password: qtdh
> CSV
>
>
> yixu2001
> From: Raghunandan S
> Date: 2017-10-20 16:56
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> Should be fine. Do we need to create any login to access it?
> On Fri, 20 Oct 2017 at 1:31 PM, yixu2001 <[hidden email]> wrote:
> > dev
> >        Can I share it via Baidu SkyDrive?
> >
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-20 16:14
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows
> > matched for same row" in version 1.2.0,
> > As the tupleIds are also unique, the only option left is to get hold of
> > your data and reproduce the issue in-house. Is it possible to generate your
> > data in-house, or for us to get it?
> >
> > On Fri, Oct 20, 2017 at 12:14 PM, yixu2001 <[hidden email]> wrote:
> >
> > > dev
> > >
> > > scala> cc.sql(" select  c.tupleId,count(*)  from (select a.remark,a.id
> ,
> > > getTupleId() as tupleId from c_indextest1 a,c_indextest2 b where a.id=
> > b.id)
> > > c group by c.tupleId having count(*)>1").show;
> > > +-------+--------+
> > > |tupleId|count(1)|
> > > +-------+--------+
> > > +-------+--------+
> > >
> > > no Multiple input rows
> > >
> > >
> > > yixu2001
> > >
> > > From: sounak
> > > Date: 2017-10-20 10:57
> > > To: dev
> > > Subject: Re: Re: Update statement failed with "Multiple input rows
> > > matched for same row" in version 1.2.0,
> > > A slight correction to a typo in the query syntax:
> > >
> > > " select a.remark,a.id, getTupleId() as *tupleId* from c_indextest1 a,
> > > c_indextest2
> > > b where a.id=b.id;"
> > >
> > > On Fri, Oct 20, 2017 at 7:54 AM, sounak <[hidden email]> wrote:
> > >
> > > > Internally, we call a UDF to generate the TupleId, and based on that
> > > > tupleId we decide whether the row is duplicated or not. You can run a
> > > > slightly tweaked query:
> > > >
> > > > " select a.remark, a.id, getTupleId() as TupleId from c_indextest1 a,
> > > > c_indextest2 b where a.id = b.id; "
> > > >
> > > > For "Multiple input rows matched" to be raised, getTupleId() will most
> > > > probably produce duplicate entries.
> > > >
> > > > Meanwhile, your query runs fine with the data I have generated in-house,
> > > > so this is data dependent. It would be good if we could get *origin_data*
> > > > or the script that built the data for origin_data.
> > > >
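> > > > As a hedged illustration of that duplicate check (assuming the same cc
> > > > spark-shell session; not run against your dataset), grouping on the
> > > > generated TupleId surfaces any target row matched by more than one input row:
> > > >
> > > > cc.sql(" select t.TupleId, count(*) from (select a.remark, a.id,
> > > > getTupleId() as TupleId from c_indextest1 a, c_indextest2 b
> > > > where a.id = b.id) t group by t.TupleId having count(*) > 1").show;
> > > >
> > > > A non-empty result here would explain the "Multiple input rows matched for
> > > > same row" failure.
> > > >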
> > > > On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 <[hidden email]> wrote:
> > > >
> > > >> dev
> > > >>    It returns one row record.
> > > >>
> > > >>
> > > >> yixu2001
> > > >>
> > > >> From: Liang Chen
> > > >> Date: 2017-10-19 22:26
> > > >> To: dev
> > > >> Subject: Re: Re: Update statement failed with "Multiple input rows
> > > >> matched for same row" in version 1.2.0,
> > > >> Hi
> > > >>
> > > >> If you execute the query below, does it return one row or multiple
> > > >> rows?
> > > >> ---------------------------------
> > > >> select a.remark from  c_indextest1 a where a.id=b.id
> > > >>
> > > >> Regards
> > > >> Liang
> > > >>
> > > >>
> > > >> yixu2001 wrote
> > > >> > dev
> > > >> >  You can follow the steps below to reproduce the problem.
> > > >> > Table c_indextest2 has 1700w (17 million) records and table
> > > >> > c_indextest1 has about 30w (300 thousand) records.
> > > >> >
> > > >> > step 1:
> > > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark
> > > >> string)
> > > >> > STORED BY 'carbondata'").show;
> > > >> >
> > > >> > step 2: origin_data is an existing table with 1700w records; the
> > > >> > fields in that table do not matter.
> > > >> > cc.sql("insert into c_indextest2 select row_number() over(partition by
> > > >> > a.PKid order by a.pkid) id, a.remark from (SELECT '1' PKID, 'dfsdd' remark
> > > >> > from origin_data limit 17000000) a").show;
> > > >> >
> > > >> > step 3:
> > > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark
> > > >> string)
> > > >> > STORED BY 'carbondata'").show;
> > > >> >
> > > >> > step 4:
> > > >> > cc.sql("insert into c_indextest1 select * from  c_indextest2 where
> > > >> > pmod(cast(ID as int),50)=43").show;
> > > >> >
> > > >> > step 5:
> > > >> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
> > > >> > c_indextest1 a where a.id=b.id)").show;
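> > > >> >
> > > >> > (A hedged pre-check for step 5, assuming the same cc session: the update
> > > >> > expects at most one matching row in c_indextest1 per id, which can be
> > > >> > verified up front.)
> > > >> >
> > > >> > cc.sql("select id, count(*) from c_indextest1 group by id having count(*) > 1").show;
> > > >> >
> > > >> > An empty result means each id in c_indextest1 is unique, so the subquery
> > > >> > should match at most one input row per target row.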
> > > >> >
> > > >> >
> > > >> > yixu2001
> > > >> >
> > > >> > From: sounak
> > > >> > Date: 2017-10-19 18:26
> > > >> > To: dev
> > > >> > Subject: Re: Re: Update statement failed with "Multiple input rows
> > > >> > matched for same row" in version 1.2.0,
> > > >> > Hi Yixu,
> > > >> >
> > > >> > Thanks for the DDL. But I am particularly interested in the data in
> > > >> > order to reproduce the problem. Would it be feasible to share the
> > > >> > data or, if it is huge, the scripts that generate it? I cannot
> > > >> > access the data location from the public network.
> > > >> >
> > > >> >
> > > >> > On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <[hidden email]> wrote:
> > > >> >
> > > >> >> dev
> > > >> >>
> > > >> >> Step 1: I make a hive table qqdata2.h_indextest1:
> > > >> >> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
> > > >> >> `id` INT,
> > > >> >> `CUST_ORDER_ID` STRING,
> > > >> >> `ORDER_ITEM_IDATTR_ID` STRING,
> > > >> >> `ATTR_VALUE_IDATTR_VALUE` STRING,
> > > >> >> `CREATE_DATE` STRING,
> > > >> >> `UPDATE_DATE` STRING,
> > > >> >> `STATUS_CD` STRING,
> > > >> >> `STATUS_DATE` STRING,
> > > >> >> `AREA_ID` STRING,
> > > >> >> `REGION_CD` STRING,
> > > >> >> `UPDATE_STAFF` STRING,
> > > >> >> `CREATE_STAFF` STRING,
> > > >> >> `SHARDING_ID` STRING,
> > > >> >> `ORDER_ATTR_ID` STRING )
> > > >> >> ROW FORMAT DELIMITED
> > > >> >>   FIELDS TERMINATED BY '\u0001'
> > > >> >>   LINES TERMINATED BY '\n'
> > > >> >>   NULL DEFINED AS ''
> > > >> >> STORED AS INPUTFORMAT
> > > >> >>   'org.apache.hadoop.mapred.TextInputFormat'
> > > >> >> OUTPUTFORMAT
> > > >> >>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> > > >> >> LOCATION
> > > >> >>   'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';
> > > >> >>
> > > >> >> There are CSV files in the HDFS dir user/bigdata/streamcql/dist1; the
> > > >> >> data format in the CSV files is as follows:
> > > >> >>
> > > >> >> 1939505130,171483932,287305502,813463930,20160709134396669,201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,9999999
> > > >> >>
> > > >> >> Step 2: I make a carbon table qqdata2.c_indextest1:
> > > >> >> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING,
> > > >> >> CUST_ORDER_ID STRING, ORDER_ITEM_IDATTR_ID STRING, ATTR_VALUE_IDATTR_VALUE STRING,
> > > >> >> CREATE_DATE STRING, UPDATE_DATE STRING, STATUS_CD STRING, STATUS_DATE STRING,
> > > >> >> AREA_ID STRING, REGION_CD STRING, UPDATE_STAFF STRING, CREATE_STAFF STRING,
> > > >> >> SHARDING_ID STRING, ORDER_ATTR_ID STRING) STORED BY 'carbondata'")
> > > >> >>
> > > >> >> Step 3: Insert data:
> > > >> >> cc.sql("insert into qqdata2.c_indextest1 select * from qqdata2.h_indextest1").show(100,false);
> > > >> >>
> > > >> >> Step 4: Repeating step 1 to step 3, I make another carbon table,
> > > >> >> qqdata2.c_indextest2.
> > > >> >> The record count of qqdata2.c_indextest1 is 30w (300 thousand); the
> > > >> >> record count of qqdata2.c_indextest2 is 1700w (17 million).
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> yixu2001
> > > >> >>
> > > >> >> From: sounak
> > > >> >> Date: 2017-10-19 14:13
> > > >> >> To: dev
> > > >> >> Subject: Re: Update statement failed with "Multiple input rows
> > > >> >> matched for same row" in version 1.2.0,
> > > >> >> Hi Yixu,
> > > >> >>
> > > >> >> Can you please share the DDLs and the Data for the above problem
> > with
> > > >> us?
> > > >> >>
> > > >> >> Thanks
> > > >> >> Sounak
> > > >> >>
> > > >> >> On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <[hidden email]> wrote:
> > > >> >>
> > > >> >> > dev
> > > >> >> >
> > > >> >> >
> > > >> >> > In carbondata version 1.2.0, I execute an "update" statement with a
> > > >> >> > sub-query, and it fails.
> > > >> >> > None of the rows in the two tables are duplicated, and the same
> > > >> >> > statement succeeds in carbondata version 1.1.1.
> > > >> >> >
> > > >> >> > The test log is as follows:
> > > >> >> > scala> cc.sql("select count(*), count(distinct id) from
> > > >> >> > qqdata2.c_indextest1").show(100,false);
> > > >> >> > +--------+------------------+
> > > >> >> > |count(1)|count(DISTINCT id)|
> > > >> >> > +--------+------------------+
> > > >> >> > |300000  |300000            |
> > > >> >> > +--------+------------------+
> > > >> >> >
> > > >> >> > scala> cc.sql("select count(*), count(distinct id) from
> > > >> >> > qqdata2.c_indextest2").show(100,false);
> > > >> >> > +--------+------------------+
> > > >> >> > |count(1)|count(DISTINCT id)|
> > > >> >> > +--------+------------------+
> > > >> >> > |71223220|71223220          |
> > > >> >> > +--------+------------------+
> > > >> >> >
> > > >> >> > scala> cc.sql("update qqdata2.c_indextest2 a
> > > >> >> set(a.CUST_ORDER_ID,a.ORDER_
> > > >> >> > ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.
> > > >> >> > UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.
> > > >> >> > REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDE
> > > >> R_ATTR_ID)
> > > >> >> =
> > > >> >> > (select b.CUST_ORDER_ID,b.ORDER_ITEM_
> > > IDATTR_ID,b.ATTR_VALUE_IDATTR_
> > > >> >> > VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_
> > > >> >> > DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.
> > > >> >> SHARDING_ID,b.ORDER_ATTR_ID
> > > >> >> > from qqdata2.c_indextest1 b where a.id = b.id
> > )").show(100,false);
> > > >> >> > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of
> > > >> >> > a plan since it was too large. This behavior can be adjusted by setting
> > > >> >> > 'spark.debug.maxToStringFields' in SparkEnv.conf.
> > > >> >> > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]
> > > >> >> > Delete data operation is failed for qqdata2.c_indextest2
> > > >> >> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is
> > > >> >> > failed due to failure in creating delete delta file for segment : null
> > > >> >> > block : null
> > > >> >> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update
> > > >> >> > operation java.lang.Exception: Multiple input rows matched for same row.
> > > >> >> > java.lang.RuntimeException: Update operation failed. Multiple input rows
> > > >> >> > matched for same row.
> > > >> >> >   at scala.sys.package$.error(package.scala:27)
> > > >> >> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
> > > >> >> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
> > > >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> > > >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> > > >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
> > > >> >> >   at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
> > > >> >> >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
> > > >> >> >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> > > >> >> >   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
> > > >> >> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
> > > >> >> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
> > > >> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
> > > >> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
> > > >> >> >   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
> > > >> >> >   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
> > > >> >> >   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
> > > >> >> >   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
> > > >> >> >   at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
> > > >> >> >   ... 50 elided
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > yixu2001
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Thanks
> > > >> >> Sounak
> > > >> >>
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Thanks
> > > >> > Sounak
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Sent from:
> > > >> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks
> > > > Sounak
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks
> > > Sounak
> > >
> >
> >
> >
> > --
> > Thanks
> > Sounak
> >
>
 
 
 
--
Thanks
Sounak