Apache CarbonData Dev Mailing List archive - Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Apache CarbonData Dev Mailing List archive

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Posted by sounak on Oct 20, 2017; 2:24am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Update-statement-failed-with-Multiple-input-rows-matched-for-same-row-in-version-1-2-0-tp24222p24388.html

Internally, we call a UDF to generate the TupleId and based on that tupleId
we decide if the row is duplicated or not. You can run a slightly tweaked
query

" select a.remark,a.id, getTupleId() as TupleId from c_indextest1 a,
c_indextest2
b where a.id=b.id; "

In order to prompt Multiple Input Rows most probably the getTupleId() will
give duplicate entries.

In between your given query runs fine with the data i have generated in
house. So this is data dependent. So it will be good if we get *origin_data*
or the script which has build the data for origin_data.

On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 <[hidden email]> wrote:

> dev
> one row record
>
>
> yixu2001
>
> From: Liang Chen
> Date: 2017-10-19 22:26
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> Hi
>
> Execute the below query, return one row record or multiple row records ?
> ---------------------------------
> select a.remark from c_indextest1 a where a.id=b.id
>
> Regards
> Liang
>
>
> yixu2001 wrote
> > dev
> > You can follow the steps below to reproduce the problem.
> > tables c_indextest2 has 1700w records and table c_indextest1 has about
> 30w
> > records.
> >
> > step 1:
> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark
> string)
> > STORED BY 'carbondata'").show;
> >
> > step 2: origin_data is a existing table with 1700w records, and fields in
> > the table do not matter.
> > cc.sql("insert into c_indextest2 select row_number() over(partition by
> > a.PKid order by a.pkid) id,a.remark from (SELECT '1' PKID,'dfsdd' remark
> > from origin_data limit 17000000) a").show;
> >
> > step 3:
> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark
> string)
> > STORED BY 'carbondata'").show;
> >
> > step 4:
> > cc.sql("insert into c_indextest1 select * from c_indextest2 where
> > pmod(cast(ID as int),50)=43").show;
> >
> > setp 5:
> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
> > c_indextest1 a where a.id=b.id)").show;
> >
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-19 18:26
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows
> matched
> > for same row" in version 1.2.0,
> > Hi Yixu,
> >
> > Thanks for the ddl. But i am particularly interested on the data in order
> > to reproduce the problem. Will it be feasible to share to data or if it
> is
> > huge then the scripts that generates it? I cannot access the data
> location
> > from public network.
> >
> >
> > On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <
>
> > yixu2001@
>
> > > wrote:
> >
> >> dev
> >>
> >> Step 1:I make a hive table qqdata2.h_indextest1:
> >> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
> >> `id` INT,
> >> `CUST_ORDER_ID` STRING,
> >> `ORDER_ITEM_IDATTR_ID` STRING,
> >> `ATTR_VALUE_IDATTR_VALUE` STRING,
> >> `CREATE_DATE` STRING,
> >> `UPDATE_DATE` STRING,
> >> `STATUS_CD` STRING,
> >> `STATUS_DATE` STRING,
> >> `AREA_ID` STRING,
> >> `REGION_CD` STRING,
> >> `UPDATE_STAFF` STRING,
> >> `CREATE_STAFF` STRING,
> >> `SHARDING_ID` STRING,
> >> `ORDER_ATTR_ID` STRING )
> >> ROW FORMAT DELIMITED
> >> FIELDS TERMINATED BY '\u0001'
> >> LINES TERMINATED BY '\n'
> >> NULL DEFINED AS ''
> >> STORED AS INPUTFORMAT
> >> 'org.apache.hadoop.mapred.TextInputFormat'
> >> OUTPUTFORMAT
> >> 'org.apache.hadoop.hive.ql.io
> >>
> >> .HiveIgnoreKeyTextOutputFormat'
> >> LOCATION
> >> 'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1
> >>
> >> ';
> >>
> >> there are csv files in hdfs dir user/bigdata/streamcql/dist1, the data
> >> format in the csv file just as following
> >>
> >> 1939505130,171483932,287305502,813463930,20160709134396669,
> >> 201607101469099594,1299,20160711996575390,10,73,
> 302063,302064,127859875,
> >> 9999999
> >>
> >> Step 2:I make a carbon table qqdata2.c_indextest1:
> >> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING,
> >> CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID STRING,ATTR_VALUE_IDATTR_VALUE
> >> STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD
> STRING,STATUS_DATE
> >> STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF
> >> STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata'
> ")
> >>
> >> Step 3:Insert data:
> >> cc.sql("insert into qqdata2.c_indextest1 select * from
> >> qqdata2.h_indextest1").show(100,false);
> >>
> >> Step 4: Repeat from step1 to step 3, I make another carbon table
> >> qqdata2.c_indextest2
> >> The record number of qqdata2.c_indextest1 is 30w, the record number of
> >> qqdata2.c_indextest2 is 1700w.
> >>
> >>
> >>
> >> yixu2001
> >>
> >> From: sounak
> >> Date: 2017-10-19 14:13
> >> To: dev
> >> Subject: Re: Update statement failed with "Multiple input rows matched
> >> for
> >> same row" in version 1.2.0,
> >> Hi Yixu,
> >>
> >> Can you please share the DDLs and the Data for the above problem with
> us?
> >>
> >> Thanks
> >> Sounak
> >>
> >> On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <
>
> > yixu2001@
>
> > > wrote:
> >>
> >> > dev
> >> >
> >> >
> >> > In carbondata version 1.2.0, I execute "update" statement with
> >> sub-query,
> >> > it failed.
> >> > All the rows in the 2 tables are not duplicated, and the same
> statement
> >> > will succeed in carbondata version 1.1.1.
> >> >
> >> > The test log as following:
> >> > scala> cc.sql("select count(*), count(distinct id) from
> >> > qqdata2.c_indextest1").show(100,false);
> >> > +--------+------------------+
> >> > |count(1)|count(DISTINCT id)|
> >> > +--------+------------------+
> >> > |300000 |300000 |
> >> > +--------+------------------+
> >> >
> >> > scala> cc.sql("select count(*), count(distinct id) from
> >> > qqdata2.c_indextest2").show(100,false);
> >> > +--------+------------------+
> >> > |count(1)|count(DISTINCT id)|
> >> > +--------+------------------+
> >> > |71223220|71223220 |
> >> > +--------+------------------+
> >> >
> >> > scala> cc.sql("update qqdata2.c_indextest2 a
> >> set(a.CUST_ORDER_ID,a.ORDER_
> >> > ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.
> >> > UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.
> >> > REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.
> ORDER_ATTR_ID)
> >> =
> >> > (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_
> >> > VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_
> >> > DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.
> >> SHARDING_ID,b.ORDER_ATTR_ID
> >> > from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
> >> > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of a
> >> > plan since it was too large. This behavior can be adjusted by setting
> >> > 'spark.debug.maxToStringFields' in SparkEnv.conf.
> >> > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn
> >> >
> >> > ][bigdata][Thread-1]Delete data operation is failed for
> >> > qqdata2.c_indextest2
> >> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation
> is
> >> > failed due to failure in creating delete delta file for segment : null
> >> > block : null
> >> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in
> >> update
> >> > operationjava.lang.Exception: Multiple input rows matched for same
> row.
> >> > java.lang.RuntimeException: Update operation failed. Multiple input
> >> rows
> >> > matched for same row.
> >> > at scala.sys.package$.error(package.scala:27)
> >> > at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.
> >> > processData(IUDCommands.scala:239)
> >> > at
> >> org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(
> >> > IUDCommands.scala:141)
> >> > at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> > sideEffectResult$lzycompute(commands.scala:58)
> >> > at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> > sideEffectResult(commands.scala:56)
> >> > at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> > executeTake(commands.scala:71)
> >> > at org.apache.spark.sql.execution.CollectLimitExec.
> >> > executeCollect(limit.scala:38)
> >> > at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$
> >> > Dataset$$execute$1$1.apply(Dataset.scala:2378)
> >> > at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(
> >> > SQLExecution.scala:57)
> >> > at
> >> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
> >> > at org.apache.spark.sql.Dataset.org
> >> >
> >> > $apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
> >> > at org.apache.spark.sql.Dataset.org
> >> >
> >> > $apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
> >> > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
> >> > scala:2120)
> >> > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
> >> > scala:2119)
> >> > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.
> scala:2810)
> >> > at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
> >> > at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
> >> > at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
> >> > at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
> >> > ... 50 elided
> >> >
> >> >
> >> >
> >> > yixu2001
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks
> >> Sounak
> >>
> >
> >
> >
> > --
> > Thanks
> > Sounak
>
>
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>

--
Thanks
Sounak