dev
Will the patch be released?

yixu2001

From: sounak
Date: 2017-11-07 20:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Hi,

This fix is still under development. Hopefully we will be able to give a proper solution by this weekend.

On Tue, Nov 7, 2017 at 1:24 PM, yixu2001 <[hidden email]> wrote:

dev
When is a patch expected to be released? We can test it.

yixu2001

From: yixu2001
Date: 2017-11-07 15:47
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev
Hello. If there is no solution yet, can the next version, carbondata 1.2.1, be released?

yixu2001

From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev

Thank you.
We hope it will be resolved in a new release, carbondata 1.2.1.
In addition, the mailing-list question "Is query slowing down due to the fragmentation produced by the many update operations?" is also a very important problem in our selection process. We would be grateful if it could also be resolved in carbondata 1.2.1.

yixu2001

From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev

Test data (CSV):
Link: http://pan.baidu.com/s/1boVAqeF
Password: qtdh

yixu2001

From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Should be fine. Do we need to create any login to access it?

On Fri, 20 Oct 2017 at 1:31 PM, yixu2001 <[hidden email]> wrote:

dev
Shall I share it via Baidu SkyDrive?

yixu2001

From: sounak
Date: 2017-10-20 16:14
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

As the tupleIds are also unique, the only option left is to get hold of your data and reproduce the problem in-house. Is it possible to generate your data in-house, or for us to get it?

On Fri, Oct 20, 2017 at 12:14 PM, yixu2001 <[hidden email]> wrote:

dev

scala> cc.sql("select c.tupleId, count(*) from (select a.remark, a.id, getTupleId() as tupleId from c_indextest1 a, c_indextest2 b where a.id = b.id) c group by c.tupleId having count(*) > 1").show
+-------+--------+
|tupleId|count(1)|
+-------+--------+
+-------+--------+

No multiple input rows.

yixu2001

From: sounak
Date: 2017-10-20 10:57
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

A slight rectification of a typo in the query syntax:

select a.remark, a.id, getTupleId() as tupleId from c_indextest1 a, c_indextest2 b where a.id = b.id;
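(For readers following the diagnosis: the duplicate check discussed above can be run end to end as a single snippet. This is a minimal sketch that assumes a CarbonSession named cc, as used throughout this thread, and the c_indextest1/c_indextest2 tables from the reproduction steps further down; an empty result means no tuple ID was matched more than once.)

// Sketch: list any tuple IDs that the update's join would hit more than once.
cc.sql(
  """select c.tupleId, count(*) as cnt
    |from (select a.remark, a.id, getTupleId() as tupleId
    |      from c_indextest1 a, c_indextest2 b
    |      where a.id = b.id) c
    |group by c.tupleId
    |having count(*) > 1""".stripMargin).show(100, false)  // empty => no duplicates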
On Fri, Oct 20, 2017 at 7:54 AM, sounak <[hidden email]> wrote:

Internally, we call a UDF to generate the TupleId, and based on that tupleId we decide whether the row is duplicated. You can run a slightly tweaked query:

select a.remark, a.id, getTupleId() as TupleId from c_indextest1 a, c_indextest2 b where a.id = b.id;

For "Multiple input rows matched" to be triggered, getTupleId() will most probably be returning duplicate entries.

Meanwhile, your given query runs fine with the data I have generated in-house, so this is data dependent. It would be good if we could get origin_data, or the script that built the data for origin_data.

On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 <[hidden email]> wrote:

dev
One row record.

yixu2001

From: Liang Chen
Date: 2017-10-19 22:26
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Hi

Does executing the query below return one row or multiple rows?
---------------------------------
select a.remark from c_indextest1 a where a.id = b.id

Regards
Liang

yixu2001 wrote:

dev
You can follow the steps below to reproduce the problem.
Table c_indextest2 has about 17,000,000 (1700w) records and table c_indextest1 has about 300,000 (30w) records.

Step 1:
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string, remark string) STORED BY 'carbondata'").show

Step 2: origin_data is an existing table with 17,000,000 records; the fields in that table do not matter.
cc.sql("insert into c_indextest2 select row_number() over(partition by a.PKID order by a.PKID) id, a.remark from (SELECT '1' PKID, 'dfsdd' remark from origin_data limit 17000000) a").show

Step 3:
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string, remark string) STORED BY 'carbondata'").show

Step 4:
cc.sql("insert into c_indextest1 select * from c_indextest2 where pmod(cast(ID as int), 50) = 43").show

Step 5:
cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from c_indextest1 a where a.id = b.id)").show

yixu2001

From: sounak
Date: 2017-10-19 18:26
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Hi Yixu,

Thanks for the DDL, but I am particularly interested in the data in order to reproduce the problem. Would it be feasible to share the data or, if it is huge, the scripts that generate it? I cannot access the data location from the public network.
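(The five steps above consolidate into one spark-shell sequence. A minimal sketch, assuming a CarbonSession cc and an existing origin_data table with at least 17,000,000 rows, whose actual columns do not matter:)

// Build a 17,000,000-row target table with unique string ids.
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string, remark string) STORED BY 'carbondata'")
cc.sql("insert into c_indextest2 select row_number() over(partition by a.PKID order by a.PKID) id, a.remark from (select '1' PKID, 'dfsdd' remark from origin_data limit 17000000) a")
// Keep roughly 1/50th of the rows as the update's source table.
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string, remark string) STORED BY 'carbondata'")
cc.sql("insert into c_indextest1 select * from c_indextest2 where pmod(cast(id as int), 50) = 43")
// The statement that fails with "Multiple input rows matched for same row":
cc.sql("update c_indextest2 b set (b.remark) = (select a.remark from c_indextest1 a where a.id = b.id)").show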
On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <yixu2001@...> wrote:

dev

Step 1: I create a hive table qqdata2.h_indextest1:

CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
  `id` INT,
  `CUST_ORDER_ID` STRING,
  `ORDER_ITEM_IDATTR_ID` STRING,
  `ATTR_VALUE_IDATTR_VALUE` STRING,
  `CREATE_DATE` STRING,
  `UPDATE_DATE` STRING,
  `STATUS_CD` STRING,
  `STATUS_DATE` STRING,
  `AREA_ID` STRING,
  `REGION_CD` STRING,
  `UPDATE_STAFF` STRING,
  `CREATE_STAFF` STRING,
  `SHARDING_ID` STRING,
  `ORDER_ATTR_ID` STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0001'
  LINES TERMINATED BY '\n'
  NULL DEFINED AS ''
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';

There are CSV files in the HDFS directory user/bigdata/streamcql/dist1; the data in those files looks like the following:

1939505130,171483932,287305502,813463930,20160709134396669,201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,9999999

Step 2: I create a carbon table qqdata2.c_indextest1:
cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING, CUST_ORDER_ID STRING, ORDER_ITEM_IDATTR_ID STRING, ATTR_VALUE_IDATTR_VALUE STRING, CREATE_DATE STRING, UPDATE_DATE STRING, STATUS_CD STRING, STATUS_DATE STRING, AREA_ID STRING, REGION_CD STRING, UPDATE_STAFF STRING, CREATE_STAFF STRING, SHARDING_ID STRING, ORDER_ATTR_ID STRING) STORED BY 'carbondata'")

Step 3: Insert the data:
cc.sql("insert into qqdata2.c_indextest1 select * from qqdata2.h_indextest1").show(100, false)

Step 4: Repeating steps 1 to 3, I create another carbon table, qqdata2.c_indextest2.
qqdata2.c_indextest1 has about 300,000 (30w) records and qqdata2.c_indextest2 has about 17,000,000 (1700w) records.

yixu2001

From: sounak
Date: 2017-10-19 14:13
To: dev
Subject: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Hi Yixu,

Can you please share the DDLs and the data for the above problem with us?

Thanks
Sounak
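(Since the request above is for the data or a script that generates it, here is one possible way to synthesize stand-in data for qqdata2.h_indextest1 in-house. This is a sketch under stated assumptions: cc is the CarbonSession, only id needs to be unique, every other column is a random placeholder, and the output path matches the LOCATION in the DDL above. Note the DDL declares \u0001 as the field delimiter while the sample row shown is comma-separated; adjust the separator to whatever your files actually use.)

// Sketch: write synthetic delimited text files for h_indextest1.
val cols = Seq("CUST_ORDER_ID", "ORDER_ITEM_IDATTR_ID", "ATTR_VALUE_IDATTR_VALUE",
  "CREATE_DATE", "UPDATE_DATE", "STATUS_CD", "STATUS_DATE", "AREA_ID",
  "REGION_CD", "UPDATE_STAFF", "CREATE_STAFF", "SHARDING_ID", "ORDER_ATTR_ID")
// 300000 rows for the small table; use 17000000 when building the large one.
val base = cc.range(1, 300001).selectExpr("cast(id as string) as id")
val df = cols.foldLeft(base)((d, c) => d.selectExpr("*", s"cast(rand() * 1000000 as int) as $c"))
df.write.mode("overwrite").option("sep", "\u0001")
  .csv("hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1")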
On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <yixu2001@...> wrote:

dev

In carbondata version 1.2.0, I execute an "update" statement with a sub-query and it fails. No rows in the two tables are duplicated, and the same statement succeeds in carbondata version 1.1.1.

The test log is as follows:

scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest1").show(100,false)
+--------+------------------+
|count(1)|count(DISTINCT id)|
+--------+------------------+
|300000  |300000            |
+--------+------------------+

scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest2").show(100,false)
+--------+------------------+
|count(1)|count(DISTINCT id)|
+--------+------------------+
|71223220|71223220          |
+--------+------------------+

scala> cc.sql("update qqdata2.c_indextest2 a set (a.CUST_ORDER_ID, a.ORDER_ITEM_IDATTR_ID, a.ATTR_VALUE_IDATTR_VALUE, a.CREATE_DATE, a.UPDATE_DATE, a.STATUS_CD, a.STATUS_DATE, a.AREA_ID, a.REGION_CD, a.UPDATE_STAFF, a.CREATE_STAFF, a.SHARDING_ID, a.ORDER_ATTR_ID) = (select b.CUST_ORDER_ID, b.ORDER_ITEM_IDATTR_ID, b.ATTR_VALUE_IDATTR_VALUE, b.CREATE_DATE, b.UPDATE_DATE, b.STATUS_CD, b.STATUS_DATE, b.AREA_ID, b.REGION_CD, b.UPDATE_STAFF, b.CREATE_STAFF, b.SHARDING_ID, b.ORDER_ATTR_ID from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false)
17/10/18 11:32:46 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]Delete data operation is failed for qqdata2.c_indextest2
17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null
17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update operation java.lang.Exception: Multiple input rows matched for same row.
java.lang.RuntimeException: Update operation failed. Multiple input rows matched for same row.
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
  at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
  ... 50 elided

yixu2001

--
Thanks
Sounak
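(A closing note for readers who land here with the same error: "Multiple input rows matched for same row" is raised when the update path believes more than one input row matched a single target row. A pre-flight check like the sketch below, using the table names from this thread, shows whether the inputs really are duplicated; an empty result, as the distinct-id counts above indicate, points at the update path rather than the data.)

// Sketch: ids in the target table matched by more than one source row.
cc.sql(
  """select a.id, count(*) as matches
    |from qqdata2.c_indextest2 a
    |join qqdata2.c_indextest1 b on a.id = b.id
    |group by a.id
    |having count(*) > 1""".stripMargin).show(100, false)  // empty => inputs are clean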