Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,


yixu2001
dev


In CarbonData version 1.2.0, I executed an "update" statement with a sub-query and it failed.
Neither of the two tables contains duplicate rows, and the same statement succeeds in CarbonData version 1.1.1.

The test log is as follows:
scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest1").show(100,false);
+--------+------------------+
|count(1)|count(DISTINCT id)|
+--------+------------------+
|300000  |300000            |
+--------+------------------+

scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest2").show(100,false);
+--------+------------------+
|count(1)|count(DISTINCT id)|
+--------+------------------+
|71223220|71223220          |
+--------+------------------+

scala> cc.sql("update qqdata2.c_indextest2 a set(a.CUST_ORDER_ID,a.ORDER_ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDER_ATTR_ID) = (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.SHARDING_ID,b.ORDER_ATTR_ID from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
17/10/18 11:32:46 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]Delete data operation is failed for qqdata2.c_indextest2
17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null
17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update operationjava.lang.Exception: Multiple input rows matched for same row.
java.lang.RuntimeException: Update operation failed. Multiple input rows matched for same row.
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
  at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
  ... 50 elided
 


yixu2001

Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
Hi Yixu,

Can you please share the DDLs and the data for the above problem with us?

Thanks
Sounak


Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev

Step 1: I create a Hive table qqdata2.h_indextest1:
CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
`id` INT,
`CUST_ORDER_ID` STRING,
`ORDER_ITEM_IDATTR_ID` STRING,
`ATTR_VALUE_IDATTR_VALUE` STRING,
`CREATE_DATE` STRING,
`UPDATE_DATE` STRING,
`STATUS_CD` STRING,
`STATUS_DATE` STRING,
`AREA_ID` STRING,
`REGION_CD` STRING,
`UPDATE_STAFF` STRING,
`CREATE_STAFF` STRING,
`SHARDING_ID` STRING,
`ORDER_ATTR_ID` STRING )
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0001'
  LINES TERMINATED BY '\n'
  NULL DEFINED AS ''
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';

There are CSV files in the HDFS directory user/bigdata/streamcql/dist1; the data in the CSV files looks like the following row:

1939505130,171483932,287305502,813463930,20160709134396669,201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,9999999

Step 2: I create a CarbonData table qqdata2.c_indextest1:
cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING, CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID STRING,ATTR_VALUE_IDATTR_VALUE STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD STRING,STATUS_DATE STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata' ")

Step 3: Insert the data:
cc.sql("insert into qqdata2.c_indextest1 select * from qqdata2.h_indextest1").show(100,false);

Step 4: Repeating steps 1 to 3, I create another CarbonData table, qqdata2.c_indextest2.
Table qqdata2.c_indextest1 has about 300,000 records and qqdata2.c_indextest2 has about 17,000,000 records.
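
To double-check the loads, the row counts can be compared with the Hive source tables (a minimal sanity check in the same cc.sql style; these two queries are an illustration, not part of the original test log):

// sanity check: the CarbonData table should have the same row count as its Hive source
cc.sql("select count(*) from qqdata2.h_indextest1").show(100,false);
cc.sql("select count(*) from qqdata2.c_indextest1").show(100,false);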
 


yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
Hi Yixu,

Thanks for the DDL, but I am particularly interested in the data in order
to reproduce the problem. Would it be feasible to share the data or, if it
is huge, the scripts that generate it? I cannot access the data location
from the public network.





--
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
You can follow the steps below to reproduce the problem.
Table c_indextest2 has about 17,000,000 records and table c_indextest1 has about 300,000 records.

Step 1:
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark string)  STORED BY 'carbondata'").show;

Step 2: origin_data is an existing table with about 17,000,000 records; its fields do not matter.
cc.sql("insert into c_indextest2 select row_number() over(partition by a.PKid order by a.pkid) id,a.remark from  (SELECT '1' PKID,'dfsdd' remark from origin_data limit 17000000) a").show;

Step 3:
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark string)  STORED BY 'carbondata'").show;

Step 4:
cc.sql("insert into c_indextest1 select * from  c_indextest2 where  pmod(cast(ID as int),50)=43").show;

Step 5:
cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from  c_indextest1 a where a.id=b.id)").show;


yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Liang Chen
Hi

When you execute the query below, does it return one row or multiple rows?
---------------------------------
select a.remark from c_indextest1 a where a.id=b.id
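
A direct way to check is to look for any id that occurs more than once in c_indextest1, since such an id would make the correlated sub-query return multiple rows (a minimal sketch using standard Spark SQL):

// lists every id in c_indextest1 that the sub-query would match more than once
cc.sql("select id, count(*) as cnt from c_indextest1 group by id having count(*) > 1").show(100,false);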

Regards
Liang



Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
It returns one row.


yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
Internally, we call a UDF to generate the tupleId, and based on that tupleId we decide whether a row is duplicated or not. You can run a slightly tweaked query:

select a.remark, a.id, getTupleId() as TupleId
from c_indextest1 a, c_indextest2 b
where a.id = b.id;

For the "Multiple input rows matched" error to be raised, getTupleId() is most probably producing duplicate entries.

Meanwhile, your query runs fine with the data I have generated in house, so this is data dependent. It would be good if we could get *origin_data* or the script that built the data for origin_data.
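
To surface the duplicates directly, the tweaked query can be wrapped in a group-by on the tuple id (a minimal sketch, assuming getTupleId() behaves the same inside a sub-select):

// any row returned here is a tuple id that matched multiple input rows
cc.sql("""select tupleId, count(*) as cnt
          from (select a.remark, a.id, getTupleId() as tupleId
                from c_indextest1 a, c_indextest2 b
                where a.id = b.id) t
          group by tupleId having count(*) > 1""").show(100,false);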




--
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
A slight correction to a typo in the query syntax:

select a.remark, a.id, getTupleId() as *tupleId*
from c_indextest1 a, c_indextest2 b
where a.id = b.id;




--
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev


The a.id values map to the b.id values one to one; the remark column is the only one with duplicate values, yet the update still reports "Multiple input rows".
 scala> cc.sql("select count(a.remark),count(distinct a.remark),count(a.id),count(distinct a.id),count(b.id),count(distinct b.id)  from c_indextest1 a,c_indextest2 b  where a.id=b.id").show;
+-------------+----------------------+---------+------------------+---------+------------------+
|count(remark)|  count(DISTINCT remark)|  count(a.id)  |count(DISTINCT a.id)|count(b.id)          |   count(DISTINCT b.id)|
+-------------+----------------------+---------+------------------+---------+------------------+
|       340000|                     1                         |      340000  |         340000          |       340000        |  340000                    |
+-------------+----------------------+---------+------------------+---------+------------------+

scala> cc.sql("select count(b.id),count(distinct b.id),count(b.remark),count(distinct b.remark) from c_indextest2 b").show;
+---------+------------------+-------------+----------------------+            
|count(id)|count(DISTINCT id)|count(remark)|count(DISTINCT remark)|
+---------+------------------+-------------+----------------------+
| 17000000|          17000000|     17000000|                     1|
+---------+------------------+-------------+----------------------+


scala> cc.sql("select count(b.id),count(distinct b.id),count(b.remark),count(distinct b.remark) from c_indextest1 b").show;
+---------+------------------+-------------+----------------------+            
|count(id)|count(DISTINCT id)|count(remark)|count(DISTINCT remark)|
+---------+------------------+-------------+----------------------+
|   340000|            340000|       340000|                     1|
+---------+------------------+-------------+----------------------+

The a.id values match the b.id values one to one, and a.remark is used to update b.remark:

cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from c_indextest1 a where a.id=b.id)").show;
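
As a cross-check (a sketch, assuming the same cc session; the join mirrors the correlated subquery in the update, so an empty result means each b.id matches at most one row of c_indextest1):

cc.sql("select b.id, count(*) from c_indextest2 b join c_indextest1 a on a.id = b.id group by b.id having count(*) > 1").show;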



yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
In reply to this post by Liang Chen
dev
Sorry, I made a mistake earlier.

Executing the query below returns multiple rows:

select a.remark from c_indextest1 a, c_indextest2 b where a.id = b.id


yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
In reply to this post by sounak
dev
 
scala> cc.sql(" select  c.tupleId,count(*)  from (select a.remark,a.id, getTupleId() as tupleId from c_indextest1 a,c_indextest2 b where a.id=b.id) c group by c.tupleId having count(*)>1").show;
+-------+--------+                                                              
|tupleId|count(1)|
+-------+--------+
+-------+--------+

The result is empty: no duplicate tupleIds, hence no multiple input rows.


yixu2001
 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
As the tupleIds are also unique, the only option left is to get hold of
your data and reproduce the problem in-house. Is it possible for us to
generate your data locally, or could you share it?
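
For reference, a self-contained variant of the reproduction steps posted earlier in this thread, with a generated sequence standing in for origin_data (a sketch; it assumes cc is the Spark 2.x session seen in the stack traces, and relies on the earlier note that the fields of origin_data do not matter; the "seq" view name is illustrative):

cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string, remark string) STORED BY 'carbondata'").show;
cc.range(1, 17000001).createOrReplaceTempView("seq"); // 17000000 synthetic ids
cc.sql("insert into c_indextest2 select cast(id as string), 'dfsdd' from seq").show;
cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string, remark string) STORED BY 'carbondata'").show;
cc.sql("insert into c_indextest1 select * from c_indextest2 where pmod(cast(id as int), 50) = 43").show;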




--
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
Could I share the data via Baidu SkyDrive?


yixu2001
 
> >> >> > qqdata2.c_indextest2
> >> >> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data
> operation
> >> is
> >> >> > failed due to failure in creating delete delta file for segment :
> >> null
> >> >> > block : null
> >> >> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in
> >> >> update
> >> >> > operationjava.lang.Exception: Multiple input rows matched for same
> >> row.
> >> >> > java.lang.RuntimeException: Update operation failed. Multiple input
> >> >> rows
> >> >> > matched for same row.
> >> >> >   at scala.sys.package$.error(package.scala:27)
> >> >> >   at org.apache.spark.sql.execution.command.
> ProjectForUpdateCommand.
> >> >> > processData(IUDCommands.scala:239)
> >> >> >   at
> >> >> org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(
> >> >> > IUDCommands.scala:141)
> >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> >> > sideEffectResult$lzycompute(commands.scala:58)
> >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> >> > sideEffectResult(commands.scala:56)
> >> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
> >> >> > executeTake(commands.scala:71)
> >> >> >   at org.apache.spark.sql.execution.CollectLimitExec.
> >> >> > executeCollect(limit.scala:38)
> >> >> >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$
> >> >> > Dataset$$execute$1$1.apply(Dataset.scala:2378)
> >> >> >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutio
> >> nId(
> >> >> > SQLExecution.scala:57)
> >> >> >   at
> >> >> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
> >> >> >   at org.apache.spark.sql.Dataset.org
> >> >> >
> >> >> > $apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
> >> >> >   at org.apache.spark.sql.Dataset.org
> >> >> >
> >> >> > $apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
> >> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
> >> >> > scala:2120)
> >> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
> >> >> > scala:2119)
> >> >> >   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala
> >> :2810)
> >> >> >   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
> >> >> >   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
> >> >> >   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
> >> >> >   at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
> >> >> >   ... 50 elided
> >> >> >
> >> >> >
> >> >> >
> >> >> > yixu2001
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Thanks
> >> >> Sounak
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks
> >> > Sounak
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5
> >> .nabble.com/
> >>
> >
> >
> >
> > --
> > Thanks
> > Sounak
> >
>
>
>
> --
> Thanks
> Sounak
>
 
 
 
--
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sraghunandan
Should be fine. Do we need to create any login to access it?
On Fri, 20 Oct 2017 at 1:31 PM, yixu2001 <[hidden email]> wrote:

> dev
>        Can I share it via Baidu SkyDrive?
>
>
> yixu2001

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev

testdata (CSV):
Link: http://pan.baidu.com/s/1boVAqeF  Password: qtdh
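
For anyone trying to reproduce from this CSV, a minimal load sketch, assuming the file is first copied into HDFS (the path below is illustrative, not the real share location, and the FILEHEADER option matches the two-column id/remark repro tables from earlier in the thread):

cc.sql("LOAD DATA INPATH 'hdfs:///tmp/indextest/testdata.csv' INTO TABLE c_indextest2 OPTIONS('DELIMITER'=',','FILEHEADER'='id,remark')").show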


yixu2001
 
From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
Should be fine. Do we need to create any login to access it?

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
 
Thank you.
We hope this will be resolved in a new release, CarbonData 1.2.1.
In addition, the mailing-list issue "Is query slowing down due to the fragmentations produced by the many update operations?" is also a very important problem in our selection process. We would be grateful if it could also be resolved in CarbonData 1.2.1.
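
On the fragmentation point: every update writes delete/update delta files, and a large number of small deltas can slow queries down. Until a proper fix, one possible stopgap is CarbonData's compaction and clean-up DDL; a sketch, assuming the same cc session as above and reusing the repro table name (substitute your own table):

cc.sql("ALTER TABLE c_indextest2 COMPACT 'MINOR'")   // merge small segments; 'MAJOR' compacts more aggressively
cc.sql("CLEAN FILES FOR TABLE c_indextest2")         // remove stale files left behind by updates and compaction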


yixu2001
 
From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
Hello. There is still no solution; can the next version, CarbonData 1.2.1, be released?


yixu2001
 
From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
Reply | Threaded
Open this post in threaded view
|

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

yixu2001
dev
When is a patch expected to be released? We would like to test it.


yixu2001
 
From: yixu2001
Date: 2017-11-07 15:47
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
dev
 Hello, there is still no solution. Can the next version, carbondata 1.2.1, be released?


yixu2001
 
From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
dev
 
Thank you.
 We hope it will be resolved in a new release, carbondata 1.2.1.
In addition, the mailing-list question "Is query slowing down due to the fragmentations produced by the many update operations?" is also very important in our selection process. We would be grateful if it could also be resolved in carbondata 1.2.1.


yixu2001
 
From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
dev
 
Test data (CSV):
Link: http://pan.baidu.com/s/1boVAqeF  Password: qtdh
 
 
yixu2001
From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
Should be fine. Do we need to create any login to access it?
On Fri, 20 Oct 2017 at 1:31 PM, yixu2001 <[hidden email]> wrote:

> dev
>        Can I share it via Baidu SkyDrive?
>
>
> yixu2001
>
> From: sounak
> Date: 2017-10-20 16:14
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> As the tupleIds are also unique, the only option left is to get hold of
> your data and reproduce the problem in-house. Is it possible for us to
> generate your data in-house, or to get it from you?
>
> On Fri, Oct 20, 2017 at 12:14 PM, yixu2001 <[hidden email]> wrote:
>
> > dev
> >
> > scala> cc.sql(" select c.tupleId, count(*) from (select a.remark, a.id,
> > getTupleId() as tupleId from c_indextest1 a, c_indextest2 b where a.id=b.id)
> > c group by c.tupleId having count(*)>1").show;
> > +-------+--------+
> > |tupleId|count(1)|
> > +-------+--------+
> > +-------+--------+
> >
> > No multiple input rows were found.
> >
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-20 10:57
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows
> matched
> > for same row" in version 1.2.0,
> > A slight correction to a typo in the query syntax:
> >
> > " select a.remark, a.id, getTupleId() as *tupleId* from c_indextest1 a,
> > c_indextest2 b where a.id=b.id; "
> >
> > On Fri, Oct 20, 2017 at 7:54 AM, sounak <[hidden email]> wrote:
> >
> > > Internally, we call a UDF to generate the tupleId, and based on that
> > > tupleId we decide whether a row is matched more than once. You can run a
> > > slightly tweaked query:
> > >
> > > " select a.remark, a.id, getTupleId() as tupleId from c_indextest1 a,
> > > c_indextest2 b where a.id=b.id; "
> > >
> > > For "Multiple input rows" to be reported, getTupleId() would most
> > > probably have to return duplicate entries.
> > >
> > > Meanwhile, your query runs fine with the data I have generated in-house,
> > > so the problem is data-dependent. It would be good if we could get
> > > *origin_data* or the script that built it.
> > >
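
A consolidated, runnable form of this diagnostic might look like the sketch below. It assumes the same CarbonSession cc used elsewhere in this thread and that the internal getTupleId() UDF is available as described above; grouping on the generated tupleId exposes any target row that more than one input row maps to:

// List tupleIds that occur more than once (an empty result means no duplicates).
cc.sql("select c.tupleId, count(*) as cnt from " +
  "(select a.remark, a.id, getTupleId() as tupleId " +
  "from c_indextest1 a, c_indextest2 b where a.id = b.id) c " +
  "group by c.tupleId having count(*) > 1").show(100, false)
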
> > > On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 <[hidden email]> wrote:
> > >
> > >> dev
> > >>    It returns one row.
> > >>
> > >>
> > >> yixu2001
> > >>
> > >> From: Liang Chen
> > >> Date: 2017-10-19 22:26
> > >> To: dev
> > >> Subject: Re: Re: Update statement failed with "Multiple input rows
> > >> matched for same row" in version 1.2.0,
> > >> Hi
> > >>
> > >> Does executing the below query return one row or multiple rows?
> > >> ---------------------------------
> > >> select a.remark from  c_indextest1 a where a.id=b.id
> > >>
> > >> Regards
> > >> Liang
> > >>
> > >>
> > >> yixu2001 wrote
> > >> > dev
> > >> >  You can follow the steps below to reproduce the problem.
> > >> > Table c_indextest2 has about 17,000,000 (1700w) records and table
> > >> > c_indextest1 has about 300,000 (30w) records.
> > >> >
> > >> > step 1:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string, remark string) STORED BY 'carbondata'").show;
> > >> >
> > >> > step 2: origin_data is an existing table with 17,000,000 records; the fields in the table do not matter.
> > >> > cc.sql("insert into c_indextest2 select row_number() over(partition by a.PKid order by a.pkid) id, a.remark from (SELECT '1' PKID, 'dfsdd' remark from origin_data limit 17000000) a").show;
> > >> >
> > >> > step 3:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string, remark string) STORED BY 'carbondata'").show;
> > >> >
> > >> > step 4:
> > >> > cc.sql("insert into c_indextest1 select * from c_indextest2 where pmod(cast(ID as int),50)=43").show;
> > >> >
> > >> > step 5:
> > >> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from c_indextest1 a where a.id=b.id)").show;
> > >> >
> > >> >
> > >> > yixu2001
> > >> >
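
A side note on step 5: before running the update, the join in the subquery can be checked for one-to-one matching directly. The sketch below assumes the two tables from the steps above and the same CarbonSession cc used elsewhere in this thread:

// Count matching source rows per target id; any returned row means the
// update would hit "Multiple input rows matched for same row".
cc.sql("select b.id, count(*) as cnt from c_indextest2 b " +
  "join c_indextest1 a on a.id = b.id " +
  "group by b.id having count(*) > 1").show(100, false)
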
> > >> > From: sounak
> > >> > Date: 2017-10-19 18:26
> > >> > To: dev
> > >> > Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
> > >> > Hi Yixu,
> > >> >
> > >> > Thanks for the DDL. But I am particularly interested in the data, in
> > >> > order to reproduce the problem. Would it be feasible to share the data
> > >> > or, if it is huge, the scripts that generate it? I cannot access the
> > >> > data location from the public network.
> > >> >
> > >> >
> > >> > On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <yixu2001@> wrote:
> > >> >
> > >> >> dev
> > >> >>
> > >> >> Step 1: I make a hive table qqdata2.h_indextest1:
> > >> >> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
> > >> >> `id` INT,
> > >> >> `CUST_ORDER_ID` STRING,
> > >> >> `ORDER_ITEM_IDATTR_ID` STRING,
> > >> >> `ATTR_VALUE_IDATTR_VALUE` STRING,
> > >> >> `CREATE_DATE` STRING,
> > >> >> `UPDATE_DATE` STRING,
> > >> >> `STATUS_CD` STRING,
> > >> >> `STATUS_DATE` STRING,
> > >> >> `AREA_ID` STRING,
> > >> >> `REGION_CD` STRING,
> > >> >> `UPDATE_STAFF` STRING,
> > >> >> `CREATE_STAFF` STRING,
> > >> >> `SHARDING_ID` STRING,
> > >> >> `ORDER_ATTR_ID` STRING )
> > >> >> ROW FORMAT DELIMITED
> > >> >>   FIELDS TERMINATED BY '\u0001'
> > >> >>   LINES TERMINATED BY '\n'
> > >> >>   NULL DEFINED AS ''
> > >> >> STORED AS INPUTFORMAT
> > >> >>   'org.apache.hadoop.mapred.TextInputFormat'
> > >> >> OUTPUTFORMAT
> > >> >>   'org.apache.hadoop.mapred.TextInputFormat'
> > >> >>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> > >> >> LOCATION
> > >> >>   'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';
> > >> >>
> > >> >> There are CSV files in the HDFS directory user/bigdata/streamcql/dist1; the data format in the CSV files is as follows:
> > >> >>
> > >> >> 1939505130,171483932,287305502,813463930,20160709134396669,201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,9999999
> > >> >>
> > >> >> Step 2: I make a carbon table qqdata2.c_indextest1:
> > >> >> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING, CUST_ORDER_ID STRING, ORDER_ITEM_IDATTR_ID STRING, ATTR_VALUE_IDATTR_VALUE STRING, CREATE_DATE STRING, UPDATE_DATE STRING, STATUS_CD STRING, STATUS_DATE STRING, AREA_ID STRING, REGION_CD STRING, UPDATE_STAFF STRING, CREATE_STAFF STRING, SHARDING_ID STRING, ORDER_ATTR_ID STRING) STORED BY 'carbondata'")
> > >> >>
> > >> >> Step 3: Insert data:
> > >> >> cc.sql("insert into qqdata2.c_indextest1 select * from qqdata2.h_indextest1").show(100,false);
> > >> >>
> > >> >> Step 4: Repeat steps 1 to 3 to make another carbon table, qqdata2.c_indextest2.
> > >> >> qqdata2.c_indextest1 has about 300,000 (30w) records and qqdata2.c_indextest2 has about 17,000,000 (1700w) records.
> > >> >>
> > >> >>
> > >> >>
> > >> >> yixu2001
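
After step 3 above, a quick sanity check on id uniqueness in the loaded carbon table can also be run; the sketch below uses the table names from these steps and assumes the same CarbonSession cc as elsewhere in this thread:

// Any returned row indicates a duplicated id in the loaded table.
cc.sql("select id, count(*) as cnt from qqdata2.c_indextest1 " +
  "group by id having count(*) > 1").show(100, false)
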
> > >> >>
> > >> >> From: sounak
> > >> >> Date: 2017-10-19 14:13
> > >> >> To: dev
> > >> >> Subject: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
> > >> >> Hi Yixu,
> > >> >>
> > >> >> Can you please share the DDLs and the data for the above problem with us?
> > >> >>
> > >> >> Thanks
> > >> >> Sounak
Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak
Hi,

This fix is still under development. Hopefully we will be able to give a
proper solution by the end of this week.






--
Thanks
Sounak