GitHub user anubhav100 opened a pull request:
    https://github.com/apache/carbondata/pull/1660

[CARBONDATA-1731] [BugFix] Update fails incorrectly with error for table created in external db

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anubhav100/incubator-carbondata BugFix/CARBONDATA-1731

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1660.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1660

----
commit 50be5a3d061c4319f524956e1694e9a241a25286
Author: anubhav100 <[hidden email]>
Date: 2017-12-14T11:31:32Z

    Update fails incorrectly with error for table created in external db location
----
---
Github user mohammadshahidkhan commented on the issue:
https://github.com/apache/carbondata/pull/1660 @anubhav100 Thanks for working on this issue Please mention the route cause. Is it not failing with non external database? Please add test case for the same scenario. --- |
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@mohammadshahidkhan
1. It is failing for both external and non-external databases; I used the description as provided in the JIRA and am updating it now.
2. I think it is not possible to add a test case, because the failure only shows up with very large data; it fails on the TPC-H orders table with 1 GB of data.
---
Github user mohammadshahidkhan commented on the issue:
    https://github.com/apache/carbondata/pull/1660

LGTM +1
---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/1660

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/739/
---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/1660

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1968/
---
Github user ravipesala commented on the issue:
    https://github.com/apache/carbondata/pull/1660

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2286/
---
Github user ravipesala commented on the issue:
    https://github.com/apache/carbondata/pull/1660

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2288/
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@mohammadshahidkhan If it looks good, please merge it.
---
Github user chenliang613 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@anubhav100 Please check whether this PR fixes both JIRA 1731 and 1728, and please further update the description to explain why the issue happens for large data.
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@chenliang613 I have updated the description, you can check. Yes, this PR resolves both JIRA 1731 and 1728.
---
Github user chenliang613 commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/1660#discussion_r157148988

--- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaBlockDetails.java ---
@@ -82,9 +80,21 @@ public boolean addBlockletDetails(DeleteDeltaBlockletDetails blocklet) {
   public boolean addBlocklet(String blockletId, String offset, Integer pageId) throws Exception {
     DeleteDeltaBlockletDetails blocklet = new DeleteDeltaBlockletDetails(blockletId, pageId);
+    int index = blockletDetails.indexOf(blocklet);
+
     try {
-      blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
-      return addBlockletDetails(blocklet);
+      boolean isRowAddedForDeletion =
+          blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
+      if (isRowAddedForDeletion) {
+        if (blockletDetails.isEmpty() || index == -1) {
+          return blockletDetails.add(blocklet);
+        } else {
+          blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows());
+          return true;
--- End diff --

Why should "return true" be added here again? Shouldn't blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows()) already return true?
---
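For readers following the review question above, the control flow under discussion boils down to the self-contained sketch below. The class names, fields, and the java.util.Set-backed deletedRows are simplifications assumed for illustration, not the actual CarbonData classes. The relevant Java detail is that Set.addAll returns false when nothing new was added, so propagating its return value would report an already-recorded row as a failure, while the patch's explicit "return true" does not:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the CarbonData classes; all names here are
// illustrative only.
class Blocklet {
  final String blockletId;
  final Integer pageId;
  final Set<Integer> deletedRows = new LinkedHashSet<>();

  Blocklet(String blockletId, Integer pageId) {
    this.blockletId = blockletId;
    this.pageId = pageId;
  }

  // Set.add returns false if this exact row offset was already recorded.
  boolean addDeletedRow(int offset) {
    return deletedRows.add(offset);
  }

  // Set.addAll returns false when nothing new was added, i.e. when every
  // offered row offset is a duplicate.
  boolean addDeletedRows(Set<Integer> rows) {
    return deletedRows.addAll(rows);
  }

  // indexOf below relies on equality over (blockletId, pageId) only.
  @Override public boolean equals(Object o) {
    return o instanceof Blocklet
        && blockletId.equals(((Blocklet) o).blockletId)
        && pageId.equals(((Blocklet) o).pageId);
  }

  @Override public int hashCode() {
    return 31 * blockletId.hashCode() + pageId.hashCode();
  }
}

class BlockDetails {
  private final List<Blocklet> blockletDetails = new ArrayList<>();

  // Mirrors the patched addBlocklet: merge into an existing entry if one
  // matches, otherwise append a new one.
  boolean addBlocklet(String blockletId, int offset, Integer pageId) {
    Blocklet blocklet = new Blocklet(blockletId, pageId);
    int index = blockletDetails.indexOf(blocklet);
    if (!blocklet.addDeletedRow(offset)) {
      return false; // unreachable here: the fresh blocklet's set is empty
    }
    if (index == -1) {
      return blockletDetails.add(blocklet);
    }
    // The merge may return false for an already-recorded row; the patch
    // returns an explicit true instead, so a duplicate no longer surfaces
    // as a failure to the caller.
    blockletDetails.get(index).addDeletedRows(blocklet.deletedRows);
    return true;
  }
}

Whether swallowing that duplicate signal is correct is exactly what the following comments debate.
---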
Github user sounakr commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@anubhav100 The issue in JIRA 1728 and JIRA 1731 is different from what the proposed code addresses. The real intention of addBlockletDetails is to add deleted rows to the delete delta file and also to report an error in case the same row is deleted twice. If addBlockletDetails finds a duplicate record and returns false, we get an error like "Multiple input rows matched for same row.". When we get this error, the problem lies somewhere much earlier, e.g. the splits might have chosen duplicate blocks. Therefore we need to debug why we are getting duplicate rows in the first place. So this fix is not required, as it is not going to solve the JIRA-1728 and 1731 problems. The problems specified in those JIRAs have to be debugged further to find the exact root cause.
---
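To make the objection concrete: in the pre-patch flow the duplicate signal propagated out of addBlockletDetails, and the caller mapped false to the failure quoted above. A miniature of that behaviour, under the same illustrative assumptions as the sketch earlier in this thread (the real DeleteExecution code differs):

import java.util.LinkedHashSet;
import java.util.Set;

// Pre-patch behaviour in miniature: a duplicate row offset makes the
// collector return false, which the delete job reports as an error.
// Names and the message text are taken from this thread, not the real code.
public final class PrePatchDuplicateDemo {
  public static void main(String[] args) {
    Set<Integer> deletedRows = new LinkedHashSet<>();
    System.out.println(record(deletedRows, 42)); // true: first deletion
    if (!record(deletedRows, 42)) {              // false: duplicate offset
      System.err.println("Multiple input rows matched for same row.");
    }
  }

  static boolean record(Set<Integer> deletedRows, int offset) {
    return deletedRows.add(offset);
  }
}
---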
Github user anubhav100 commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/1660#discussion_r157152251

--- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaBlockDetails.java ---
(same hunk as quoted above)
--- End diff --

@chenliang613 I am looking into the suggestion given by @sounakr, i.e. that the splits might have chosen duplicate blocks.
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@sounakr When I run my PR with this solution, neither JIRA reproduces, so I think not adding deleted rows again and again is the right approach. I am now also looking into the solution you suggested.
---
Github user sounakr commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@anubhav100 Also, it seems JIRA 1728 and JIRA 1731 might have different causes. 1728 does not even throw "Multiple input rows matched for same row.", so in that case addBlockletDetails may not be returning false for duplicate rows. As all rows are deleted, please also check the table update status file, as the segment should be MARKED FOR DELETE.
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@sounakr No, you are wrong: for 1728 it is also returning false. Chetan executed the query select count(*); if he had executed select *, it would have shown the same error. You can check; I already checked that. Please suggest?
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@sounakr When I try to reproduce CARBONDATA-1728 using this script:

spark.sql("DROP TABLE IF EXISTS ORDERS")
spark.sql("DROP TABLE IF EXISTS H_ORDERS")
spark.sql("create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'")
spark.sql("load data inpath \"hdfs://localhost:54311/orders.csv\" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT')")
spark.sql("create table h_orders as select * from orders").show()
spark.sql("Delete from orders a where exists (select 1 from h_orders b where b.o_ORDERKEY=a.O_ORDERKEY)")
spark.sql("SELECT O_COMMENT FROM ORDERS").show()

here are the errors:

17/12/15 15:47:19 ERROR DeleteExecution$: Executor task launch worker-2 Multiple input rows matched for same row.
17/12/15 15:47:20 AUDIT DeleteExecution$: [anubhav-Vostro-3559][anubhav][Thread-1]Delete data operation is failed for default.orders
17/12/15 15:47:20 ERROR DeleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null
---
Github user sounakr commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@anubhav100 The above errors are from the Delete, not from the Select query. Please debug JIRA 1728 and 1731; they might have similar or different root causes.
---
Github user anubhav100 commented on the issue:
    https://github.com/apache/carbondata/pull/1660

@sounakr Both are failing for the same cause, that is "Multiple input rows matched for same row.".
---