[GitHub] carbondata pull request #1660: [CARBONDATA-1731] [BugFix] Update fails incor...


[GitHub] carbondata pull request #1660: [CARBONDATA-1731] [BugFix] Update fails incor...

qiuchenjian-2
GitHub user anubhav100 opened a pull request:

    https://github.com/apache/carbondata/pull/1660

    [CARBONDATA-1731] [BugFix] Update fails incorrectly with error for table created in external db

   

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anubhav100/incubator-carbondata BugFix/CARBONDATA-1731

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1660
   
----
commit 50be5a3d061c4319f524956e1694e9a241a25286
Author: anubhav100 <[hidden email]>
Date:   2017-12-14T11:31:32Z

    Update fails incorrectly with error for table created in external db location

----


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
Github user mohammadshahidkhan commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 Thanks for working on this issue.
    Please mention the root cause.
    Does it not fail with a non-external database?
    Please add a test case for the same scenario.
   



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @mohammadshahidkhan
    1. It is failing for both external and non-external databases. I used the description as provided in JIRA; I am updating it now.
    2. I think it is not possible to add a test case because it fails for very large data; it fails for the TPC-H orders table with 1 GB of data.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    LGTM +1


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/739/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1968/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2286/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2288/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @mohammadshahidkhan If it looks good, please merge it.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731] [BugFix] Update fails incorrectly ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 Please check whether this PR fixes both JIRA 1731 and 1728, and please extend the description to explain why the issue happens for large data.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @chenliang613 I have updated the description; please check it. Yes, the PR resolves both JIRA 1731 and 1728.


---

[GitHub] carbondata pull request #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1660#discussion_r157148988
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaBlockDetails.java ---
    @@ -82,9 +80,21 @@ public boolean addBlockletDetails(DeleteDeltaBlockletDetails blocklet) {
     
       public boolean addBlocklet(String blockletId, String offset, Integer pageId) throws Exception {
         DeleteDeltaBlockletDetails blocklet = new DeleteDeltaBlockletDetails(blockletId, pageId);
    +    int index = blockletDetails.indexOf(blocklet);
    +
         try {
    -      blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
    -      return addBlockletDetails(blocklet);
    +      boolean isRowAddedForDeletion =
    +          blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
    +      if (isRowAddedForDeletion) {
    +        if (blockletDetails.isEmpty() || index == -1) {
    +          return blockletDetails.add(blocklet);
    +        } else {
    +          blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows());
    +          return true;
    --- End diff --
   
    Why is "return true" needed here? Shouldn't blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows()) already return true?
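
    To make the question concrete, here is a minimal sketch. It assumes, purely for illustration, that addDeletedRows delegates to java.util.Set.addAll; the diff does not show its body, so this may not match the real method. Under that assumption the two forms are not interchangeable, because addAll returns false when no new element was added. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ReturnValueSketch {

  // Hypothetical variant: report whether the merge changed the existing set.
  static boolean mergeReportingChange(Set<Integer> existing, Set<Integer> incoming) {
    // Set.addAll returns true only if at least one element was newly added.
    return existing.addAll(incoming);
  }

  // Hypothetical variant mirroring the diff: merge, then always report success.
  static boolean mergeAlwaysTrue(Set<Integer> existing, Set<Integer> incoming) {
    existing.addAll(incoming);
    return true;
  }

  public static void main(String[] args) {
    Set<Integer> existing = new TreeSet<>(Arrays.asList(1, 2));
    Set<Integer> incoming = new TreeSet<>(Arrays.asList(3));

    System.out.println(mergeReportingChange(existing, incoming)); // true: 3 is new
    System.out.println(mergeReportingChange(existing, incoming)); // false: 3 already present
    System.out.println(mergeAlwaysTrue(existing, incoming));      // true regardless
  }
}
```

    If the real method behaves this way, returning its result would surface a duplicate offset as a failure, while the explicit "return true" would hide it.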


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sounakr commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 The problem behind JIRA 1728 and JIRA 1731 is different from what the proposed code addresses. The real intention of addBlockletDetails is to add deleted rows to the delete delta file and also to report an error when the same row is deleted twice. If addBlockletDetails finds a duplicate record and returns false, we get an error like "Multiple input rows matched for same row.".
    If we get this error, the problem lies much earlier; for example, the splits might have chosen duplicate blocks.
    Therefore we need to debug why we are getting duplicate rows at the initial stage.
    So this fix is not required, as it is not going to solve the JIRA 1728 and 1731 problems.
    The problems specified in those JIRAs have to be debugged further to find the exact root cause.
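
    The duplicate check described in this comment can be sketched roughly as follows. This is not the CarbonData implementation: BlockRows, BlockletRows and addRow are hypothetical stand-ins, and the only behavior taken from the thread is that adding the same row offset twice yields false, which the caller then reports as "Multiple input rows matched for same row.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateRowCheckSketch {

  /** Hypothetical per-blocklet record of deleted row offsets. */
  static final class BlockletRows {
    final String blockletId;
    final int pageId;
    final Set<Integer> deletedRows = new TreeSet<>();

    BlockletRows(String blockletId, int pageId) {
      this.blockletId = blockletId;
      this.pageId = pageId;
    }

    /** Returns false when the offset was already recorded for this blocklet. */
    boolean addDeletedRow(int offset) {
      return deletedRows.add(offset);
    }
  }

  /** Hypothetical per-block container that rejects a row deleted twice. */
  static final class BlockRows {
    private final List<BlockletRows> blocklets = new ArrayList<>();

    boolean addRow(String blockletId, int pageId, int offset) {
      for (BlockletRows existing : blocklets) {
        if (existing.blockletId.equals(blockletId) && existing.pageId == pageId) {
          // Same blocklet and page seen before: a repeated offset is a duplicate.
          return existing.addDeletedRow(offset);
        }
      }
      BlockletRows created = new BlockletRows(blockletId, pageId);
      blocklets.add(created);
      return created.addDeletedRow(offset);
    }
  }

  public static void main(String[] args) {
    BlockRows block = new BlockRows();
    System.out.println(block.addRow("0", 0, 42)); // true: first deletion of row 42
    System.out.println(block.addRow("0", 0, 43)); // true: different row
    System.out.println(block.addRow("0", 0, 42)); // false: row 42 matched twice
  }
}
```

    In this sketch a false result signals that the input already contained the row twice, which is why the comment suggests looking upstream (for example at how splits select blocks) rather than suppressing the result.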


---

[GitHub] carbondata pull request #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1660#discussion_r157152251
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaBlockDetails.java ---
    @@ -82,9 +80,21 @@ public boolean addBlockletDetails(DeleteDeltaBlockletDetails blocklet) {
     
       public boolean addBlocklet(String blockletId, String offset, Integer pageId) throws Exception {
         DeleteDeltaBlockletDetails blocklet = new DeleteDeltaBlockletDetails(blockletId, pageId);
    +    int index = blockletDetails.indexOf(blocklet);
    +
         try {
    -      blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
    -      return addBlockletDetails(blocklet);
    +      boolean isRowAddedForDeletion =
    +          blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
    +      if (isRowAddedForDeletion) {
    +        if (blockletDetails.isEmpty() || index == -1) {
    +          return blockletDetails.add(blocklet);
    +        } else {
    +          blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows());
    +          return true;
    --- End diff --
   
    @chenliang613 I am looking into the suggestion given by @sounakr, namely that the splits might have chosen duplicate blocks.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @sounakr When I run my PR with this solution, neither JIRA is reproduced, so I think not adding deleted rows again and again is the right approach. I am now looking into the solution you suggested.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sounakr commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 Also, it seems JIRA 1728 and JIRA 1731 might have different causes. 1728 doesn't even throw "Multiple input rows matched for same row.", so in that case addBlockletDetails may not be returning false for duplicate rows. As all rows are deleted, please also check the table update status file, as the segment should be MARKED FOR DELETE.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @sounakr No, that is not correct: for 1728 it is also returning false. Chetan executed the query select count(*); if he had executed select *, it would show the same error. I have already checked this and you can verify it. Please suggest how to proceed.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @sounakr When I try to reproduce CARBONDATA-1728 using this script:
   
    spark.sql("DROP TABLE IF EXISTS ORDERS")
   
    spark.sql("DROP TABLE IF EXISTS H_ORDERS")
   
    spark.sql("create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'")
   
    spark.sql("load data inpath \"hdfs://localhost:54311/orders.csv\" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT')")
   
    spark.sql(" create table h_orders as select * from orders").show()
   
    spark.sql("Delete from orders a where exists (select 1 from h_orders b where b.o_ORDERKEY=a.O_ORDERKEY)")
   
    spark.sql("SELECT O_COMMENT FROM ORDERS").show()
   
    I get these errors:
    17/12/15 15:47:19 ERROR DeleteExecution$: Executor task launch worker-2 Multiple input rows matched for same row.
    17/12/15 15:47:20 AUDIT DeleteExecution$: [anubhav-Vostro-3559][anubhav][Thread-1]Delete data operation is failed for default.orders
    17/12/15 15:47:20 ERROR DeleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sounakr commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 The above errors are from the Delete, not from the Select query. Please debug JIRA 1728 and 1731. They might have similar or different root causes.


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @sounakr Both are failing for the same cause, namely "Multiple input rows matched for same row."


---