[GitHub] carbondata pull request #2868: [WIP] Improve drop table performance by reduc...


qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2868

    [WIP] Improve drop table performance by reducing the namenode RPC calls during physical deletion of files

    **Problem**
    The current drop table command takes more than 1 minute to delete 3000 files from HDFS.
   
    **Analysis**
    Even though we are using the HDFS file system, we explicitly iterate through the table folders recursively and delete each file individually. Each file deletion and each file listing makes one RPC call to the namenode. Deleting 3000 files therefore takes 3000 RPC calls for the deletions, plus a few more RPC calls to list the files in each folder.
   
    **Solution**
    HDFS provides an API that deletes all folders and files under a given path recursively in a single RPC call. Use that API to improve the performance of the drop table operation.
   
    **Result:** After these code changes, the time the drop table operation takes to delete 3000 files from HDFS has dropped from 1 minute to ~2 sec.
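
A rough way to see where the time goes is to count calls in a toy model. The sketch below is not CarbonData or Hadoop code: `ToyNameNode`, its methods, and the 30 x 100 folder layout are all hypothetical, and each method call stands in for one namenode RPC. On a real cluster the single-call path would correspond to Hadoop's `FileSystem.delete(Path, boolean)` with the recursive flag set to `true`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory stand-in for a namenode that counts RPC calls (hypothetical, not a Hadoop class). */
class ToyNameNode {
    /** A tree entry: a folder has a child list, a plain file has none. */
    static final class Node {
        final String name;
        final List<Node> children; // null for plain files

        Node(String name, boolean isDir) {
            this.name = name;
            this.children = isDir ? new ArrayList<>() : null;
        }

        boolean isDir() {
            return children != null;
        }
    }

    int rpcCalls = 0;

    /** One RPC: list the direct children of a folder (returns a copy). */
    List<Node> list(Node dir) {
        rpcCalls++;
        return new ArrayList<>(dir.children);
    }

    /** One RPC: delete a single file or an already emptied folder. */
    void deleteOne(Node parent, Node child) {
        rpcCalls++;
        parent.children.remove(child);
    }

    /** One RPC: delete a folder together with everything beneath it. */
    void deleteRecursive(Node parent, Node child) {
        rpcCalls++;
        parent.children.remove(child);
    }
}

class DropTableRpcSketch {
    /** Builds root/table/segment_s/part_f and returns the table folder. */
    static ToyNameNode.Node buildTable(ToyNameNode.Node root, int segments, int filesPerSegment) {
        ToyNameNode.Node table = new ToyNameNode.Node("table", true);
        root.children.add(table);
        for (int s = 0; s < segments; s++) {
            ToyNameNode.Node segment = new ToyNameNode.Node("segment_" + s, true);
            table.children.add(segment);
            for (int f = 0; f < filesPerSegment; f++) {
                segment.children.add(new ToyNameNode.Node("part_" + f, false));
            }
        }
        return table;
    }

    /** Client-side walk: list every folder, then delete every entry one by one. */
    static void deleteByWalking(ToyNameNode nn, ToyNameNode.Node parent, ToyNameNode.Node target) {
        if (target.isDir()) {
            for (ToyNameNode.Node child : nn.list(target)) {
                deleteByWalking(nn, target, child);
            }
        }
        nn.deleteOne(parent, target);
    }

    /** RPC count when the client walks the tree itself. */
    static int rpcCallsForWalk(int segments, int filesPerSegment) {
        ToyNameNode nn = new ToyNameNode();
        ToyNameNode.Node root = new ToyNameNode.Node("root", true);
        ToyNameNode.Node table = buildTable(root, segments, filesPerSegment);
        deleteByWalking(nn, root, table);
        return nn.rpcCalls;
    }

    /** RPC count when the whole table folder is removed in one recursive call. */
    static int rpcCallsForRecursive(int segments, int filesPerSegment) {
        ToyNameNode nn = new ToyNameNode();
        ToyNameNode.Node root = new ToyNameNode.Node("root", true);
        ToyNameNode.Node table = buildTable(root, segments, filesPerSegment);
        nn.deleteRecursive(root, table);
        return nn.rpcCalls;
    }

    public static void main(String[] args) {
        System.out.println("walk:      " + rpcCallsForWalk(30, 100) + " calls");
        System.out.println("recursive: " + rpcCallsForRecursive(30, 100) + " calls");
    }
}
```

In this model the walk pays one call per listing and one per deletion, so its cost grows with the number of files and folders, while the recursive path stays at a single call regardless of table size.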
   
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
     No
     - [ ] Testing done
     Verified on cluster
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata drop_table_slow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2868.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2868
   
----
commit f79f0fa351ed76cb74fe441f7d13cf756d49cb4c
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-10-29T06:09:09Z

    Modified code to improve the drop table command performance

----


---

[GitHub] carbondata issue #2868: [WIP] Improve drop table performance by reducing the...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1102/



---

[GitHub] carbondata issue #2868: [WIP] Improve drop table performance by reducing the...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1313/



---

[GitHub] carbondata issue #2868: [WIP] Improve drop table performance by reducing the...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9365/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    retest this please


---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1115/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9377/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1325/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1118/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1330/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9382/



---

[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2868#discussion_r229146066
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/detailquery/SubqueryWithFilterAndSortTestCase.scala ---
    @@ -64,15 +67,14 @@ class SubqueryWithFilterAndSortTestCase extends QueryTest with BeforeAndAfterAll
         dis.close()
       }
       def deleteFile(filePath: String) {
    -    val file = FileFactory.getCarbonFile(filePath, FileFactory.getFileType(filePath))
    +    val file = new File(filePath)
    --- End diff --
   
    why is this modification needed?


---

[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2868#discussion_r229173642
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/detailquery/SubqueryWithFilterAndSortTestCase.scala ---
    @@ -64,15 +67,14 @@ class SubqueryWithFilterAndSortTestCase extends QueryTest with BeforeAndAfterAll
         dis.close()
       }
       def deleteFile(filePath: String) {
    -    val file = FileFactory.getCarbonFile(filePath, FileFactory.getFileType(filePath))
    +    val file = new File(filePath)
    --- End diff --
   
    Not required. I will remove


---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1134/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9398/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1346/



---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    If the table is on S3, will it behave correctly since it does not have "folder" concept?


---

[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2868#discussion_r229283056
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java ---
    @@ -62,6 +62,11 @@
     
       boolean renameForce(String changetoName);
     
    +  /**
    +   * This method will delete the files recursively from file system
    +   *
    +   * @return
    --- End diff --
   
    complete the comment


---

[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2868#discussion_r229283191
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
    @@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) {
       }
     
       public boolean delete() {
    -    return file.delete();
    +    try {
    +      return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath()));
    +    } catch (IOException e) {
    +      LOGGER.error("Exception occurred:" + e.getMessage());
    --- End diff --
   
    include the exception in the error log
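
The difference this comment points at can be sketched with the JDK's own `java.util.logging`, used here only as a stand-in because the logger behind `LOGGER` in the diff is not shown. The class name, logger name, and the always-failing `deleteFile` helper below are hypothetical. Passing the exception object to the logger keeps its type and stack trace, whereas concatenating `e.getMessage()` keeps only the message text.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class DeleteLoggingSketch {
    // Hypothetical logger name; the project's real logging setup may differ.
    static final Logger LOGGER = Logger.getLogger("drop.table.sketch");

    /** Hypothetical stand-in for a delete that fails with an IOException. */
    static boolean deleteFile(String path) throws IOException {
        throw new IOException("simulated failure deleting " + path);
    }

    /** Logs only the message text: exception type and stack trace are lost. */
    static boolean deleteLoggingMessageOnly(String path) {
        try {
            return deleteFile(path);
        } catch (IOException e) {
            LOGGER.log(Level.SEVERE, "Exception occurred:" + e.getMessage());
            return false;
        }
    }

    /** Passes the exception itself so handlers can print the full stack trace. */
    static boolean deleteLoggingException(String path) {
        try {
            return deleteFile(path);
        } catch (IOException e) {
            LOGGER.log(Level.SEVERE, "Exception occurred while deleting " + path, e);
            return false;
        }
    }

    public static void main(String[] args) {
        deleteLoggingMessageOnly("/tmp/example");
        deleteLoggingException("/tmp/example");
    }
}
```

Both variants still return `false` on failure, so callers see the same result; only the amount of diagnostic detail in the log changes.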


---

[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2868
 
    > If the table is on S3, will it behave correctly since it does not have "folder" concept?
   
    I have not changed any existing behavior, so it should work fine


---