GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2868

[WIP] Improve drop table performance by reducing the namenode RPC calls during physical deletion of files

**Problem**
The current drop table command takes more than 1 minute to delete 3000 files from HDFS.

**Analysis**
Even though we are using the HDFS file system, we explicitly iterate through the table folders recursively and delete each file one by one. Each file deletion and each file listing makes one RPC call to the namenode. Deleting 3000 files therefore makes 3000 RPC calls to the namenode for file deletion, plus a few more RPC calls for listing the files in each folder.

**Solution**
HDFS provides an API that deletes all folders and files under a given path recursively in a single RPC call. Use that API to improve drop table performance.

**Result**
After these code changes, the time the drop table operation takes to delete 3000 files from HDFS dropped from 1 minute to ~2 sec.

- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done
      Verified on cluster
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata drop_table_slow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2868.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2868

----
commit f79f0fa351ed76cb74fe441f7d13cf756d49cb4c
Author: manishgupta88 <tomanishgupta18@...>
Date: 2018-10-29T06:09:09Z

    Modified code to improve the drop table command performance

----
---
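The RPC arithmetic in the analysis above can be illustrated with a toy model. This is only a sketch, not the CarbonData or HDFS code: `ToyNameNode`, `DropTableModel`, and the `Segment_N` / `part-N` names are hypothetical, and the "namenode" is an in-memory map that counts one call per listing or delete request. In real Hadoop code the single-call variant corresponds to `FileSystem.delete(Path, boolean recursive)` with `recursive` set to `true`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a namenode that counts the requests it receives. */
class ToyNameNode {
    // Directory path -> paths of its direct children. Files never appear as keys.
    private final Map<String, List<String>> dirs = new HashMap<>();
    int rpcCalls = 0;

    ToyNameNode(String root) { dirs.put(root, new ArrayList<>()); }

    // Setup helpers used to build the tree; these are not counted as RPCs.
    String addDir(String parent, String name) {
        String path = parent + "/" + name;
        dirs.get(parent).add(path);
        dirs.put(path, new ArrayList<>());
        return path;
    }

    void addFile(String parent, String name) { dirs.get(parent).add(parent + "/" + name); }

    // Assumed to be known from the listing result, so not counted as an RPC.
    boolean isDir(String path) { return dirs.containsKey(path); }

    // One RPC: list the direct children of a directory (returns a copy).
    List<String> list(String dir) { rpcCalls++; return new ArrayList<>(dirs.get(dir)); }

    // One RPC: delete a single file or an already emptied directory.
    void deleteSingle(String path) {
        rpcCalls++;
        dirs.remove(path);
        for (List<String> children : dirs.values()) children.remove(path);
    }

    // One RPC: the server removes the path and everything beneath it.
    void deleteRecursive(String path) {
        rpcCalls++;
        removeSubtree(path);
        for (List<String> children : dirs.values()) children.remove(path);
    }

    private void removeSubtree(String path) {
        List<String> children = dirs.remove(path);
        if (children != null) for (String child : children) removeSubtree(child);
    }
}

class DropTableModel {
    // Client-side walk: list every folder, then delete each entry with its own call.
    static void deleteOneByOne(ToyNameNode nn, String path) {
        if (nn.isDir(path)) {
            for (String child : nn.list(path)) deleteOneByOne(nn, child);
        }
        nn.deleteSingle(path);
    }

    static ToyNameNode buildTable(int segments, int filesPerSegment) {
        ToyNameNode nn = new ToyNameNode("/table");
        for (int s = 0; s < segments; s++) {
            String segment = nn.addDir("/table", "Segment_" + s);
            for (int f = 0; f < filesPerSegment; f++) nn.addFile(segment, "part-" + f);
        }
        return nn;
    }

    static int rpcsOneByOne(int segments, int filesPerSegment) {
        ToyNameNode nn = buildTable(segments, filesPerSegment);
        deleteOneByOne(nn, "/table");
        return nn.rpcCalls;
    }

    static int rpcsRecursive(int segments, int filesPerSegment) {
        ToyNameNode nn = buildTable(segments, filesPerSegment);
        nn.deleteRecursive("/table");
        return nn.rpcCalls;
    }

    public static void main(String[] args) {
        System.out.println("one by one: " + rpcsOneByOne(3, 1000) + " calls");
        System.out.println("recursive:  " + rpcsRecursive(3, 1000) + " calls");
    }
}
```

In this model the per-file walk grows with the number of files and folders, while the recursive delete stays at one call regardless of table size. The model says nothing about wall-clock time; the 1 minute and ~2 sec figures come from the cluster measurement reported in the description.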
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1102/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1313/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9365/ --- |
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2868 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1115/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9377/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1325/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1118/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1330/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9382/ --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2868#discussion_r229146066

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/detailquery/SubqueryWithFilterAndSortTestCase.scala ---
@@ -64,15 +67,14 @@ class SubqueryWithFilterAndSortTestCase extends QueryTest with BeforeAndAfterAll
     dis.close()
   }
   def deleteFile(filePath: String) {
-    val file = FileFactory.getCarbonFile(filePath, FileFactory.getFileType(filePath))
+    val file = new File(filePath)
--- End diff --

Why is this modification needed?
---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2868#discussion_r229173642

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/detailquery/SubqueryWithFilterAndSortTestCase.scala ---
@@ -64,15 +67,14 @@ class SubqueryWithFilterAndSortTestCase extends QueryTest with BeforeAndAfterAll
     dis.close()
   }
   def deleteFile(filePath: String) {
-    val file = FileFactory.getCarbonFile(filePath, FileFactory.getFileType(filePath))
+    val file = new File(filePath)
--- End diff --

Not required. I will remove it.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1134/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9398/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1346/ --- |
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2868

If the table is on S3, will this behave correctly, given that S3 does not have a "folder" concept?
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2868#discussion_r229283056

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java ---
@@ -62,6 +62,11 @@
   boolean renameForce(String changetoName);

+  /**
+   * This method will delete the files recursively from file system
+   *
+   * @return
--- End diff --

Complete the comment.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2868#discussion_r229283191

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
@@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) {
   }

   public boolean delete() {
-    return file.delete();
+    try {
+      return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath()));
+    } catch (IOException e) {
+      LOGGER.error("Exception occurred:" + e.getMessage());
--- End diff --

Include the exception in the error log.
---
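One way to read the review comment above: log the exception object itself rather than only `e.getMessage()`, so the exception type and stack trace are kept. The sketch below is an illustration under assumptions, not the change made in the PR. It uses `java.util.logging` instead of the project's own logger, and `DeleteLoggingSketch`, `Deleter`, and `deleteQuietly` are hypothetical names.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class DeleteLoggingSketch {
    static final Logger LOGGER = Logger.getLogger(DeleteLoggingSketch.class.getName());

    /** Hypothetical stand-in for a file system delete call that may fail. */
    interface Deleter {
        boolean delete(String path) throws IOException;
    }

    static boolean deleteQuietly(Deleter deleter, String path) {
        try {
            return deleter.delete(path);
        } catch (IOException e) {
            // Passing e as the last argument attaches the throwable to the log record,
            // so handlers can print its type and stack trace, not just the message.
            LOGGER.log(Level.SEVERE, "Exception occurred while deleting " + path, e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean deleted = deleteQuietly(p -> { throw new IOException("disk error"); }, "/tmp/example");
        System.out.println("deleted = " + deleted);
    }
}
```

Most Java logging APIs offer an equivalent overload that takes the message plus a `Throwable`; which one applies to the `LOGGER` in `LocalCarbonFile` depends on the logging library the project uses.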
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2868

> If the table is on S3, will it behave correctly since it does not have "folder" concept?

I have not changed any existing behavior, so it should work fine.
---