Karan980 opened a new pull request #4048: URL: https://github.com/apache/carbondata/pull/4048

### Why is this PR needed?
The clean files command is not deleting the .segment files present at tablePath/Metadata/Segments for compacted and marked-for-delete segments that were added through the ALTER TABLE ADD SEGMENT query.

### What changes were proposed in this PR?
Added code to delete the segment file for segments added through the ALTER TABLE ADD SEGMENT query.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
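For context, a rough sketch of the scenario this PR targets, written in the style of the AddSegmentTestCase suite quoted later in this thread. The table name, schema, clean-files options, and the `externalSegmentPath` variable are illustrative assumptions, not code from the PR; `sql`, `sqlContext`, and `resourcesPath` are assumed to come from the QueryTest harness.

```scala
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.datastore.impl.FileFactory
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.spark.sql.CarbonEnv

// Allow force-cleaning of compacted / marked-for-delete segments
// (the same property used by the test discussed later in this thread).
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")

sql("DROP TABLE IF EXISTS addsegment1")
sql("CREATE TABLE addsegment1 (id INT, name STRING) STORED AS carbondata") // illustrative schema
sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE addsegment1")

// Register an externally written carbon segment (externalSegmentPath is a
// hypothetical placeholder), then compact so the added segment becomes COMPACTED.
sql(s"ALTER TABLE addsegment1 ADD SEGMENT OPTIONS('path'='$externalSegmentPath', 'format'='carbon')")
sql("ALTER TABLE addsegment1 COMPACT 'major'")
sql("CLEAN FILES FOR TABLE addsegment1 OPTIONS('force'='true')")

// Before the fix, the .segment files of the compacted segments (including the
// added one) were left behind under tablePath/Metadata/Segments; afterwards only
// the .segment file of the surviving compaction-result segment should remain.
val table = CarbonEnv.getCarbonTable(None, "addsegment1")(sqlContext.sparkSession)
val segmentFilesDir = table.getTablePath + "/Metadata/segments" // directory named in the PR description
val remaining = FileFactory.getCarbonFile(segmentFilesDir).listFiles()
  .count(_.getName.endsWith(".segment"))
assert(remaining == 1)
```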
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740595996

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5113/
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740596559

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3351/
vikramahuja1001 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740599188

Rebase with the latest code and add your test cases to the TestCleanFilesCommand class.
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740630288

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3354/
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740630899

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5116/
Karan980 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741542607

> rebase with latest code and add your test cases to TestCleanFilesCommand class

Code rebase is done. I'm using some util function calls inside the test cases which are present in the current class, so it's better to keep the test cases here. Also, the issue occurs in segments added through the ALTER TABLE ADD SEGMENT query, and all the test cases of this feature are present inside this class.
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741584943

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5121/
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741585579

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3359/
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539979370

########## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ##########
@@ -1125,19 +1125,21 @@ public static void deleteSegment(String tablePath, Segment segment,
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
-    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment: If the segment is an external segment, then isRelative is false in its segment file; I think it's better to check for that and skip it this way. What do you suggest, @QiangCai @ajantha-bhat?
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539979831

########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ##########
@@ -184,11 +184,7 @@ private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,

Review comment: This function now just calls another function directly; you can remove this function.
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539982572

########## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ##########
@@ -597,6 +599,27 @@ public static String getSegmentIdFromPath(String dataFileAbsolutePath) {
     }
   }

+  /**

Review comment: You can use the getSegmentNumberFromSegmentFile method instead of this. Please delete this.
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539983758

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

Review comment: I don't understand why these test cases cannot be added to the TestCleanFilesCommand class; you can add any necessary function that you want to use there. Also, please merge these 2 test cases into 1, and include an in-progress segment as well.
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539984002

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

Review comment: Also add a similar test case with a partition table, and try to have multiple formats instead of just 'carbon'.
vikramahuja1001 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r539985886

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

+  test("Test clean files on segments after compaction") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    createCarbonTable()
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv'
+         | INTO TABLE addsegment1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "addsegment1") (sqlContext.sparkSession)
+    val path = CarbonTablePath.getSegmentPath(table.getTablePath, "1")
+    val newPath = storeLocation + "/" + "addsegtest"
+    for (i <- 0 until 6) {

Review comment: Instead of doing this, you can create a new table and add segments from that table.
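To make the suggestion concrete, a rough sketch of that alternative, reusing the helpers already visible in the quoted test (CarbonEnv.getCarbonTable, CarbonTablePath.getSegmentPath); the source-table name and schema are illustrative assumptions, not code from the PR.

```scala
import org.apache.carbondata.core.util.path.CarbonTablePath
import org.apache.spark.sql.CarbonEnv

// Load a second, ordinary table several times and register its segment paths
// into addsegment1 via ALTER TABLE ... ADD SEGMENT, instead of copying the
// on-disk folder of segment "1" six times.
sql("DROP TABLE IF EXISTS addsegsource")
sql("CREATE TABLE addsegsource (id INT, name STRING) STORED AS carbondata") // illustrative schema
for (_ <- 0 until 6) {
  sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE addsegsource")
}
val source = CarbonEnv.getCarbonTable(None, "addsegsource")(sqlContext.sparkSession)
(0 until 6).foreach { i =>
  // each load of addsegsource produced one segment folder; register it as an
  // added (external) segment of addsegment1
  val segPath = CarbonTablePath.getSegmentPath(source.getTablePath, i.toString)
  sql(s"ALTER TABLE addsegment1 ADD SEGMENT OPTIONS('path'='$segPath', 'format'='carbon')")
}
```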
Karan980 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r540867380

########## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ##########
@@ -597,6 +599,27 @@ public static String getSegmentIdFromPath(String dataFileAbsolutePath) {
     }
   }

+  /**

Review comment: The getSegmentNumberFromSegmentFile method returns the segmentId from the segment file name, but I need the segmentId from the segment file's absolute path. I have changed my method to call that method internally.
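A minimal sketch of the delegation being described, assuming the name-based helper is reachable from CarbonTablePath (the new method's actual name and signature in the PR, and the helper's owning class, may differ); it only illustrates the idea of stripping the directory part and reusing the existing parser.

```scala
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.path.CarbonTablePath

// Illustrative only: take an absolute segment-file path, keep just the file name
// ("<segmentId>_<timestamp>.segment"), and delegate to the existing helper.
def getSegmentIdFromSegmentFilePath(segmentFileAbsolutePath: String): String = {
  val fileName = segmentFileAbsolutePath.substring(
    segmentFileAbsolutePath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR) + 1)
  // assumed location of the helper mentioned in the review comment above
  CarbonTablePath.getSegmentNumberFromSegmentFile(fileName)
}
```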
Karan980 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r542119343

########## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ##########
@@ -1125,19 +1125,21 @@ public static void deleteSegment(String tablePath, Segment segment,
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
-    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment: I think this check is fine, because the same logic is used to set the isRelative variable in the FolderDetails class. If I used the isRelative variable here, I would have to generate the keys for locationMap, because this variable is present inside locationMap.

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

Review comment: Test cases have been moved to TestCleanFilesCommand, and test cases for a partition table and mixed formats have also been added.
Karan980 commented on a change in pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#discussion_r542153838

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

+  test("Test clean files on segments after compaction") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    createCarbonTable()
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv'
+         | INTO TABLE addsegment1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "addsegment1") (sqlContext.sparkSession)
+    val path = CarbonTablePath.getSegmentPath(table.getTablePath, "1")
+    val newPath = storeLocation + "/" + "addsegtest"
+    for (i <- 0 until 6) {

Review comment: I don't understand the problem with this approach.
CarbonDataQA2 commented on pull request #4048: URL: https://github.com/apache/carbondata/pull/4048#issuecomment-744271698

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5155/