vikramahuja1001 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-766562961

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-766551160

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768075252

Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3597/

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768075845

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5357/

vikramahuja1001 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768131508

retest this please

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768188868

Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5360/

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768191257

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3600/

ydvpankaj99 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768880622

retest this please

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768933560

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5370/

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-768938250

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3609/

vikramahuja1001 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-772241927

@QiangCai @ajantha-bhat @akashrn5 please review.

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-777597046

Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3317/

CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-777598712

Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5075/

akashrn5 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r570107627

File: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java

    @@ -297,6 +297,10 @@ public static boolean deleteFile(String filePath) throws IOException {
         return getCarbonFile(filePath).deleteFile();
       }
    +  public static boolean deleteFile(CarbonFile carbonFile) throws IOException {

Review comment: a delete API is already there, please use the same.

File: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java

    @@ -1126,25 +1130,36 @@ public static void deleteSegmentFile(String tablePath, Segment segment) throws E
        * @param partitionSpecs
        * @throws IOException
        */
    -  public static void deleteSegment(String tablePath, Segment segment,
    +  public static long deleteSegment(String tablePath, Segment segment,
           List<PartitionSpec> partitionSpecs,
           SegmentUpdateStatusManager updateStatusManager) throws Exception {
         SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
         List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
             FileFactory.getConfiguration());
    +    long sizeFreed = 0;
         Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
         for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
    -      FileFactory.deleteFile(entry.getKey());
    +      CarbonFile entryCarbonFile = FileFactory.getCarbonFile(entry.getKey());
    +      sizeFreed += entryCarbonFile.getSize();
    +      FileFactory.deleteFile(entryCarbonFile);
    +      LOGGER.info("File deleted after clean files operation: " + entry.getKey());
           for (String file : entry.getValue()) {
             String[] deltaFilePaths =
                 updateStatusManager.getDeleteDeltaFilePath(file, segment.getSegmentNo());
             for (String deltaFilePath : deltaFilePaths) {
    -          FileFactory.deleteFile(deltaFilePath);
    +          CarbonFile deltaCarbonFile = FileFactory.getCarbonFile(deltaFilePath);
    +          sizeFreed += deltaCarbonFile.getSize();
    +          FileFactory.deleteFile(deltaCarbonFile);
    +          LOGGER.info("File deleted after clean files operation: " + deltaFilePath);
             }
    -        FileFactory.deleteFile(file);
    +        CarbonFile deleteCarbonFile = FileFactory.getCarbonFile(file);
    +        sizeFreed += deleteCarbonFile.getSize();
    +        FileFactory.deleteFile(deleteCarbonFile);

Review comment: I can see a lot of duplicated lines here; maybe you can refactor them into a method like `getSizeAndDelete()`.

File: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFilesCommandPartitionTable.scala

    @@ -65,14 +65,19 @@ class TestCleanFilesCommandPartitionTable extends QueryTest with BeforeAndAfterA
         loadData()
         sql(s"""ALTER TABLE CLEANTEST COMPACT "MINOR" """)
         loadData()
    +    sql(s"CLEAN FILES FOR TABLE cleantest DRYRUN").show()
    +    sql(s"CLEAN FILES FOR TABLE cleantest").show()

Review comment: .show is a wasted call in FTs; please add proper asserts for all of them.

File: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java

    @@ -1297,4 +1356,37 @@ public static TableStatusReturnTuple separateVisibleAndInvisibleSegments(
           return new HashMap<>(0);
         }
       }
    +
    +  public static long partitionTableSegmentSize(CarbonTable carbonTable, LoadMetadataDetails
    +      oneLoad, LoadMetadataDetails[] loadMetadataDetails, List<PartitionSpec>
    +      partitionSpecs) throws Exception {
    +    long size = 0;
    +    SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), oneLoad
    +        .getSegmentFile());
    +    List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
    +        FileFactory.getConfiguration());
    +    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment: can't we calculate the size for all segments together? Here it calls readIndexFiles and the file APIs to calculate sizes every time, so in the case of OBS it will be very slow; better to optimize it to calculate in one pass, I feel.

File: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java (same deleteSegment hunk as above, on the line `CarbonFile entryCarbonFile = FileFactory.getCarbonFile(entry.getKey());`)

Review comment: please give `entryCarbonFile` a meaningful name like indexFile, dataFile, etc.

File: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java

    @@ -1072,7 +1097,22 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
             isUpdateRequired(isForceDeletion, carbonTable, identifier, details, cleanStaleInprogress);
         if (!tuple2.isUpdateRequired) {
    -      return;
    +      try {
    +        for (LoadMetadataDetails oneLoad : details) {
    +          if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) {
    +            if (!carbonTable.isHivePartitionTable()) {
    +              trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath
    +                  .getSegmentPath(carbonTable.getTablePath(), oneLoad.getLoadName()));
    +            } else {
    +              trashSizeRemaining += partitionTableSegmentSize(carbonTable, oneLoad,
    +                  details, partitionSpecs);
    +            }
    +          }
    +        }
    +      } catch (Exception e) {
    +        LOG.error("Unable to calculate size of garbage data", e);
    +      }
    +      return new long[]{sizeFreed, trashSizeRemaining};

Review comment: dry run is meant to give the size; do you want to return an empty size here or fail? If it returns an empty size, its purpose is not served.

File: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java (same hunk as above, on the line `if (!carbonTable.isHivePartitionTable()) {`)

Review comment: why add the negation and create confusion? Just remove the negation and swap the if and else blocks; looks simpler, right? :)

File: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java

    @@ -1125,13 +1165,32 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
               CarbonLockUtil.fileUnlock(carbonTableStatusLock, LockUsage.TABLE_STATUS_LOCK);
             }
             if (updateCompletionStatus) {
    -          DeleteLoadFolders
    +          long[] cleanFileSizeFreed = DeleteLoadFolders
                   .physicalFactAndMeasureMetadataDeletion(carbonTable, newAddedLoadHistoryList,
                       isForceDeletion, partitionSpecs, cleanStaleInprogress);
    +          sizeFreed += cleanFileSizeFreed[0];
    +          trashSizeRemaining += cleanFileSizeFreed[1];
    +        }
    +      }
    +    } else {
    +      try {
    +        for (LoadMetadataDetails oneLoad : metadataDetails) {
    +          if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) {
    +            if (!carbonTable.isHivePartitionTable()) {
    +              trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath(
    +                  carbonTable.getTablePath(), oneLoad.getLoadName()));
    +            } else {
    +              trashSizeRemaining += partitionTableSegmentSize(carbonTable, oneLoad,
    +                  metadataDetails, partitionSpecs);
    +            }
    +          }
    +        }
    +      } catch (Exception e) {
    +        LOG.error("Unable to calculate size of garbage data", e);
           }
         }
       }
    +    return new long[]{sizeFreed, trashSizeRemaining};

Review comment: same as above; just my opinion, what do you think @ajantha-bhat @QiangCai?

File: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala

    @@ -37,11 +38,24 @@ case class CarbonCleanFilesCommand(
         databaseNameOp: Option[String],
         tableName: String,
         options: Map[String, String] = Map.empty,
    +    dryRun: Boolean,
         isInternalCleanCall: Boolean = false)
       extends DataCommand {

       val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

    +  override def output: Seq[AttributeReference] = {
    +    if (dryRun) {
    +      Seq(
    +        AttributeReference("Size that will be Freed", LongType, nullable = false)(),

Review comment: better to give a simpler, more meaningful name.

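For context, a minimal sketch of the kind of helper the `getSizeAndDelete()` suggestion above points at; the method name, signature, and placement are assumptions drawn from the review comment, not the code that was eventually merged. It would sit in SegmentFileStore next to deleteSegment() and rely on the deleteFile(CarbonFile) overload added in this PR:

    // Hypothetical helper sketched from the review suggestion above; the real
    // refactoring in the PR may differ. It resolves the CarbonFile once, records
    // its size, deletes it, and returns the number of bytes freed so the caller
    // can accumulate the total.
    private static long getSizeAndDelete(String filePath) throws IOException {
      CarbonFile fileToDelete = FileFactory.getCarbonFile(filePath);
      long size = fileToDelete.getSize();
      FileFactory.deleteFile(fileToDelete);
      LOGGER.info("File deleted after clean files operation: " + filePath);
      return size;
    }

With such a helper, each of the three delete sites in deleteSegment() would collapse to something like `sizeFreed += getSizeAndDelete(deltaFilePath);`.
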
akashrn5 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-779300995

@QiangCai @ajantha-bhat please review this.

vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r576564856

On the FileFactory.java deleteFile(CarbonFile) comment: yes, that API is there, but it resolves the CarbonFile first. When the calling method already holds a CarbonFile, it would be resolved again, which this overload avoids, so it is more optimised. There is even a JIRA opened by Ajantha to do exactly this, so that we reduce unnecessary calls to getCarbonFile.

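To make the argument concrete, a small illustration of the two call patterns being compared; `indexFilePath` and `indexFile` are illustrative names, not identifiers from the PR:

    // Path-based API: FileFactory.deleteFile(String) resolves the CarbonFile
    // internally (getCarbonFile(filePath).deleteFile()), so if the caller has
    // already resolved it, the lookup happens twice.
    CarbonFile indexFile = FileFactory.getCarbonFile(indexFilePath);
    long size = indexFile.getSize();
    FileFactory.deleteFile(indexFilePath);

    // Overload added in this PR: reuses the CarbonFile the caller already holds,
    // avoiding the second getCarbonFile() call (which matters on object stores).
    FileFactory.deleteFile(indexFile);
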
vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r576568152

On the SegmentFileStore.java deleteSegment() naming comment (entryCarbonFile): done.

On the SegmentFileStore.java deleteSegment() refactoring comment (getSizeAndDelete()): yes, good idea. Changed it.

vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r576568703

On the SegmentStatusManager.java comment about the negated isHivePartitionTable() check: done.

vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r576569850

On the SegmentStatusManager.java dry-run size comment: we will just log the exception and carry on with the clean files command instead of failing the whole operation. If there is an error while calculating the size, I guess it will return 0; likewise, when nothing is deleted by the clean files operation, the size freed is also returned as 0. We can have a discussion on this and change it accordingly.

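A condensed sketch of the behaviour described above, using the names from the diff quoted earlier (a simplification of that hunk, not the final code): size calculation failures are logged and swallowed, the command keeps running, and the reported values can legitimately stay at 0.

    long sizeFreed = 0;
    long trashSizeRemaining = 0;
    try {
      for (LoadMetadataDetails oneLoad : details) {
        if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) {
          // Non-partition case shown here; partition tables go through
          // partitionTableSegmentSize(...) instead.
          trashSizeRemaining += FileFactory.getDirectorySize(
              CarbonTablePath.getSegmentPath(carbonTable.getTablePath(), oneLoad.getLoadName()));
        }
      }
    } catch (Exception e) {
      // A failure here does not fail the clean files command; the caller just
      // sees 0 for the portion that could not be measured.
      LOG.error("Unable to calculate size of garbage data", e);
    }
    return new long[]{sizeFreed, trashSizeRemaining};
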
vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r576574599

On the SegmentStatusManager.java partitionTableSegmentSize comment: this is only for the partition table case; in the non-partition flow we can directly calculate the segment folder size.

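A short sketch of the distinction being described, based on the calls in the diffs above (simplified, with `expiredSegmentSize` as an illustrative name): a non-partition segment lives under a single folder, while a partition segment's files are spread across partition directories.

    long expiredSegmentSize;
    if (carbonTable.isHivePartitionTable()) {
      // Partition table: files of one segment can sit under several partition
      // directories, so sizes are gathered per index/data file via the segment file.
      expiredSegmentSize = partitionTableSegmentSize(carbonTable, oneLoad, details, partitionSpecs);
    } else {
      // Non-partition table: everything for the segment is under one folder,
      // so a single directory-size call is enough.
      expiredSegmentSize = FileFactory.getDirectorySize(
          CarbonTablePath.getSegmentPath(carbonTable.getTablePath(), oneLoad.getLoadName()));
    }
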