akashrn5 opened a new pull request #3907: URL: https://github.com/apache/carbondata/pull/3907 ### Why is this PR needed? During the carbondata reliability and concurrency test of load, compaction and query, Some times nullPointerException is thrown. This is because, in `TableSegmentRefresher ` we get the last modified timestamp of the segment file, to decide to refresh the cache, in case of concurrency, it can happen that the segment file get deleted or during update, file may not be there, that time getLastModified time throws null pointer. ### What changes were proposed in this PR? Before get the last modified time, always check for the file exists, as it can be deleted during that time due to concurrency, if not present, initialize to zero. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No(verified with concurrency of 1000s of segments. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683675465 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3929/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683683411 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2188/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481175661 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: 1) If segment object exist and segment fileName is not null, then that means segment was there. If somebody deletes the segment file manually. Then better to throw an exception instead of initializing to 0 ? 2) when the segment.getLoadMetadataDetails() will be empty ? 3) now in positive case also we check file exist ? in object storage file exist check can be costly. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481198186 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: 1. even though when the segment object exists, if concurrently operations are happening like like or compaction, you can consider SI also, then we will modify the segment file, which is basically overwrite, then that corner case it will fail, so in that case, above scenario will happen, so no need to throw exception. 2. can refer PR #3814 3. we cannot differentiate between positive case and negative case, in order to query or the operation to succeed, we should take these steps, which we follow all places in carbon. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481221931 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: > can refer PR #3814 Seems that PR is referring from load meta details, it doesn't mention when it is null > we cannot differentiate between positive case and negative case Agree, but instead of adding file exist check , may acquire segment lock (to make sure segment file there) or make sure load metadetails is filled always so it wont have to check file exist. Else many places we have to add file exist check where and all we are accessing the segment ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481221931 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: > can refer PR #3814 Seems that PR is referring from load meta details, it doesn't mention when it is null > we cannot differentiate between positive case and negative case Agree, but instead of adding file exist check , may acquire segment lock (to make sure segment file there) or make sure load metadetails is filled always so it wont have to check file exist. Else many places we have to add file exist check where and all we are accessing the segment ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481226211 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: > actually it depends on how the segment object is made, since we have many methods to do it, These all things can be avoided when the segment refactoring is done, as already discussions are going on. > actually we have segment lock for actual segments, not the segment file which is metadata, that is the problem, its not like table status file. So during segment refactor we can consider these things into consideration and design it to avoid all these things, But now for existing users it will create, issue so we can consider this change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481226211 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: > Seems that PR is referring from load meta details, it doesn't mention when it is null actually it depends on how the segment object is made, since we have many methods to do it, These all things can be avoided when the segment refactoring is done, as already discussions are going on. >may acquire segment lock (to make sure segment file there) actually we have segment lock for actual segments, not the segment file which is metadata, that is the problem, its not like table status file. So during segment refactor we can consider these things into consideration and design it to avoid all these things, But now for existing users it will create, issue so we can consider this change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-684934937 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481226211 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: > Seems that PR is referring from load meta details, it doesn't mention when it is null actually it depends on how the segment object is made, since we have many methods to do it, These all things can be avoided when the segment refactoring is done, as already discussions are going on. >may acquire segment lock (to make sure segment file there) actually we have segment lock for actual segments, not the segment file which is metadata, that is the problem, its not like table status file. So during segment refactor we can consider these things into consideration and design it to avoid all these things, But now for existing users it will create issue, so we can consider this change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-684938297 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3954/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-684943246 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-685020692 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3956/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-685022558 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2216/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481755396 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: In some customer scenarios, we have observed huge segment count (more than 100K), so here file exist check for all the valid segment not even pruned segment. so I feel this will slow down the query. so, I suggest to analyze and make flow to enter `null != segment.getLoadMetadataDetails()` block. I agree that segment refactor might solve this. It might take few months to finish and we cannot slow down queries till then also. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-685325305 @ajantha-bhat with recent changes, we are taking modified time from metadata details, so for metadata details already null check is present in getValidSegments API, so may be this changes doesnt affect, but still we need to test , and may be handle in segment refactor, but for now, i close this PR ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 closed pull request #3907: URL: https://github.com/apache/carbondata/pull/3907 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481784694 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: i agree, but just because we need query fast, we cant compromise to make query fail also, this is my view. I have already replied and closed PR, you can check. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#discussion_r481824058 ########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java ########## @@ -515,15 +516,16 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { UpdateVO updateVO = SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if ((updateVO != null && updateVO.getLatestUpdateTimestamp() != null) - || segment.getSegmentFileName() != null) { - long segmentFileTimeStamp; + if (updateVO.getLatestUpdateTimestamp() != null || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = 0L; if (null != segment.getLoadMetadataDetails()) { segmentFileTimeStamp = segment.getLoadMetadataDetails().getLastModifiedTime(); } else { - segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath - .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) - .getLastModifiedTime(); + CarbonFile segmentFile = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())); + if (segmentFile.exists()) { Review comment: I never mentioned on compromising to make query fail also. I wanted to analyze more and send it to the correct branch of the code to avoid adding new that which can slow down the query. As you analyzed now. these changes are not required as it is entering the correct block itself. Thanks ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |