akashrn5 commented on issue #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#issuecomment-610878673 @kunal642 @ajantha-bhat please review ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405437861 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ########## @@ -727,32 +730,54 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { private Map<String, Boolean> manualSegmentRefresh = new HashMap<>(); TableSegmentRefresher(CarbonTable table) { - SegmentUpdateStatusManager statusManager = new SegmentUpdateStatusManager(table); - SegmentUpdateDetails[] updateStatusDetails = statusManager.getUpdateStatusDetails(); - for (SegmentUpdateDetails updateDetails : updateStatusDetails) { - UpdateVO updateVO = statusManager.getInvalidTimestampRange(updateDetails.getSegmentName()); + SegmentStatusManager segmentStatusManager = + new SegmentStatusManager(table.getAbsoluteTableIdentifier()); + List<Segment> validSegments; + try { + validSegments = segmentStatusManager.getValidAndInvalidSegments().getValidSegments(); + } catch (IOException e) { + LOGGER.error("Error while getting the valid segments.", e); + throw new RuntimeException(e); + } + for (Segment segment : validSegments) { + UpdateVO updateVO = + SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if (updateVO != null && updateVO.getLatestUpdateTimestamp() != null) { - segmentRefreshInfo = new SegmentRefreshInfo(updateVO.getLatestUpdateTimestamp(), 0); + if (updateVO != null && updateVO.getLatestUpdateTimestamp() != null + || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) + .getLastModifiedTime(); + segmentRefreshInfo = + new SegmentRefreshInfo(updateVO.getLatestUpdateTimestamp(), 0, segmentFileTimeStamp); } else { - segmentRefreshInfo = new SegmentRefreshInfo(0L, 0); + segmentRefreshInfo = new SegmentRefreshInfo(0L, 0, 0L); Review comment: So, old store (without having segment file). This cache clean up fail ? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405444105 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ########## @@ -727,32 +730,54 @@ public TableSegmentRefresher getTableSegmentRefresher(CarbonTable table) { private Map<String, Boolean> manualSegmentRefresh = new HashMap<>(); TableSegmentRefresher(CarbonTable table) { - SegmentUpdateStatusManager statusManager = new SegmentUpdateStatusManager(table); - SegmentUpdateDetails[] updateStatusDetails = statusManager.getUpdateStatusDetails(); - for (SegmentUpdateDetails updateDetails : updateStatusDetails) { - UpdateVO updateVO = statusManager.getInvalidTimestampRange(updateDetails.getSegmentName()); + SegmentStatusManager segmentStatusManager = + new SegmentStatusManager(table.getAbsoluteTableIdentifier()); + List<Segment> validSegments; + try { + validSegments = segmentStatusManager.getValidAndInvalidSegments().getValidSegments(); + } catch (IOException e) { + LOGGER.error("Error while getting the valid segments.", e); + throw new RuntimeException(e); + } + for (Segment segment : validSegments) { + UpdateVO updateVO = + SegmentUpdateStatusManager.getInvalidTimestampRange(segment.getLoadMetadataDetails()); SegmentRefreshInfo segmentRefreshInfo; - if (updateVO != null && updateVO.getLatestUpdateTimestamp() != null) { - segmentRefreshInfo = new SegmentRefreshInfo(updateVO.getLatestUpdateTimestamp(), 0); + if (updateVO != null && updateVO.getLatestUpdateTimestamp() != null + || segment.getSegmentFileName() != null) { + long segmentFileTimeStamp = FileFactory.getCarbonFile(CarbonTablePath + .getSegmentFilePath(table.getTablePath(), segment.getSegmentFileName())) + .getLastModifiedTime(); + segmentRefreshInfo = + new SegmentRefreshInfo(updateVO.getLatestUpdateTimestamp(), 0, segmentFileTimeStamp); } else { - segmentRefreshInfo = new SegmentRefreshInfo(0L, 0); + segmentRefreshInfo = new SegmentRefreshInfo(0L, 0, 0L); Review comment: no it wont fail, basically this flow wonttake care of that, because for old store we operate based on directly index files, and find them in lrucache ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405472125 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: In case of update scenario, If we fill `segmentUpdatedTimestamp` as `segmentFileTimestamp`, will there be a problem ? I think no need to have `segmentFileTimestamp` variable. We can just use `segmentUpdatedTimestamp` and fill this based on segment modified time ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405473431 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: please analyze if this can work ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405479328 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: 1. I didnt understand why we fill the update time as segment time? Can you please tell? you mean if developer sets by mistake or something else? 2. Are you saying to completely remove the segmentUpdateTimeStamp during refresh? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
kunal642 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405511048 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @ajantha-bhat Why will we segmentUpdatedTimestamp as segmentFileTimestamp?? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
kunal642 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405511048 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @ajantha-bhat Why would we fill segmentUpdatedTimestamp as segmentFileTimestamp?? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405517034 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @kunal642 , @akashrn5 : Isn't the segmentUpdatedTimestamp will be same as latest modified segment time. which is segmentFileTimestamp ? But old store, which doesn't have segmentFile update may not work If we do that. We cannot do this now. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on issue #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#issuecomment-610958448 LGTM, @kunal642 : you can check and merge ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405532176 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @ajantha-bhat you are right, first thing is old store case and `segmentUpdatedTimestamp` and `segmentFileStamp` wont be same because `segmentFileStamp` is file's last modified time, and `segmentUpdatedTimestamp` we actually set in the table status after update. So we can keep both for old and new store scenarios. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
kunal642 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405532556 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @ajantha-bhat We already have a check for segmentFile existance to handle compatibility. And incase of multiple drivers the driver which has old information this code will check the latest timestamp of the file with the updateVo to check whether any other driver has updated or not. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
kunal642 commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405532556 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: @ajantha-bhat We already have a check for segmentFile existance to handle compatibility. And incase of multiple drivers the driver which has old information this code will check the latest timestamp of the file with the updateVo to check whether any other driver has updated or not. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#discussion_r405541427 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentRefreshInfo.java ########## @@ -21,12 +21,17 @@ public class SegmentRefreshInfo implements Serializable { - private Long segmentUpdatedTimestamp; + private Long segmentUpdatedTimestamp = 0L; private Integer countOfFileInSegment; + private Long segmentFileTimestamp = 0L; - public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment) { - this.segmentUpdatedTimestamp = segmentUpdatedTimestamp; + public SegmentRefreshInfo(Long segmentUpdatedTimestamp, Integer countOfFileInSegment, + Long segmentFileTimestamp) { + if (segmentUpdatedTimestamp != null) { Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
kunal642 commented on issue #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686#issuecomment-610973152 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
asfgit closed pull request #3686: [CARBONDATA-3759]Refactor segmentRefreshInfo and fix cache issue in multiple application scenario
URL: https://github.com/apache/carbondata/pull/3686 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Free forum by Nabble | Edit this page |