Karan980 opened a new pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075

### Why is this PR needed?
When an SDK-written segment that has already been read once is added to a carbon table through an alter table add segment query, a select * query fails after the segment is added. In SDK segments, the segmentId is the timestamp of the segment. When the SDK segment is read before being added, its indexes are stored in the cache, which is a map of indexFilePath to indexes. When the same segment is then added to a carbon table, its segment ID is no longer the timestamp, but the indexFilePath remains the same because the segment is added externally. When we then run a select * query, the indexes are fetched from the cache but cannot be mapped to the segment, because the segment ID has changed.

### What changes were proposed in this PR?
Added the segment ID to the key of the cache map so that the key is unique per segment.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
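The failure described above can be illustrated with a minimal, self-contained sketch. This is not CarbonData's actual cache code: the class `IndexCacheKeySketch`, the `CachedIndex` type, the `load` helper, the example path and file name, and the literal separators `"/"` and `"_"` (standing in for `CarbonCommonConstants.FILE_SEPARATOR` and `CarbonCommonConstants.UNDERSCORE`) are all assumptions made for illustration. It only shows why a key built from the index file path alone can return an entry loaded under the old segment ID, and how prefixing the segment ID avoids that.

```java
import java.util.HashMap;
import java.util.Map;

public class IndexCacheKeySketch {

    /** Hypothetical cached entry; it remembers the segment ID it was loaded under. */
    static final class CachedIndex {
        final String segmentId;

        CachedIndex(String segmentId) {
            this.segmentId = segmentId;
        }
    }

    /** Key built from the index file path and file name only (the behaviour the PR describes). */
    static String pathOnlyKey(String indexFilePath, String indexFileName) {
        return indexFilePath + "/" + indexFileName; // "/" is an assumed separator
    }

    /** Key with the segment ID prepended (the change the PR proposes). */
    static String segmentAwareKey(String segmentId, String indexFilePath, String indexFileName) {
        return segmentId + "_" + pathOnlyKey(indexFilePath, indexFileName); // "_" is assumed
    }

    /** Returns the cached entry for the key, loading it under segmentId only on a miss. */
    static CachedIndex load(Map<String, CachedIndex> cache, String key, String segmentId) {
        return cache.computeIfAbsent(key, k -> new CachedIndex(segmentId));
    }

    public static void main(String[] args) {
        String path = "/tmp/sdk_output";        // hypothetical location of the SDK segment
        String file = "example.carbonindex";    // hypothetical index file name
        String sdkSegmentId = "1610000000000";  // SDK read: segment ID is a timestamp
        String tableSegmentId = "0";            // hypothetical ID after alter table add segment

        // Path-only key: the second lookup hits the entry cached during the SDK read.
        Map<String, CachedIndex> pathOnlyCache = new HashMap<>();
        load(pathOnlyCache, pathOnlyKey(path, file), sdkSegmentId);
        CachedIndex stale = load(pathOnlyCache, pathOnlyKey(path, file), tableSegmentId);
        System.out.println("path-only key returns segment ID " + stale.segmentId);

        // Segment-aware key: the second lookup misses and loads a fresh entry.
        Map<String, CachedIndex> segmentAwareCache = new HashMap<>();
        load(segmentAwareCache, segmentAwareKey(sdkSegmentId, path, file), sdkSegmentId);
        CachedIndex fresh =
            load(segmentAwareCache, segmentAwareKey(tableSegmentId, path, file), tableSegmentId);
        System.out.println("segment-aware key returns segment ID " + fresh.segmentId);
    }
}
```

In this sketch both lookups go through the same in-memory map, so the stale hit can only occur when the SDK read and the table query run in the same JVM.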
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758213921

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5296/
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758216950

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3536/
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758496406

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5298/
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758497035

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3538/
Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758510889

retest this please
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758564733

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5302/
CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758565119

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3542/
QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760630120

Do you have the steps to reproduce this?
Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123

> Do you have the steps to reproduce this?

Yes. Run the test case that I added in this PR, without any of the other changes from the PR (Test add segment by carbon written by SDK on which read is already performed).
Karan980 edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123

> Do you have the steps to reproduce this?

Yes. Run the test case that I added in this PR, without any of the other changes from the PR (Test add segment by carbon written by SDK on which read is already performed). As the last step, run a select * query in place of select count(*).
QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763286725

Please normalize the title: insert a space after the closing square bracket ']'.
Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763339371

> Please normalize the title: insert a space after the closing square bracket ']'.

Done
ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
     this.indexFileName = indexFileName;
     this.mergeIndexFileName = mergeIndexFileName;
     this.segmentId = segmentId;
-    this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+    this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE
+        + indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
indexFileName already contains the segment ID. SDK and Spark are never meant to be used in the same JVM; both are new processes, right? Why is the SDK query going into the carbon table LRU cache?
ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
     this.indexFileName = indexFileName;
     this.mergeIndexFileName = mergeIndexFileName;
     this.segmentId = segmentId;
-    this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+    this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE
+        + indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
indexFileName already contains the segment ID. SDK and Spark are never meant to be used in the same JVM; both are separate JVM processes, right? Why is the SDK query going into the carbon table LRU cache?
ajantha-bhat commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555

@Karan980: Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem. Ideally, when you run the SDK in a cluster, it will spin up its own JVM, so the Spark cluster will not be affected. I don't think this issue has to be fixed, as it is not a user scenario.
ajantha-bhat edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555

@Karan980: Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem. Ideally, when you run the SDK in the cluster, it will spin up its own JVM, so the Spark cluster will not be affected. I don't think this issue has to be fixed, as it is not a user scenario.
Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763560080

OK, closing this issue, as the cache will not be shared if both are different JVM processes.
Karan980 closed pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075