[GitHub] [carbondata] Karan980 opened a new pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.


GitBox

Karan980 opened a new pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075


    ### Why is this PR needed?
   When an SDK-written segment that has already been read once is added to a carbon table through the alter table add segment query, a subsequent select * query on the table fails.
   
   In SDK-written segments, the segmentId is the timestamp of the segment. When the SDK segment is read before it is added, its indexes are stored in the cache, which is a map from indexFilePath to indexes. When the same segment is later added to the carbon table, its segment ID is no longer the timestamp, but the indexFilePath stays the same because the segment is added from its external location. A subsequent select * query therefore gets the indexes from the cache but cannot map them to the segment, because the segment ID has changed.
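A minimal sketch of the failure mode described above, assuming a simplified cache. The names `IndexCacheSketch`, `CachedIndex` and `load` are hypothetical stand-ins, not CarbonData classes: entries are keyed by index file path only, and each entry remembers the segment ID it was first loaded under.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the pre-fix cache; not CarbonData's real implementation.
public class IndexCacheSketch {
    // Hypothetical cached value: an index plus the segment ID it was loaded for.
    static final class CachedIndex {
        final String segmentId;
        CachedIndex(String segmentId) { this.segmentId = segmentId; }
    }

    // Cache keyed only by the index file path.
    static final Map<String, CachedIndex> CACHE = new HashMap<>();

    static CachedIndex load(String segmentId, String indexFilePath) {
        // If the path is already cached, the old entry is returned,
        // even though it was loaded under a different segment ID.
        return CACHE.computeIfAbsent(indexFilePath, p -> new CachedIndex(segmentId));
    }

    public static void main(String[] args) {
        String path = "/sdk/output/part-0.carbonindex"; // hypothetical path
        // 1. SDK read: the segment ID is the write timestamp.
        load("1610000000000", path);
        // 2. After alter table add segment, the same files belong to segment "2".
        CachedIndex hit = load("2", path);
        // The cached entry still carries the timestamp ID, so it cannot be mapped to segment "2".
        System.out.println("expected segment 2, cached entry says " + hit.segmentId);
    }
}
```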
   
    ### What changes were proposed in this PR?
   Added the segment ID to the key of the cache map so that the key is unique per segment.
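A minimal sketch of the proposed keying, using the same hypothetical cache model (`SegmentAwareCacheSketch` is an illustrative name, not a CarbonData class). The key follows the format in the diff under review, `segmentId + UNDERSCORE + indexFilePath + FILE_SEPARATOR + indexFileName`, assuming those constants resolve to `"_"` and `"/"`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a segment-aware cache key; not CarbonData's real implementation.
public class SegmentAwareCacheSketch {
    static final class CachedIndex {
        final String segmentId;
        CachedIndex(String segmentId) { this.segmentId = segmentId; }
    }

    static final Map<String, CachedIndex> CACHE = new HashMap<>();

    // Assumes UNDERSCORE = "_" and FILE_SEPARATOR = "/".
    static String cacheKey(String segmentId, String indexFilePath, String indexFileName) {
        return segmentId + "_" + indexFilePath + "/" + indexFileName;
    }

    static CachedIndex load(String segmentId, String indexFilePath, String indexFileName) {
        String key = cacheKey(segmentId, indexFilePath, indexFileName);
        // A new segment ID yields a new key, so a fresh entry is created for it.
        return CACHE.computeIfAbsent(key, k -> new CachedIndex(segmentId));
    }

    public static void main(String[] args) {
        load("1610000000000", "/sdk/output", "part-0.carbonindex"); // SDK read
        CachedIndex hit = load("2", "/sdk/output", "part-0.carbonindex"); // after add segment
        System.out.println("segment " + hit.segmentId + ", cache entries: " + CACHE.size());
    }
}
```

In this model the same index files end up cached twice, once per segment ID, and the stale timestamp-keyed entry stays in the map until something evicts it.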
   
    ### Does this PR introduce any user interface change?
   - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758213921


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5296/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758216950


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3536/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758496406


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5298/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758497035


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3538/
   



[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758510889


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758564733


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5302/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758565119


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3542/
   



[GitHub] [carbondata] QiangCai commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760630120


   Do you have the steps to reproduce?



[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123


   > Do you have the steps to reproduce?
   
   Yes. Run the test case I added in this PR (Test add segment by carbon written by SDK on which read is already performed) without the other changes from this PR.



[GitHub] [carbondata] Karan980 edited a comment on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123


   > Do you have the steps to reproduce?
   
   Yes. Run the test case I added in this PR (Test add segment by carbon written by SDK on which read is already performed) without the other changes from this PR. At the end, run a select * query instead of select count(*).



[GitHub] [carbondata] QiangCai commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763286725


   Please normalize the title: insert a space after the closing square bracket ']'.



[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763339371


   > Please normalize the title: insert a space after the closing square bracket ']'.
   
   Done



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
     this.indexFileName = indexFileName;
     this.mergeIndexFileName = mergeIndexFileName;
     this.segmentId = segmentId;
-    this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+    this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE +
+        indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
       indexFileName already contains the segment ID.
   
   SDK and Spark are never meant to be used in the same JVM; both are new processes, right? Why is the SDK query going into the carbon table LRU cache?





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
     this.indexFileName = indexFileName;
     this.mergeIndexFileName = mergeIndexFileName;
     this.segmentId = segmentId;
-    this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+    this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE +
+        indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
       indexFileName already contains the segment ID.
   
   SDK and Spark are never meant to be used in the same JVM; both run as separate JVM processes, right? Why is the SDK query going into the carbon table LRU cache?





[GitHub] [carbondata] ajantha-bhat commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

ajantha-bhat commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555


   @Karan980 : Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem.
   Ideally, when you run the SDK in the cluster, it will spin up its own JVM, so the Spark cluster will not be affected.
   I don't think this issue has to be fixed, as it is not a user scenario.



[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

ajantha-bhat edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555


   @Karan980 : Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem.
   Ideally, when you run the SDK in the cluster, it will spin up its own JVM, so the Spark cluster will not be affected.
   I don't think this issue has to be fixed, as it is not a user scenario.



[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763560080


   OK, closing this PR, as the cache will not be shared when the SDK and Spark run in different JVM processes.



[GitHub] [carbondata] Karan980 closed pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

GitBox

Karan980 closed pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075


   

