
[GitHub] [carbondata] Karan980 opened a new pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.


GitBox

[GitHub] [carbondata] Karan980 opened a new pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

Karan980 opened a new pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075

### Why is this PR needed?
When an SDK-written segment that has already been read once is added to a carbon table through an alter table add segment query, a subsequent select * query fails.

In SDK segments, the segmentId is the timestamp of the segment. When the SDK segment is read before being added, its indexes are stored in the cache, which is a map from indexFilePath to indexes. When the same segment is then added to a carbon table, its segment ID is no longer the timestamp, but the indexFilePath stays the same because the segment is added externally. A subsequent select * query fetches the indexes from the cache but cannot map them to the segment, because the segment ID has changed.
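The mismatch described above can be illustrated with a minimal sketch. It is not CarbonData code: the class, the method, the path, and the segment IDs are all hypothetical, and the real cache holds index objects rather than plain strings. It only assumes what the description states, namely that the cache is keyed by index file path alone.

```java
import java.util.HashMap;
import java.util.Map;

public class PathKeyedCacheSketch {
    // Hypothetical cache: index file path -> segment ID the cached entry was loaded under.
    private static final Map<String, String> CACHE = new HashMap<>();

    // Returns the segment ID recorded in the cached entry for this path,
    // loading a new entry only if the path has not been seen before.
    public static String cachedSegmentIdFor(String indexFilePath, String segmentId) {
        return CACHE.computeIfAbsent(indexFilePath, path -> segmentId);
    }

    public static void main(String[] args) {
        String path = "/tmp/sdk_output/example.carbonindex"; // illustrative path

        // First read through the SDK: the segment ID is a timestamp.
        String fromSdkRead = cachedSegmentIdFor(path, "1610000000000");

        // Same files after being added to a table: the table assigns a new segment ID,
        // but the path is unchanged, so the old entry is returned.
        String afterAddSegment = cachedSegmentIdFor(path, "2");

        System.out.println(fromSdkRead);      // prints 1610000000000
        System.out.println(afterAddSegment);  // prints 1610000000000, not 2
    }
}
```

The second lookup hits the entry created by the first one, so the caller receives an entry tagged with a segment ID that the table no longer knows about.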

### What changes were proposed in this PR?
Added the segment ID to the key of the cache map, so that the key is unique per segment.
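As a rough sketch of the idea, the key can be built from the segment ID, the index file path, and the index file name. The class and method below are hypothetical, and the two separator constants are local stand-ins whose values are assumed here rather than taken from CarbonCommonConstants.

```java
public class SegmentScopedKeySketch {
    // Local stand-ins for the separators; the values are assumptions for this sketch.
    static final String UNDERSCORE = "_";
    static final String FILE_SEPARATOR = "/";

    // Builds a cache key that includes the segment ID in front of the path-based part.
    public static String uniqueName(String segmentId, String indexFilePath, String indexFileName) {
        return segmentId + UNDERSCORE + indexFilePath + FILE_SEPARATOR + indexFileName;
    }

    public static void main(String[] args) {
        String path = "/tmp/sdk_output";        // illustrative path
        String file = "example.carbonindex";    // illustrative file name

        String keyFromSdkRead = uniqueName("1610000000000", path, file);
        String keyAfterAddSegment = uniqueName("2", path, file);

        // Same files, different segment IDs: the keys no longer collide.
        System.out.println(keyFromSdkRead.equals(keyAfterAddSegment)); // prints false
    }
}
```

With a key of this shape, the entry cached during the SDK read and the entry needed after the segment is added are stored separately, at the cost of caching the same index twice.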

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758213921

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5296/


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758216950

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3536/


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758496406

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5298/


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758497035

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3538/


GitBox

[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758510889

retest this please


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758564733

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5302/


GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-758565119

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3542/


GitBox

[GitHub] [carbondata] QiangCai commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760630120

Do you have the steps to reproduce?


GitBox

[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123

> do you have the reproduce steps?

Yes. Run the test case I have added in this PR, without any of the PR's other changes. (Test add segment by carbon written by SDK on which read is already performed)


GitBox

[GitHub] [carbondata] Karan980 edited a comment on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-760659123

> do you have the reproduce steps?

Yes. Run the test case I have added in this PR, without any of the PR's other changes. (Test add segment by carbon written by SDK on which read is already performed). As the last step, run a select * query in place of select count(*).


GitBox

[GitHub] [carbondata] QiangCai commented on pull request #4075: [CARBONDATA-4105]Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

QiangCai commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763286725

Please normalize the title: insert a space after the closing square bracket ']'.


GitBox

[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763339371

> please normalize the title, insert a blank after the close middle bracket ']'

Done


GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
this.indexFileName = indexFileName;
this.mergeIndexFileName = mergeIndexFileName;
this.segmentId = segmentId;
- this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+ this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE +
+ indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
indexFileName already contains the segment ID.

SDK and Spark are never meant to be used in the same JVM. Both are new processes, right? Why is the SDK query going into the carbon table LRU cache?


GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#discussion_r560817709

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/TableBlockIndexUniqueIdentifier.java
##########
@@ -45,7 +45,8 @@ public TableBlockIndexUniqueIdentifier(String indexFilePath, String indexFileNam
this.indexFileName = indexFileName;
this.mergeIndexFileName = mergeIndexFileName;
this.segmentId = segmentId;
- this.uniqueName = indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;
+ this.uniqueName = segmentId + CarbonCommonConstants.UNDERSCORE +
+ indexFilePath + CarbonCommonConstants.FILE_SEPARATOR + indexFileName;

Review comment:
indexFileName already contains the segment ID.

SDK and Spark are never meant to be used in the same JVM. Both are separate JVM processes, right? Why is the SDK query going into the carbon table LRU cache?


GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

ajantha-bhat commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555

@Karan980: Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem.
Ideally, when you run the SDK in a cluster, it spins up its own JVM, so the Spark cluster is not affected.
I don't think this issue needs to be fixed, as it is not a user scenario.


GitBox

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

ajantha-bhat edited a comment on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763503555

@Karan980: Here you are running the SDK read and write inside the Spark cluster JVM itself, hence this problem.
Ideally, when you run the SDK in the cluster, it spins up its own JVM, so the Spark cluster is not affected.
I don't think this issue needs to be fixed, as it is not a user scenario.


GitBox

[GitHub] [carbondata] Karan980 commented on pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 commented on pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075#issuecomment-763560080

OK, closing this issue, as the cache will not be shared if the two are different JVM processes.


GitBox

[GitHub] [carbondata] Karan980 closed pull request #4075: [CARBONDATA-4105] Select * query fails after a SDK written segment is added by alter table add segment query.

In reply to this post by GitBox

Karan980 closed pull request #4075:
URL: https://github.com/apache/carbondata/pull/4075
