[GitHub] [carbondata] akashrn5 opened a new pull request #3927: [wip]fix issue


[GitHub] [carbondata] akashrn5 opened a new pull request #3927: [wip]fix issue

GitBox

akashrn5 opened a new pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [wip]fix issue

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-691982389


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4065/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [wip]fix issue

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-691991425


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2326/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489299596



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       I thought you would strictly clean up the base code in the area of modification to make it more readable and maintainable!
   
   Here, `seg`, `file`, and `carbonFile` are not good variable names; please check and refactor if you want.
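
   A minimal standalone Scala sketch of the gating pattern shown in the diff above: segment files are only (re)written for segments that actually appear in the update status metadata, instead of for every segment. The names `SegmentDetail`, `readUpdatedSegmentNames`, and `writeSegmentFileFor` are hypothetical placeholders for illustration only, not CarbonData APIs.

```scala
// Sketch of the "touch only updated segments" idea from the change above.
// SegmentDetail, readUpdatedSegmentNames and writeSegmentFileFor are hypothetical
// placeholders; the real code uses SegmentUpdateStatusManager, SegmentFileStore, etc.
case class SegmentDetail(segmentNo: String)

// Stands in for reading the update status file written for this update's timestamp.
def readUpdatedSegmentNames(updateStatusFile: String): Set[String] =
  Set("1", "3")

// Stands in for SegmentFileStore.writeSegmentFile(...), which writes a new segment file.
def writeSegmentFileFor(segmentNo: String): String =
  s"${segmentNo}_${System.currentTimeMillis()}.segmentfile"

def segmentFilesToWrite(allSegments: Seq[SegmentDetail],
                        updateStatusFile: String): Seq[String] = {
  val updatedSegments = readUpdatedSegmentNames(updateStatusFile)
  // Only segments that were actually updated get a new segment file;
  // untouched segments are skipped, which is the fix this PR makes.
  allSegments.collect {
    case seg if updatedSegments.contains(seg.segmentNo) =>
      writeSegmentFileFor(seg.segmentNo)
  }
}
```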







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489300387



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments
+      if (updatedSegments.contains(seg.getSegmentNo)) {
+        val load =
+          metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
+        val segmentFile = load.getSegmentFile
+        var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
 
-      val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
-      val file = SegmentFileStore.writeSegmentFile(
-        carbonTable,
-        seg.getSegmentNo,
-        String.valueOf(System.currentTimeMillis()),
-        load.getPath,
-        segmentMetaDataInfo)
+        val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
+        val file = SegmentFileStore.writeSegmentFile(

Review comment:
       Here, are we writing the segment file again just to update the current timestamp?
   Please add a comment if possible.







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489299596



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       Here, `seg`, `file`, and `carbonFile` are not good variable names; please check and refactor if you want.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

akashrn5 commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489323476



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       changed







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

akashrn5 commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489324392



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments
+      if (updatedSegments.contains(seg.getSegmentNo)) {
+        val load =
+          metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
+        val segmentFile = load.getSegmentFile
+        var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
 
-      val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
-      val file = SegmentFileStore.writeSegmentFile(
-        carbonTable,
-        seg.getSegmentNo,
-        String.valueOf(System.currentTimeMillis()),
-        load.getPath,
-        segmentMetaDataInfo)
+        val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
+        val file = SegmentFileStore.writeSegmentFile(

Review comment:
       The method-level comment here says that for new updates on the same segment we need to write a new segment file with the new segmentMetaDataInfo and merge it with the old segment file; I think that comment is pretty clear, so it should be fine. Please check once.
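
   To make the point above concrete, here is a hedged, standalone sketch of the idea that an update writes new metadata for the segment and merges it with the previous metadata for that same segment. `SegmentMetaInfo` and `mergeSegmentMeta` are illustrative names, and the merge rule shown is an assumption for illustration, not the CarbonData implementation.

```scala
// Illustrative only: models "write new metadata for an updated segment and merge
// it with the old metadata" so repeated updates keep one merged view per segment.
case class SegmentMetaInfo(segmentNo: String,
                           timestamp: Long,
                           indexFiles: Set[String])

def mergeSegmentMeta(previous: SegmentMetaInfo, latest: SegmentMetaInfo): SegmentMetaInfo = {
  require(previous.segmentNo == latest.segmentNo, "can only merge metadata of the same segment")
  // Keep the newest timestamp and the union of index files, mirroring the idea of
  // merging the freshly written segment file with the existing one after an update.
  SegmentMetaInfo(
    segmentNo = previous.segmentNo,
    timestamp = math.max(previous.timestamp, latest.timestamp),
    indexFiles = previous.indexFiles ++ latest.indexFiles)
}
```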







[GitHub] [carbondata] ajantha-bhat commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693310888


   LGTM.
   
   Will merge once the build passes.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693362694


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4097/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693363425


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2356/
   





[GitHub] [carbondata] asfgit closed pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

asfgit closed pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927


   

