[GitHub] [carbondata] akashrn5 opened a new pull request #3927: [wip]fix issue


[GitHub] [carbondata] akashrn5 opened a new pull request #3927: [wip]fix issue

GitBox

akashrn5 opened a new pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [wip]fix issue

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-691982389


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4065/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [wip]fix issue

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-691991425


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2326/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489299596



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       I thought you would strictly clean up the base code in the area of modification to make it more readable and maintainable!
   
   Here, `seg`, `file`, and `carbonFile` are not good variable names; please check and refactor if you want.
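
   A minimal standalone Scala sketch of the gating pattern shown in the diff above: segment files are only (re)written for segments that actually appear in the update status metadata, instead of for every segment. The names `SegmentDetail`, `readUpdatedSegmentNames`, and `writeSegmentFileFor` are hypothetical placeholders for illustration only, not CarbonData APIs.

```scala
// Sketch of the "touch only updated segments" idea from the change above.
// SegmentDetail, readUpdatedSegmentNames and writeSegmentFileFor are hypothetical
// placeholders; the real code uses SegmentUpdateStatusManager, SegmentFileStore, etc.
case class SegmentDetail(segmentNo: String)

// Stands in for reading the update status file written for this update's timestamp.
def readUpdatedSegmentNames(updateStatusFile: String): Set[String] =
  Set("1", "3")

// Stands in for SegmentFileStore.writeSegmentFile(...), which writes a new segment file.
def writeSegmentFileFor(segmentNo: String): String =
  s"${segmentNo}_${System.currentTimeMillis()}.segmentfile"

def segmentFilesToWrite(allSegments: Seq[SegmentDetail],
                        updateStatusFile: String): Seq[String] = {
  val updatedSegments = readUpdatedSegmentNames(updateStatusFile)
  // Only segments that were actually updated get a new segment file;
  // untouched segments are skipped, which is the fix this PR makes.
  allSegments.collect {
    case seg if updatedSegments.contains(seg.segmentNo) =>
      writeSegmentFileFor(seg.segmentNo)
  }
}
```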







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489300387



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments
+      if (updatedSegments.contains(seg.getSegmentNo)) {
+        val load =
+          metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
+        val segmentFile = load.getSegmentFile
+        var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
 
-      val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
-      val file = SegmentFileStore.writeSegmentFile(
-        carbonTable,
-        seg.getSegmentNo,
-        String.valueOf(System.currentTimeMillis()),
-        load.getPath,
-        segmentMetaDataInfo)
+        val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
+        val file = SegmentFileStore.writeSegmentFile(

Review comment:
       Here, are we writing the segment file again just to update the current timestamp?
   Please add a comment if possible.







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489299596



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       Here, `seg`, `file`, and `carbonFile` are not good variable names; please check and refactor if you want.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

akashrn5 commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489323476



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments

Review comment:
       changed







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

akashrn5 commented on a change in pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#discussion_r489324392



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -725,43 +725,53 @@ object CarbonDataRDDFactory {
     val metadataDetails =
       SegmentStatusManager.readTableStatusFile(
         CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath))
+    val updateTableStatusFile = CarbonUpdateUtil.getUpdateStatusFileName(updateModel
+      .updatedTimeStamp.toString)
+    val updatedSegments = SegmentUpdateStatusManager.readLoadMetadata(updateTableStatusFile,
+      carbonTable.getTablePath).map(_.getSegmentName).toSet
     val segmentFiles = segmentDetails.asScala.map { seg =>
-      val load =
-        metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
-      val segmentFile = load.getSegmentFile
-      var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
+      // create new segment files and merge for only updated segments
+      if (updatedSegments.contains(seg.getSegmentNo)) {
+        val load =
+          metadataDetails.find(_.getLoadName.equals(seg.getSegmentNo)).get
+        val segmentFile = load.getSegmentFile
+        var segmentFiles: Seq[CarbonFile] = Seq.empty[CarbonFile]
 
-      val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
-      val file = SegmentFileStore.writeSegmentFile(
-        carbonTable,
-        seg.getSegmentNo,
-        String.valueOf(System.currentTimeMillis()),
-        load.getPath,
-        segmentMetaDataInfo)
+        val segmentMetaDataInfo = segmentMetaDataInfoMap.get(seg.getSegmentNo)
+        val file = SegmentFileStore.writeSegmentFile(

Review comment:
       The method-level comment here says that for new updates on the same segment we need to write a new segment file with the new segmentMetaDataInfo and merge it with the old segment file; I think that comment is pretty clear, so it should be fine. Please check once.
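
   To make the point above concrete, here is a hedged, standalone sketch of the idea that an update writes new metadata for the segment and merges it with the previous metadata for that same segment. `SegmentMetaInfo` and `mergeSegmentMeta` are illustrative names, and the merge rule shown is an assumption for illustration, not the CarbonData implementation.

```scala
// Illustrative only: models "write new metadata for an updated segment and merge
// it with the old metadata" so repeated updates keep one merged view per segment.
case class SegmentMetaInfo(segmentNo: String,
                           timestamp: Long,
                           indexFiles: Set[String])

def mergeSegmentMeta(previous: SegmentMetaInfo, latest: SegmentMetaInfo): SegmentMetaInfo = {
  require(previous.segmentNo == latest.segmentNo, "can only merge metadata of the same segment")
  // Keep the newest timestamp and the union of index files, mirroring the idea of
  // merging the freshly written segment file with the existing one after an update.
  SegmentMetaInfo(
    segmentNo = previous.segmentNo,
    timestamp = math.max(previous.timestamp, latest.timestamp),
    indexFiles = previous.indexFiles ++ latest.indexFiles)
}
```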







[GitHub] [carbondata] ajantha-bhat commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

ajantha-bhat commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693310888


   LGTM.
   
   Will merge once the build passes.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693362694


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4097/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

CarbonDataQA1 commented on pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927#issuecomment-693363425


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2356/
   





[GitHub] [carbondata] asfgit closed pull request #3927: [CARBONDATA-3989]Fix unnecessary segment files creation when segment is neither updated nor deleted.

GitBox

asfgit closed pull request #3927:
URL: https://github.com/apache/carbondata/pull/3927


   

