[GitHub] [carbondata] akashrn5 opened a new pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 opened a new pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
akashrn5 opened a new pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719
 
 
    ### Why is this PR needed?
    List files was done during rebuild of SI, which will be costly in case of S3 and OBS
   
    ### What changes were proposed in this PR?
   avoided list files and delete files handled through segmentFileStore
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No(existing tests will takecare)
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410097884
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +284,40 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    val loadModel = SecondaryIndexUtil
 
 Review comment:
   Why LoadModel is formed again?, as we already have carbonLoadModel in methodParams

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410099985
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +284,40 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    val loadModel = SecondaryIndexUtil
+      .getCarbonLoadModel(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+        carbonLoadModel.getLoadMetadataDetails,
+        carbonLoadModel.getFactTimeStamp,
+        carbonLoadModel.getColumnCompressor)
+    var indexFiles: util.Map[String, String] = null
+    validSegments.asScala.foreach { segment =>
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
+        val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath,
+          segment.getSegmentNo)
+        indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath)
+      } else {
+        val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath,
+          segment.getSegmentFileName)
+        indexFiles = segmentFileStore.getIndexOrMergeFiles
+      }
+    }
+    val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile =>
 
 Review comment:
   I think for each segment we have to delete old indexFiles. This delete code has to be moved inside foreach loop?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615154725
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1051/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410119240
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +284,40 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    val loadModel = SecondaryIndexUtil
 
 Review comment:
   you are right, removed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410119308
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +284,40 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    val loadModel = SecondaryIndexUtil
+      .getCarbonLoadModel(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+        carbonLoadModel.getLoadMetadataDetails,
+        carbonLoadModel.getFactTimeStamp,
+        carbonLoadModel.getColumnCompressor)
+    var indexFiles: util.Map[String, String] = null
+    validSegments.asScala.foreach { segment =>
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
+        val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath,
+          segment.getSegmentNo)
+        indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath)
+      } else {
+        val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath,
+          segment.getSegmentFileName)
+        indexFiles = segmentFileStore.getIndexOrMergeFiles
+      }
+    }
+    val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile =>
 
 Review comment:
   yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615211151
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1054/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615216132
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2767/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410548764
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
 
 Review comment:
   make it private and move factTimeStamp to next line

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410549441
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    validSegments.asScala.foreach { segment =>
+      var indexFiles: util.Map[String, String] = null
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
+        val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath,
+          segment.getSegmentNo)
+        indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath)
+      } else {
+        val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath,
+          segment.getSegmentFileName)
+        indexFiles = segmentFileStore.getIndexOrMergeFiles
+      }
+      val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile =>
+        DataFileUtil.getTimeStampFromFileName(indexFile).toLong >
+        factTimeStamp
+      }
+      indexFilesToDelete.foreach { indexFile =>
 
 Review comment:
   no need to declare local variable indexFilesToDelete

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410550100
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    validSegments.asScala.foreach { segment =>
+      var indexFiles: util.Map[String, String] = null
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
 
 Review comment:
   move line 299 to 307 as a function in CarbonTable to return the index file list for specified segment list

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410550270
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -197,6 +200,9 @@ object SecondaryIndexUtil {
       }
       if (finalMergeStatus) {
         if (null != mergeStatus && mergeStatus.length != 0) {
+          deleteOldIndexOrMergeIndexFiles(carbonLoadModel.getFactTimeStamp,
 
 Review comment:
   move carbonLoadModel.getFactTimeStamp to  next line

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655298
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -197,6 +200,9 @@ object SecondaryIndexUtil {
       }
       if (finalMergeStatus) {
         if (null != mergeStatus && mergeStatus.length != 0) {
+          deleteOldIndexOrMergeIndexFiles(carbonLoadModel.getFactTimeStamp,
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655315
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    validSegments.asScala.foreach { segment =>
+      var indexFiles: util.Map[String, String] = null
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655316
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
+      validSegments: util.List[Segment],
+      indexCarbonTable: CarbonTable): Unit = {
+    // delete the index/merge index carbonFile of old data files
+    validSegments.asScala.foreach { segment =>
+      var indexFiles: util.Map[String, String] = null
+      // for old store scenario
+      if (segment.getSegmentFileName == null) {
+        val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath,
+          segment.getSegmentNo)
+        indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath)
+      } else {
+        val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath,
+          segment.getSegmentFileName)
+        indexFiles = segmentFileStore.getIndexOrMergeFiles
+      }
+      val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile =>
+        DataFileUtil.getTimeStampFromFileName(indexFile).toLong >
+        factTimeStamp
+      }
+      indexFilesToDelete.foreach { indexFile =>
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655318
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
 ##########
 @@ -280,6 +286,35 @@ object SecondaryIndexUtil {
     }
   }
 
+  /**
+   * This method deletes the old index files or merge index file after data files merge
+   */
+  def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long,
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615734963
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1066/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615782837
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2779/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services