[GitHub] [carbondata] Karan980 opened a new pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 opened a new pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox

Karan980 opened a new pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048


    ### Why is this PR needed?
   Clean files command is not deleting .segment files present at tablePath/Metadata/Segments for the compacted/marked for delete segments added through alter table add segment query.
   
    ### What changes were proposed in this PR?
   Added code for deleting the segment file for the segments added through alter table add segment query.
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740595996


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5113/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740596559


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3351/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740599188


   rebase with latest code and add your test cases to TestCleanFilesCommand class


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740630288


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3354/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-740630899


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5116/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

Karan980 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741542607


   > rebase with latest code and add your test cases to TestCleanFilesCommand class
   
   Code rebase done. I'm using some util function calls inside test cases which are present in current class. So, its better to keep test cases here. Also the issue in coming in segments added through alter table add segment query and all the test cases of this feature are present inside this class.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741584943


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5121/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-741585579


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3359/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539979370



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1125,19 +1125,21 @@ public static void deleteSegment(String tablePath, Segment segment,
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
-    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment:
       If the segment is an external segment, then in that case in the segment file of the segment isRelative is false, i think it's better to check and skip it this way. What do you suggest @QiangCai ? @ajantha-bhat




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539979831



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -184,11 +184,7 @@ private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,

Review comment:
       this function is calling another function directly, you can remove this function




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539982572



##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -597,6 +599,27 @@ public static String getSegmentIdFromPath(String dataFileAbsolutePath) {
     }
   }
 
+  /**

Review comment:
       You can use getSegmentNumberFromSegmentFile method, instead of this. Please delete this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539983758



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 

Review comment:
       I don't understand why these test cases cannot be added in the testCleanFilesCommand cases. You can any necessary function that you want to use there. Also please merge these 2 test cases into 1. And also Include Inprogress segment also.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539984002



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 

Review comment:
       And also add similar test case with partition table too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539984002



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 

Review comment:
       And also add similar test case with partition table too, also try to have multiple formats instead of just 'carbon'




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r539985886



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 
+  test("Test clean files on segments after compaction") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    createCarbonTable()
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv'
+         | INTO TABLE addsegment1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "addsegment1") (sqlContext.sparkSession)
+    val path = CarbonTablePath.getSegmentPath(table.getTablePath, "1")
+    val newPath = storeLocation + "/" + "addsegtest"
+    for (i <- 0 until 6) {

Review comment:
       instead of doing this, you can create a new table and add segments from this table




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r540867380



##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -597,6 +599,27 @@ public static String getSegmentIdFromPath(String dataFileAbsolutePath) {
     }
   }
 
+  /**

Review comment:
       getSegmentNumberFromSegmentFile method is returning segmentId from segment file name. But i need segmentId from segmentFile absolute path. I have changed my method to make call to this method internally.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r542119343



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1125,19 +1125,21 @@ public static void deleteSegment(String tablePath, Segment segment,
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
-    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment:
       I think this check is fine. Because for setting the isRelative variable same logic is used in FolderDetails class. If i use isRelative variable here, i have to generate the keys for locationMap because this variable is present inside loactionMap.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 

Review comment:
       TestCases moved to testCleanFilesCommand and also added testCases for partition table and mixed formats.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r542153838



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 
+  test("Test clean files on segments after compaction") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    createCarbonTable()
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv'
+         | INTO TABLE addsegment1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "addsegment1") (sqlContext.sparkSession)
+    val path = CarbonTablePath.getSegmentPath(table.getTablePath, "1")
+    val newPath = storeLocation + "/" + "addsegtest"
+    for (i <- 0 until 6) {

Review comment:
       I don't understand the problem with this approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment query.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#issuecomment-744271698


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5155/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


123