[GitHub] [carbondata] niuge01 opened a new pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
niuge01 opened a new pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602
 
 
    ### Why is this PR needed?
    At the end of CarbonInsertFromStageCommand, the stage files are cleared, but the data files referenced by those stage files are not. This could lead to a large backlog of data files.
   
    ### What changes were proposed in this PR?
   Provide a new command that allows us to delete data files which are referenced by disabled table stages.
   
   The new command is CarbonDeleteStageCommand.
       
    ### Does this PR introduce any user interface change?
    - Yes
   
    ### Is any new testcase added?
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581742358
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1845/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581743095
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/141/
   


[GitHub] [carbondata] niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581765554
 
 
   retest this please


[GitHub] [carbondata] QiangCai commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
QiangCai commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581767435
 
 
   Does this command support deleting any specified folder?


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581770379
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/143/
   


[GitHub] [carbondata] niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581775570
 
 
   > Does this command support deleting any specified folder?
   
   Yes


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581787313
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1847/
   


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r374655622
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
 
 Review comment:
   ```suggestion
   ### DELETE STAGE files
   ```



[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023328
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
 
 Review comment:
   ```suggestion
     This command allows us to delete the data files (stage data) which are already loaded into the table.
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023551
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
 
 Review comment:
   ```suggestion
   | [location](#data_file_location)                    | The data file location                                      |
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second                              |
 
 Review comment:
   ```suggestion
   | [retain_hour](#retain_hour)| Data file retain time in hours                              |
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second                              |
 
 Review comment:
   ```suggestion
   | [retain_munite](#retain_hour)| Data file retain time in minutes                              |
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second                              |
 
 Review comment:
   ```suggestion
   | [retain_minute](#retain_minute)| Data file retain time in minutes                              |
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023551
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
 
 Review comment:
   ```suggestion
   | [location](#location)                    | The data file location                                      |
   ```



[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second                              |
 
 Review comment:
   ```suggestion
   | [retain_hour](#retain_hour)| Data file retain time in hours                              |
   ```
   
   I think stage files should be kept for hours at least.


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375025975
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
   ```
   DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
   ```
+    
+### DELETE STAGE
+
+  This command allows us to delete data files which referenced by disabled table stages.
+  ```
+  DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+  ```  
+  **Supported Properties:**
+
+| Property                                                     | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location)                    | The data files location                                      |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second                              |
+
+-
+  You can use the following options to delete data:
+
+  - ##### data_file_location:
+    The data files location, the command will scan the location, and delete files which can be deleted.
 
 Review comment:
   ```suggestion
       The data files location, the command will scan the location, and delete files which are already loaded into the table. This check is done by scanning the stage metadata folder in table path.
   ```


[GitHub] [carbondata] jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375026220
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ##########
 @@ -218,6 +218,12 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
         }
     }
 
+  protected lazy val options: Parser[(String, String)] =
 
 Review comment:
   You can rename `loadOptions` and reuse the same function.
