niuge01 opened a new pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602

### Why is this PR needed?
At the end of CarbonInsertFromStageCommand, the stage files are cleared, but the data files referenced by those stage files are not. This could lead to a large backlog of data files.

### What changes were proposed in this PR?
Provide a new command, CarbonDeleteStageCommand, that allows us to delete the data files referenced by disabled table stages.

### Does this PR introduce any user interface change?
- Yes

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [hidden email]

With regards,
Apache Git Services
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581742358

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1845/
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581743095

Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/141/
niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581765554

retest this please
QiangCai commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581767435

Does this command support deleting any specified folder?
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581770379

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/143/
niuge01 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581775570

> Does this command support deleting any specified folder?

Yes
CarbonDataQA1 commented on issue #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#issuecomment-581787313

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1847/
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r374655622

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE

Review comment:
```suggestion
### DELETE STAGE files
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023328

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.

Review comment:
```suggestion
 This command allows us to delete the data files (stage data) which is already loaded into the table.
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023551

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |

Review comment:
```suggestion
| [location](#data_file_location) | The data file location |
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second |

Review comment:
```suggestion
| [retain_hour](#retain_hour)| Data file retain time in hours |
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second |

Review comment:
```suggestion
| [retain_munite](#retain_hour)| Data file retain time in minutes |
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second |

Review comment:
```suggestion
| [retain_minute](#retain_minute)| Data file retain time in minutes |
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023551

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |

Review comment:
```suggestion
| [location](#location) | The data file location |
```
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375023733

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second |

Review comment:
```suggestion
| [retain_hour](#retain_hour)| Data file retain time in hours |
```
I think the retention time for stage files should be at least in hours.
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375025975

##########
File path: docs/dml-of-carbondata.md
##########
@@ -446,6 +446,46 @@ CarbonData DML statements are documented here,which includes:
 ```
 DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
 ```
+
+### DELETE STAGE
+
+ This command allows us to delete data files which referenced by disabled table stages.
+ ```
+ DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
+ ```
+ **Supported Properties:**
+
+| Property | Description |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [data_file_location](#data_file_location) | The data files location |
+| [data_file_retain_time_second](#data_file_retain_time_second)| Data file retain time in second |
+
+-
+ You can use the following options to delete data:
+
+ - ##### data_file_location:
+ The data files location, the command will scan the location, and delete files which can be deleted.

Review comment:
```suggestion
 The data files location, the command will scan the location, and delete files which are already loaded into the table. This check is done by scanning the stage metadata folder in table path.
```
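The behaviour discussed in the comment above (scan a location, then delete only files that are already loaded and older than a retention window) can be sketched roughly as follows. This is an illustrative Python sketch, not the CarbonData implementation: the function names, the `loaded_files` set (standing in for whatever the stage metadata scan yields), and the use of file modification time to measure age are all assumptions made for the example.

```python
import os
import time
from pathlib import Path


def find_deletable_files(location, loaded_files, retain_seconds, now=None):
    """Return files under `location` that are safe to delete.

    Hypothetical rule for this sketch: a file qualifies only if its name is in
    `loaded_files` (assumed to come from the stage metadata) and its
    modification time is at least `retain_seconds` in the past.
    """
    now = time.time() if now is None else now
    deletable = []
    for path in sorted(Path(location).iterdir()):
        if not path.is_file():
            continue  # this sketch ignores sub-directories
        if path.name not in loaded_files:
            continue  # not loaded into the table yet, so it must be kept
        if now - path.stat().st_mtime >= retain_seconds:
            deletable.append(path)
    return deletable


def delete_stage_data(location, loaded_files, retain_seconds, now=None):
    """Delete the qualifying files and return their names."""
    removed = []
    for path in find_deletable_files(location, loaded_files, retain_seconds, now):
        path.unlink()
        removed.append(path.name)
    return removed
```

With a retention window expressed in hours, as suggested in the review, a caller would pass something like `retain_seconds=retain_hour * 3600`.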
jackylk commented on a change in pull request #3602: [CARBONDATA-3676] Support clean carbon data files of stages.
URL: https://github.com/apache/carbondata/pull/3602#discussion_r375026220

##########
File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
##########
@@ -218,6 +218,12 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
 }
 }
+ protected lazy val options: Parser[(String, String)] =

Review comment:
You can rename `loadOptions` and reuse the same function.