niuge01 opened a new pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628

### Why is this PR needed?
Sometimes, users want to load the latest data into the table first.

### What changes were proposed in this PR?
Add a "batch_file_order" option to CarbonInsertFromStageCommand.

### Does this PR introduce any user interface change?
- Yes. (One option, "batch_file_order", is added to CarbonInsertFromStageCommand; documentation added.)

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
CarbonDataQA1 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-587357728 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/332/
CarbonDataQA1 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-587391423 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2034/
ajantha-bhat commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r381089273
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,21 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.

Review comment:
Is this ascending or descending based on file name, file size, or file creation time? Better to describe it here in the comments.
ajantha-bhat commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r381089961
##########
File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -117,15 +113,29 @@ case class CarbonInsertFromStageCommand(
     // 1) read all existing stage files
     val batchSize = try {
-      Integer.valueOf(options.getOrElse("batch_file_count", Integer.MAX_VALUE.toString))
+      Integer.valueOf(
+        options.getOrElse(CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY,
+          CarbonInsertFromStageCommand.BATCH_FILE_COUNT_DEFAULT))
     } catch {
       case _: NumberFormatException =>
-        throw new MalformedCarbonCommandException("Option [batch_file_count] is not a number.")
+        throw new MalformedCarbonCommandException("Option [" +
+          CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY + "] is not a number.")
     }
     if (batchSize < 1) {
-      throw new MalformedCarbonCommandException("Option [batch_file_count] is less than 1.")
+      throw new MalformedCarbonCommandException("Option [" +
+        CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY + "] is less than 1.")
     }
-    val stageFiles = listStageFiles(stagePath, hadoopConf, batchSize)
+    val orderType = options.getOrElse(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_KEY,
+      CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DEFAULT)
+    if (!orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC) &&
+      !orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DESC)) {
+      throw new MalformedCarbonCommandException("Option [" +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_KEY + "] is invalid, should be " +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC + " or " +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DESC + ".")
+    }
+    val stageFiles = listStageFiles(stagePath, hadoopConf, batchSize,
+      orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC))

Review comment:
Maybe add a log for the selected `orderType`, so that the user or tester can verify it?
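A standalone sketch of the option handling in the diff above, including the log line suggested in the review, might look like the following. It is illustrative only and assumes plain `Map[String, String]` options; `IllegalArgumentException` and `println` are stand-ins for `MalformedCarbonCommandException` and CarbonData's logger, and `StageOptionsSketch`, `StageOptions`, and `parse` are hypothetical names, not part of the PR.

```scala
object StageOptionsSketch {
  // Option names and values as used in the PR; everything else here is hypothetical.
  val BatchFileCountKey = "batch_file_count"
  val BatchFileOrderKey = "batch_file_order"
  val Asc = "ASC"
  val Desc = "DESC"

  final case class StageOptions(batchSize: Int, ascending: Boolean)

  def parse(options: Map[String, String]): StageOptions = {
    // Default: no limit on the number of stage files per batch.
    val batchSize = try {
      options.getOrElse(BatchFileCountKey, Int.MaxValue.toString).toInt
    } catch {
      case _: NumberFormatException =>
        throw new IllegalArgumentException(s"Option [$BatchFileCountKey] is not a number.")
    }
    if (batchSize < 1) {
      throw new IllegalArgumentException(s"Option [$BatchFileCountKey] is less than 1.")
    }
    // Default: ASC, i.e. the earliest stage files first; matching is case-insensitive.
    val orderType = options.getOrElse(BatchFileOrderKey, Asc)
    if (!orderType.equalsIgnoreCase(Asc) && !orderType.equalsIgnoreCase(Desc)) {
      throw new IllegalArgumentException(
        s"Option [$BatchFileOrderKey] is invalid, should be $Asc or $Desc.")
    }
    // Stand-in for a logger call, so a user or tester can verify the chosen order.
    println(s"Listing stage files in ${orderType.toUpperCase} order of last modified time")
    StageOptions(batchSize, orderType.equalsIgnoreCase(Asc))
  }
}
```

Validating both options before listing any files means a typo such as `'batch_file_order'='latest'` fails fast instead of silently falling back to the default order.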
ajantha-bhat commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r381091227
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,21 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.

Review comment:
Further down, I saw that it is the file's last modified time. Better to mention this in the document and in the comments.
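The selection behavior discussed here (stage files ordered by last modified time, then cut to the batch size) can be sketched as follows. This is a minimal, self-contained illustration, not the code from the PR: `StageFile` is a hypothetical stand-in for Hadoop's `FileStatus`, and `selectBatch` is a hypothetical name for the selection step inside `listStageFiles`.

```scala
// Hypothetical stand-in for Hadoop's FileStatus: only the fields needed here.
final case class StageFile(name: String, lastModified: Long)

object StageOrderSketch {
  /**
   * Orders stage files by last modified time and keeps at most `batchSize` of them.
   * ascending = true  -> earliest files first (ASC, the documented default)
   * ascending = false -> latest files first (DESC)
   */
  def selectBatch(files: Seq[StageFile], batchSize: Int, ascending: Boolean): Seq[StageFile] = {
    val oldestFirst = files.sortBy(_.lastModified)
    val ordered = if (ascending) oldestFirst else oldestFirst.reverse
    ordered.take(batchSize)
  }
}
```

With `batch_file_count` set to 5, ASC picks the five earliest stage files and DESC the five latest. The thread does not say how ties in modification time are broken; this sketch simply keeps whatever order the stable sort and `reverse` produce.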
niuge01 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-593816601 retest this please
niuge01 commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r386848723
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,21 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.

Review comment:
Done, comments added.
niuge01 commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r386848984
##########
File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -117,15 +113,29 @@ case class CarbonInsertFromStageCommand(
     // 1) read all existing stage files
     val batchSize = try {
-      Integer.valueOf(options.getOrElse("batch_file_count", Integer.MAX_VALUE.toString))
+      Integer.valueOf(
+        options.getOrElse(CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY,
+          CarbonInsertFromStageCommand.BATCH_FILE_COUNT_DEFAULT))
     } catch {
       case _: NumberFormatException =>
-        throw new MalformedCarbonCommandException("Option [batch_file_count] is not a number.")
+        throw new MalformedCarbonCommandException("Option [" +
+          CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY + "] is not a number.")
     }
     if (batchSize < 1) {
-      throw new MalformedCarbonCommandException("Option [batch_file_count] is less than 1.")
+      throw new MalformedCarbonCommandException("Option [" +
+        CarbonInsertFromStageCommand.BATCH_FILE_COUNT_KEY + "] is less than 1.")
     }
-    val stageFiles = listStageFiles(stagePath, hadoopConf, batchSize)
+    val orderType = options.getOrElse(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_KEY,
+      CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DEFAULT)
+    if (!orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC) &&
+      !orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DESC)) {
+      throw new MalformedCarbonCommandException("Option [" +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_KEY + "] is invalid, should be " +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC + " or " +
+        CarbonInsertFromStageCommand.BATCH_FILE_ORDER_DESC + ".")
+    }
+    val stageFiles = listStageFiles(stagePath, hadoopConf, batchSize,
+      orderType.equalsIgnoreCase(CarbonInsertFromStageCommand.BATCH_FILE_ORDER_ASC))

Review comment:
Yes, adding a log is better. Done.
CarbonDataQA1 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-593822692 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/574/
CarbonDataQA1 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-593855085 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2279/
jackylk commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387409789
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,22 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.
+    Stage files will order by the last modified time with the specified order type.
+
+    ```
+    OPTIONS('batch_file_order'='DESC')
+    ```
+
   Examples:
   ```
   INSERT INTO table1 STAGE
   INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')

Review comment:
Add a comment to explain that this command will insert the earliest stage files into the table.
jackylk commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387409824
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,22 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.
+    Stage files will order by the last modified time with the specified order type.
+
+    ```
+    OPTIONS('batch_file_order'='DESC')
+    ```
+
   Examples:
   ```
   INSERT INTO table1 STAGE
   INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
+
+  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5', 'batch_file_order'='DESC')

Review comment:
Add a comment to explain that this command will insert the latest stage files into the table.
jackylk commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387410408
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -551,3 +569,21 @@ case class CarbonInsertFromStageCommand(
   override protected def opName: String = "INSERT STAGE"
 }
+
+object CarbonInsertFromStageCommand {
+
+  val DELETE_FILES_RETRY_TIMES = 3
+
+  val BATCH_FILE_COUNT_KEY = "batch_file_count"
+
+  val BATCH_FILE_COUNT_DEFAULT: String = Integer.MAX_VALUE.toString
+
+  val BATCH_FILE_ORDER_KEY = "batch_file_order"
+
+  val BATCH_FILE_ORDER_ASC = "ASC"
+
+  val BATCH_FILE_ORDER_DESC = "DESC"

Review comment:
Add a comment to explain that using this option will insert the latest stage files into the table.
niuge01 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-594412249 retest this please
niuge01 commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387540380
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,22 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.
+    Stage files will order by the last modified time with the specified order type.
+
+    ```
+    OPTIONS('batch_file_order'='DESC')
+    ```
+
   Examples:
   ```
   INSERT INTO table1 STAGE
   INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
+
+  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5', 'batch_file_order'='DESC')

Review comment:
added
niuge01 commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387540583
##########
File path: docs/dml-of-carbondata.md
##########
@@ -334,11 +335,22 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_count'='5')
     ```
+  - ##### BATCH_FILE_ORDER:
+    The order type of stage files in per processing, choices: ASC, DESC.
+    The default is ASC.
+    Stage files will order by the last modified time with the specified order type.
+
+    ```
+    OPTIONS('batch_file_order'='DESC')
+    ```
+
   Examples:
   ```
   INSERT INTO table1 STAGE
   INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')

Review comment:
added
niuge01 commented on a change in pull request #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#discussion_r387540764
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -551,3 +569,21 @@ case class CarbonInsertFromStageCommand(
   override protected def opName: String = "INSERT STAGE"
 }
+
+object CarbonInsertFromStageCommand {
+
+  val DELETE_FILES_RETRY_TIMES = 3
+
+  val BATCH_FILE_COUNT_KEY = "batch_file_count"
+
+  val BATCH_FILE_COUNT_DEFAULT: String = Integer.MAX_VALUE.toString
+
+  val BATCH_FILE_ORDER_KEY = "batch_file_order"
+
+  val BATCH_FILE_ORDER_ASC = "ASC"
+
+  val BATCH_FILE_ORDER_DESC = "DESC"

Review comment:
added
CarbonDataQA1 commented on issue #3628: [CARBONDATA-3714] Support specify order type when list stage files
URL: https://github.com/apache/carbondata/pull/3628#issuecomment-594455639 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2312/