akashrn5 opened a new pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623174960

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2915/
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623175013

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1197/
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623348862

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2921/
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623355483

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1203/
chetandb commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r419334449

########## File path: docs/dml-of-carbondata.md ##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
   OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
   ```
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon properties.
-    Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
+  **NOTE:**
+  * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.
+  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
+  * If the REDIRECT option is used, CarbonData will add all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the source record for further data ingestion. This option is used to remind you which records are bad.
+  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
+  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
+  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
+  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
+  * Since Bad Records Path can be specified in create, load and carbon properties.
+    Therefore, the value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.

-  Example:
+  Example:

-  ```
-  LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
-  OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
-  'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
-  ```
+  ```
+  LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
+  OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+  'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
+  ```

-  ##### GLOBAL_SORT_PARTITIONS:
-  If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data.
+  If the SORT_SCOPE is defined as GLOBAL_SORT, then the user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map tasks as reduce tasks. It is recommended that each reduce task to deal with 512MB-1GB data.

Review comment: "It is recommended that each reduce task to deal with 512MB-1GB data." - This can be modified to "It is recommended that each reduce task deals with 512MB-1GB data."

########## File path: docs/dml-of-carbondata.md ##########
@@ -316,12 +311,12 @@ CarbonData DML statements are documented here,which includes:
   INSERT OVERWRITE TABLE table1 SELECT * FROM TABLE2
   ```

-### INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
+## INSERT DATA INTO CARBONDATA TABLE From Stage Input Files

   Stage input files are data files written by external application (such as Flink). These files are committed but not loaded into the table.
-  You can use this command to insert them into the table, so that making them visible for query.
+  User can use this command to insert them into the table, so that making them visible for a query.

Review comment: "User can use this command to insert them into the table, so that making them visible for a query." can be changed to "User can use this command to insert them into the table, thus making them visible for a query."

########## File path: docs/dml-of-carbondata.md ##########
@@ -352,18 +347,18 @@ CarbonData DML statements are documented here,which includes:
   OPTIONS('batch_file_order'='DESC')
   ```

-  Examples:
-  ```
-  INSERT INTO table1 STAGE
-
-  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
-  Note: This command use the default file order, will insert the earliest stage files into the table.
-
-  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5', 'batch_file_order'='DESC')
-  Note: This command will insert the latest stage files into the table.
-  ```
+  Examples:
+  ```
+  INSERT INTO table1 STAGE
+
+  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
+  Note: This command use the default file order, will insert the earliest stage files into the table.

Review comment: Change "This command use" to "This command uses"
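For readers skimming the GLOBAL_SORT_PARTITIONS wording being reviewed above, a minimal sketch of how the option is passed at load time may help. This is illustrative only: the file path, table name, and partition count are placeholders, and the exact option behaviour should be confirmed against the released CarbonData documentation.

```
LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv' INTO TABLE tablename
OPTIONS('SORT_SCOPE'='GLOBAL_SORT',
        -- Number of shuffle partitions (reduce tasks) used for the global sort.
        -- Size it so each task handles roughly 512MB-1GB of input, per the note above.
        'GLOBAL_SORT_PARTITIONS'='8')
```

If GLOBAL_SORT_PARTITIONS is omitted or set below 1, the doc text under review says the number of map tasks is used as the number of reduce tasks.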
akashrn5 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r419451876

########## File path: docs/dml-of-carbondata.md ##########
@@ -316,12 +311,12 @@ CarbonData DML statements are documented here,which includes:
+  User can use this command to insert them into the table, so that making them visible for a query.

Review comment: done

########## File path: docs/dml-of-carbondata.md ##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
+  If the SORT_SCOPE is defined as GLOBAL_SORT, then the user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map tasks as reduce tasks. It is recommended that each reduce task to deal with 512MB-1GB data.

Review comment: done
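As a companion to the bad-records NOTE reworded in the diff above, here is a hedged sketch of the FORCE action it describes (bad records stored as NULL rather than redirected); the file path and table name are placeholders:

```
LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
OPTIONS('BAD_RECORDS_ACTION'='FORCE',
        'IS_EMPTY_DATA_BAD_RECORD'='false')
-- FORCE stores unconvertible values as NULL and lets the load succeed,
-- so no redirect CSV is written and BAD_RECORD_PATH is not needed here.
```

Contrast this with the REDIRECT example already in the diff, which sets BAD_RECORD_PATH because bad rows are written out to a separate CSV file.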
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623555888
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623926592

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1218/
CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623933556

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2936/
kunal642 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r420117381

########## File path: docs/dml-of-carbondata.md ##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
+  **NOTE:**
+  * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.

Review comment: There is a lot of indentation at the start of the lines... If it doesn't cause any change to the rendered doc, then it is better to remove it.
akashrn5 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r420279748

########## File path: docs/dml-of-carbondata.md ##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
+  **NOTE:**
+  * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.

Review comment: The indentation is needed; it was added because the main bullet points were not properly nested. If you look at the old document, you can see these are required.
kunal642 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-624500865

LGTM