maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420035360

##########
File path: docs/configuration-parameters.md
##########
@@ -206,21 +206,18 @@ RESET
 | Properties | Description |
 | ----------------------------------------- | ------------------------------------------------------------ |
-| carbon.options.bad.records.logger.enable | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records.**NOTE:** If the input data contains many bad records, logging them will slow down the over all data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
-| carbon.options.bad.records.logger.enable | To enable or disable bad record logger. |
-| carbon.options.bad.records.action | This property can have four types of actions for bad records FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |
+| carbon.options.bad.records.logger.enable | To enable or disable a bad record logger. CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records.**NOTE:** If the input data contains many bad records, logging them will slow down the overall data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. | |
+| carbon.options.bad.records.action | This property has four types of bad record actions: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |
 | carbon.options.is.empty.data.bad.record | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. |
-| carbon.options.batch.sort.size.inmb | Size of batch data to keep in memory, as a thumb rule it supposed to be less than 45% of sort.inmemory.size.inmb otherwise it may spill intermediate data to disk. |
 | carbon.options.bad.record.path | Specifies the HDFS path where bad records needs to be stored. |
 | carbon.custom.block.distribution | Specifies whether to use the Spark or Carbon block distribution feature.**NOTE: **Refer to [Query Configuration](#query-configuration)#carbon.custom.block.distribution for more details on CarbonData scheduler. |

Review comment:
done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
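For readers following the thread, the four `carbon.options.bad.records.action` values described in the diff above can be modelled in a few lines. This is only a toy sketch of the documented semantics, not CarbonData's loader: the `is_bad` check (second column must parse as an integer), the `load` function, and the `BadRecordError` class are all hypothetical names introduced for illustration.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


class BadRecordError(Exception):
    """Raised by this sketch when the action is FAIL and a bad record is seen."""


def is_bad(row: Row) -> bool:
    # Assumption for illustration: a record is "bad" when its second
    # column cannot be parsed as an integer.
    try:
        int(row[1])
        return False
    except (TypeError, ValueError):
        return True


def load(rows: List[Row], action: str) -> Tuple[List[Row], List[Row]]:
    """Return (loaded_rows, redirected_rows) for the given action."""
    loaded: List[Row] = []
    redirected: List[Row] = []
    for row in rows:
        if not is_bad(row):
            loaded.append(row)
        elif action == "FORCE":
            # Auto-correct: keep the row, store the bad value as NULL (None).
            loaded.append([row[0], None])
        elif action == "REDIRECT":
            # Not loaded; written aside (the docs say "to the raw CSV").
            redirected.append(row)
        elif action == "IGNORE":
            # Neither loaded nor written anywhere.
            continue
        elif action == "FAIL":
            raise BadRecordError(f"bad record found: {row!r}")
        else:
            raise ValueError(f"unknown bad records action: {action}")
    return loaded, redirected
```

Calling `load([["a", "1"], ["b", "x"]], "REDIRECT")` keeps the first row and sets the second aside, while the same input with `"FAIL"` raises on the second row.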
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#issuecomment-624068042

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2941/
CarbonDataQA1 commented on pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#issuecomment-624068284

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1223/
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420148696

##########
File path: docs/configuration-parameters.md
##########
@@ -34,7 +34,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |

Review comment:
This property is deprecated in 2.0; users should use only spark.sql.warehouse.dir.
a) I think we should remove it.
b) But where do we mention the workaround for upgrading from 1.6 to 2.0?

@kunal642, @QiangCai: please check and give your suggestions on both points.
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420188382

##########
File path: docs/quick-start-guide.md
##########
@@ -348,10 +341,10 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 | Parameter | Description | Example |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |

Review comment:
Have you tested this? The CarbonThriftServer class doesn't take this argument anymore. Also, it takes S3a parameters as arguments. Please check the class and update the docs accordingly.
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420190554

##########
File path: docs/quick-start-guide.md
##########
@@ -369,7 +362,7 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
+$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar \
 hdfs://<host_name>:port/user/hive/warehouse/carbon.store

Review comment:
This command will not work, as the thrift server doesn't take the store location as an argument. Please check the class and update accordingly. I suggest testing it as well.
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420191834

##########
File path: docs/quick-start-guide.md
##########
@@ -348,10 +341,10 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 | Parameter | Description | Example |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |

Review comment:
Same comment for line 353; look for `carbon_store_path`.
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420148696

##########
File path: docs/configuration-parameters.md
##########
@@ -34,7 +34,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |

Review comment:
This property is deprecated in 2.0; users should use only spark.sql.warehouse.dir.
a) I think we should remove it here, in this configuration list.
b) But where do we mention the workaround for upgrading from 1.6 to 2.0?

@kunal642, @QiangCai: please check and give your suggestions on both points.
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420571145

##########
File path: docs/configuration-parameters.md
##########
@@ -34,7 +34,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |

Review comment:
I have not removed this property. I added a remark to the NOTE section saying that it is deprecated in CarbonData 2.0.
CarbonDataQA1 commented on pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#issuecomment-624498173

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2945/
CarbonDataQA1 commented on pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#issuecomment-624507780

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1227/
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420664902

##########
File path: docs/quick-start-guide.md
##########
@@ -348,10 +341,10 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 | Parameter | Description | Example |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |

Review comment:
done
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420665066

##########
File path: docs/quick-start-guide.md
##########
@@ -348,10 +341,10 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 | Parameter | Description | Example |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |

Review comment:
done
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420665148

##########
File path: docs/quick-start-guide.md
##########
@@ -369,7 +362,7 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
+$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar \
 hdfs://<host_name>:port/user/hive/warehouse/carbon.store

Review comment:
done
Indhumathi27 commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420692648

##########
File path: docs/quick-start-guide.md
##########
@@ -164,6 +166,7 @@ Start Spark shell by running the following command in the Spark directory:
 We also can create a new SparkSession instead of the built-in SparkSession `spark` if need. It need to add "org.apache.spark.sql.CarbonExtensions" into spark configuration "spark.sql.extensions".
 ```
+ val newSpark

Review comment:
Please remove or correct this line.
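The guide text quoted in this hunk says a new SparkSession needs "org.apache.spark.sql.CarbonExtensions" in the "spark.sql.extensions" setting. A minimal sketch of that configuration step follows, written with PySpark rather than the Scala shell the guide uses. The helper names `carbon_session_conf` and `new_carbon_session` are hypothetical, and the sketch assumes PySpark is installed, the CarbonData jar is already on the classpath, and no other session has been created first.

```python
from typing import Dict, Optional

# Extension class name quoted in the quick-start guide text above.
CARBON_EXTENSIONS = "org.apache.spark.sql.CarbonExtensions"


def carbon_session_conf(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return Spark settings with the CarbonData extension registered."""
    conf = dict(extra or {})
    conf["spark.sql.extensions"] = CARBON_EXTENSIONS
    return conf


def new_carbon_session(app_name: str = "carbon-demo"):
    """Build a SparkSession that carries the settings from carbon_session_conf()."""
    # Imported lazily so the config helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in carbon_session_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Because `getOrCreate()` returns an already running session if one exists, the extension setting is only expected to take effect when this is the first session created in the process.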
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420717676

##########
File path: docs/configuration-parameters.md
##########
@@ -34,7 +34,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |

Review comment:
Remove it. Users should not configure it in version 2.0.
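Since the reviewers agree that `carbon.storelocation` should no longer be configured in 2.0 and that `spark.sql.warehouse.dir` should be used instead, a small check for leftover settings may help when auditing an existing `carbon.properties`. This is a sketch under assumptions: the `find_deprecated` helper and the `DEPRECATED` mapping are hypothetical, the file is assumed to use plain `key=value` lines, and the sketch only reports the key; it does not implement any upgrade workaround, which the thread leaves open.

```python
from typing import List, Tuple

# Deprecated key -> setting the reviewers say to use instead (from the thread).
DEPRECATED = {"carbon.storelocation": "spark.sql.warehouse.dir"}


def find_deprecated(properties_text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, deprecated_key, replacement) for each hit."""
    findings = []
    for lineno, raw in enumerate(properties_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key = line.split("=", 1)[0].strip()
        if key in DEPRECATED:
            findings.append((lineno, key, DEPRECATED[key]))
    return findings
```

Running it over the text of a properties file gives the line numbers to review by hand.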
ajantha-bhat commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420719130

##########
File path: docs/quick-start-guide.md
##########
@@ -343,21 +337,23 @@ b. Run the following command to start the CarbonData thrift server.
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <access_key> <secret_key> <endpoint>

Review comment:
The AK, SK, and endpoint arguments are optional, not mandatory, so remove them from here and from other places. Add a note saying that, to work with S3, you can use the command below. <Add command with AK, SK, and endpoint>
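The point of this comment, that the plain command should omit the S3 arguments and a separate S3 variant should append them, can be sketched as a small command builder. The function name `thrift_server_command` is hypothetical, and treating the access key, secret key, and endpoint as an all-or-nothing group is an assumption of this sketch; the thread only says they are optional.

```python
from typing import List, Optional

# Class name taken from the spark-submit command quoted in the diff.
THRIFT_SERVER_CLASS = "org.apache.carbondata.spark.thriftserver.CarbonThriftServer"


def thrift_server_command(
    assembly_jar: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> List[str]:
    """Build the spark-submit argument list, with S3 arguments only if given."""
    cmd = ["./bin/spark-submit", "--class", THRIFT_SERVER_CLASS, assembly_jar]
    s3_args = [access_key, secret_key, endpoint]
    if any(arg is not None for arg in s3_args):
        # Assumption: the three S3 values are positional and go together.
        if any(arg is None for arg in s3_args):
            raise ValueError("access_key, secret_key and endpoint must be given together")
        cmd.extend(s3_args)
    return cmd
```

With only the jar path the result is the default command; passing all three S3 values appends them after the jar, in the order shown in the diff.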
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420734232

##########
File path: docs/quick-start-guide.md
##########
@@ -164,6 +166,7 @@ Start Spark shell by running the following command in the Spark directory:
 We also can create a new SparkSession instead of the built-in SparkSession `spark` if need. It need to add "org.apache.spark.sql.CarbonExtensions" into spark configuration "spark.sql.extensions".
 ```
+ val newSpark

Review comment:
done
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420734793

##########
File path: docs/configuration-parameters.md
##########
@@ -34,7 +34,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |

Review comment:
done
maheshrajus commented on a change in pull request #3740:
URL: https://github.com/apache/carbondata/pull/3740#discussion_r420739855

##########
File path: docs/quick-start-guide.md
##########
@@ -343,21 +337,23 @@ b. Run the following command to start the CarbonData thrift server.
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <access_key> <secret_key> <endpoint>

Review comment:
ok