GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1189

[WIP] Insert overwrite support, force clean-up of files, and clean-up of in-progress files

This PR adds the following features:

1. Support for `LOAD OVERWRITE` and `INSERT OVERWRITE` in carbon load. After the user issues an overwrite command, all old data is overwritten with the new data. Examples:

```
LOAD DATA INPATH 'data.csv' OVERWRITE INTO TABLE carbontable
```

```
INSERT OVERWRITE TABLE carbontable SELECT * FROM othertable
```

While an overwrite is in progress, no other load is allowed; any load that is already in progress is also overwritten.

2. Support for force-cleaning a table, which removes the table from disk forcibly. This is useful when the table is inconsistent with the Hive metastore. It is intended for internal use only and is not exposed to the user, so it is available through a Scala API rather than SQL.

3. Clean-up of in-progress files while the driver is initializing. If the driver goes down while a load is in progress, the leftover files must be cleaned up when the driver comes back up. This is controlled solely by the parameter `spark.carbon.table.loader.driver`, which must be set to true in the driver properties to enable the clean-up of in-progress files.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata insert-overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1189.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1189

----
commit 1eca780ee69b07cdf2a86df1759dfaa7d0f96fd8
Author: Ravindra Pesala <[hidden email]>
Date: 2017-07-20T09:27:21Z

    Insert overwrite support and force clean up files and clean up in progress files support added
----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
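The gating described in item 3 — clean up in-progress load files only when the driver is explicitly marked as a loader driver — can be sketched in plain Scala. This is a minimal, hypothetical sketch: the property name comes from the PR, but `DriverStartupCleanup`, `shouldCleanInProgress`, and the plain property map (standing in for Spark/Carbon configuration) are assumptions, and the actual segment clean-up is elided.

```scala
// Hypothetical sketch of the driver start-up gate from item 3 of the PR
// description. A plain Map stands in for SparkConf/CarbonProperties.
object DriverStartupCleanup {
  // Property name taken from the PR description.
  val TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver"

  // True only when the property is explicitly set to "true" in the driver
  // properties; otherwise this driver skips in-progress clean-up.
  def shouldCleanInProgress(props: Map[String, String]): Boolean =
    props.get(TABLE_LOADER_DRIVER).exists(_.equalsIgnoreCase("true"))
}
```

Defaulting to "do not clean" matches the PR's description, where only a driver explicitly configured as the loader driver removes leftover in-progress files on start-up.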
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128481387

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
```
@@ -1264,6 +1264,14 @@
   public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";

+  /**
+   * There is more often that in production uses different drivers for load and queries. So in case
+   * of load driver user should set this property to enable loader specific clean up.
+   */
+  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
```
--- End diff --

I think this property is not just for loading; any transactional operation should use this driver. So can you rename it?
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/558/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3151/
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128486494

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
```
@@ -485,8 +487,8 @@ case class LoadTable(
     }
     val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
-    if (isOverwriteExist) {
-      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
+    if (isOverwriteTable) {
+      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
```
--- End diff --

You should first check whether an overwrite is ongoing, then write this log.
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487079

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

Please add some description.
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487481

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
```
@@ -485,8 +487,8 @@ case class LoadTable(
     }
     val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
-    if (isOverwriteExist) {
-      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
+    if (isOverwriteTable) {
+      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487500

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128490892

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128495694

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
```
@@ -1264,6 +1264,14 @@
   public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";

+  /**
+   * There is more often that in production uses different drivers for load and queries. So in case
+   * of load driver user should set this property to enable loader specific clean up.
+   */
+  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3153/
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128496772

--- Diff: integration/spark2/src/main/scala/org/apache/spark/util/CleanFiles.scala ---
```
@@ -29,12 +29,12 @@ import org.apache.carbondata.api.CarbonStore
 object CleanFiles {

   def cleanFiles(spark: SparkSession, dbName: String, tableName: String,
-      storePath: String): Unit = {
+      storePath: String, forceTableClean: Boolean): Unit = {
```
--- End diff --

Add a default value for `forceTableClean` and add a comment for this function.
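The suggested change — a Scala default parameter value for `forceTableClean`, so existing call sites keep compiling unchanged — can be sketched standalone. This is a hypothetical simplification: the real `cleanFiles` takes a `SparkSession`, performs side effects, and returns `Unit`; here `CleanFilesSketch` drops the session and returns a description string purely so the pattern is self-contained.

```scala
// Hypothetical stand-in for CleanFiles.cleanFiles, showing the default
// parameter value requested in the review.
object CleanFilesSketch {
  /**
   * Cleans stale files for the given table.
   *
   * @param forceTableClean when true, remove the table data from disk
   *                        forcibly (useful when the table is inconsistent
   *                        with the Hive metastore); defaults to false so
   *                        existing callers are unaffected.
   */
  def cleanFiles(dbName: String, tableName: String,
      storePath: String, forceTableClean: Boolean = false): String =
    if (forceTableClean) s"force-cleaned $dbName.$tableName at $storePath"
    else s"cleaned stale segments of $dbName.$tableName"
}
```

Because the new parameter is last and defaulted, `cleanFiles(db, table, path)` and `cleanFiles(db, table, path, forceTableClean = true)` both compile, which is exactly why the reviewer asks for the default.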
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/560/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3154/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/561/
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/1189