GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1189

[WIP] Insert overwrite support, force clean-up of files, and clean-up of in-progress files

This PR adds the following features:

1. Support for `LOAD OVERWRITE` and `INSERT OVERWRITE` in carbon load. After the user issues an overwrite command, all old data is overwritten with the new data. Examples:

```
LOAD DATA INPATH 'data.csv' OVERWRITE INTO TABLE carbontable
```

```
INSERT OVERWRITE TABLE carbontable SELECT * FROM othertable
```

While an overwrite is in progress, no other load is allowed; any load that is already in progress is also overwritten.

2. Support for force-cleaning a table, which removes the table from disk forcibly. This is useful when the table is inconsistent with the Hive metastore. It is intended for internal use only and is not exposed to the user, so it is available through a Scala API rather than SQL.

3. Clean-up of in-progress files while the driver is initializing. If the driver goes down while a load is in progress, the leftover files must be cleaned up when the driver comes back up. This is controlled solely by the parameter `spark.carbon.table.loader.driver`, which must be set to true in the driver properties to enable the clean-up of in-progress files.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata insert-overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1189.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1189

----
commit 1eca780ee69b07cdf2a86df1759dfaa7d0f96fd8
Author: Ravindra Pesala <[hidden email]>
Date: 2017-07-20T09:27:21Z

    Insert overwrite support and force clean up files and clean up in progress files support added
----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
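The gating described in item 3 — clean up in-progress load files only when the driver is explicitly marked as a loader driver — can be sketched in plain Scala. This is a minimal, hypothetical sketch: the property name comes from the PR, but `DriverStartupCleanup`, `shouldCleanInProgress`, and the plain property map (standing in for Spark/Carbon configuration) are assumptions, and the actual segment clean-up is elided.

```scala
// Hypothetical sketch of the driver start-up gate from item 3 of the PR
// description. A plain Map stands in for SparkConf/CarbonProperties.
object DriverStartupCleanup {
  // Property name taken from the PR description.
  val TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver"

  // True only when the property is explicitly set to "true" in the driver
  // properties; otherwise this driver skips in-progress clean-up.
  def shouldCleanInProgress(props: Map[String, String]): Boolean =
    props.get(TABLE_LOADER_DRIVER).exists(_.equalsIgnoreCase("true"))
}
```

Defaulting to "do not clean" matches the PR's description, where only a driver explicitly configured as the loader driver removes leftover in-progress files on start-up.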
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128481387

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
```
@@ -1264,6 +1264,14 @@
   public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";

+  /**
+   * There is more often that in production uses different drivers for load and queries. So in case
+   * of load driver user should set this property to enable loader specific clean up.
+   */
+  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
```
--- End diff --

I think this property is not just for loading; any transactional operation should use this driver. So can you rename it?
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/558/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3151/
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128486494

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
```
@@ -485,8 +487,8 @@ case class LoadTable(
     }
     val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
-    if (isOverwriteExist) {
-      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
+    if (isOverwriteTable) {
+      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
```
--- End diff --

You should first check whether an overwrite is ongoing, then write this log.
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487079

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

Please add some description.
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487481

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
```
@@ -485,8 +487,8 @@ case class LoadTable(
     }
     val dbName = databaseNameOp.getOrElse(sparkSession.catalog.currentDatabase)
-    if (isOverwriteExist) {
-      sys.error(s"Overwrite is not supported for carbon table with $dbName.$tableName")
+    if (isOverwriteTable) {
+      LOGGER.info(s"Overwrite of carbon table with $dbName.$tableName is in progress")
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128487500

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128490892

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
```
@@ -617,4 +621,75 @@ object CommonUtil {
     AttributeReference("partition", StringType, nullable = false,
       new MetadataBuilder().putString("comment", "partitions info").build())()
   )
+
+  def cleanInProgressSegments(storePath: String, sparkContext: SparkContext): Unit = {
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128495694

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
```
@@ -1264,6 +1264,14 @@
   public static final String ENABLE_HIVE_SCHEMA_META_STORE_DEFAULT = "false";

+  /**
+   * There is more often that in production uses different drivers for load and queries. So in case
+   * of load driver user should set this property to enable loader specific clean up.
+   */
+  public static final String TABLE_LOADER_DRIVER = "spark.carbon.table.loader.driver";
```
--- End diff --

ok
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3153/
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1189#discussion_r128496772

--- Diff: integration/spark2/src/main/scala/org/apache/spark/util/CleanFiles.scala ---
```
@@ -29,12 +29,12 @@ import org.apache.carbondata.api.CarbonStore
 object CleanFiles {

   def cleanFiles(spark: SparkSession, dbName: String, tableName: String,
-      storePath: String): Unit = {
+      storePath: String, forceTableClean: Boolean): Unit = {
```
--- End diff --

Add a default value for `forceTableClean` and add a comment for this function.
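The suggested change — a Scala default parameter value for `forceTableClean`, so existing call sites keep compiling unchanged — can be sketched standalone. This is a hypothetical simplification: the real `cleanFiles` takes a `SparkSession`, performs side effects, and returns `Unit`; here `CleanFilesSketch` drops the session and returns a description string purely so the pattern is self-contained.

```scala
// Hypothetical stand-in for CleanFiles.cleanFiles, showing the default
// parameter value requested in the review.
object CleanFilesSketch {
  /**
   * Cleans stale files for the given table.
   *
   * @param forceTableClean when true, remove the table data from disk
   *                        forcibly (useful when the table is inconsistent
   *                        with the Hive metastore); defaults to false so
   *                        existing callers are unaffected.
   */
  def cleanFiles(dbName: String, tableName: String,
      storePath: String, forceTableClean: Boolean = false): String =
    if (forceTableClean) s"force-cleaned $dbName.$tableName at $storePath"
    else s"cleaned stale segments of $dbName.$tableName"
}
```

Because the new parameter is last and defaulted, `cleanFiles(db, table, path)` and `cleanFiles(db, table, path, forceTableClean = true)` both compile, which is exactly why the reviewer asks for the default.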
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/560/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3154/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1189

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/561/
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/1189