[jira] [Resolved] (CARBONDATA-1690) Query failed after swap table by renaming

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-1690.
-----------------------------------------
Resolution: Fixed

> Query failed after swap table by renaming
> -----------------------------------------
>
> Key: CARBONDATA-1690
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1690
> Project: CarbonData
> Issue Type: Bug
> Components: spark-integration
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Fix For: 1.3.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> # SCENARIO
> I encountered a query error after swapping tables by renaming them. The steps to reproduce the bug are listed below.
> These steps work fine:
> 1. CREATE TABLE `t1`;
> 2. LOAD DATA TO `t1`;
> 3. CREATE TABLE `t2`;
> 4. LOAD DATA TO `t2`;
> 5. RENAME `t1` TO `t3`;
> 6. RENAME `t2` TO `t1`;
> 7. QUERY `t1`;
> These steps fail:
> 1. CREATE TABLE `t1`;
> 2. LOAD DATA TO `t1`;
> 3. CREATE TABLE `t2`;
> 4. LOAD DATA TO `t2`;
> **5. QUERY `t1`;** --- Added this step
> 6. RENAME `t1` TO `t3`;
> 7. RENAME `t2` TO `t1`;
> 8. QUERY `t1`; --- This step will cause failure
> The two scenarios above differ only in that the second one adds Step5; the error is thrown in Step8. The error message in the Spark SQL shell looks like this:
> ```
> Error: java.io.FileNotFoundException: File hdfs://slave1:9000/carbonstore/default/test_table/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510144676427.carbondata does not exist. (state=,code=0)
> ```
> # Analyze
> Renaming a table in CarbonData is actually done by renaming the corresponding data folder. In addition, CarbonData refreshes the metadata and its cache.
> From the error message above, we can see that the file path is exactly the one from before the rename operation. We guess the problem may lie in the data map.
> In the second scenario, when we query `t1` before renaming (Step5), the corresponding data map is loaded and cached. Since the data map is keyed by table name, when we query `t1` again (Step8) after renaming, the previously cached data map is used. It is outdated and still points to the old files, which causes the `FileNotFoundException`.
> In the first scenario, the query on `t1` (Step7) is the first time the data map is loaded, so the correct data is read, which is why it works.
> # Resolve
> There are two ways to fix this bug:
> 1. Change the index key of the Data Map: use `table_name + table_schema_last_update_time` in place of `table_name`.
> 2. Clear the corresponding Data Map when performing the rename operation.
> I prefer the second one since it is easy to implement: just one line of code.
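
The failure mode and the two proposed fixes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not CarbonData code: `ToyStore`, `swap_scenario`, the `/store/...` paths and the file names are all hypothetical, a plain set of paths stands in for HDFS, a dictionary stands in for the cached data map, and an integer counter stands in for `table_schema_last_update_time`.

```python
import itertools


class ToyStore:
    """Toy model: a set of file paths plus an index cache (the "data map")."""

    def __init__(self, fix=None):
        self.fix = fix              # None, "clear_on_rename" or "versioned_key"
        self.paths = set()          # files that currently exist
        self.cache = {}             # cache key -> file list seen at load time
        self.schema_version = {}    # stand-in for table_schema_last_update_time
        self._clock = itertools.count()

    def _prefix(self, table):
        return f"/store/{table}/"

    def _key(self, table):
        # Fix 1: make the cache key depend on the schema update "time".
        if self.fix == "versioned_key":
            return (table, self.schema_version[table])
        return table

    def create_and_load(self, table, filename):
        self.schema_version[table] = next(self._clock)
        self.paths.add(self._prefix(table) + filename)

    def rename(self, old, new):
        # Renaming a table is modelled as renaming its data folder.
        old_prefix, new_prefix = self._prefix(old), self._prefix(new)
        moved = {p for p in self.paths if p.startswith(old_prefix)}
        self.paths -= moved
        self.paths |= {new_prefix + p[len(old_prefix):] for p in moved}
        del self.schema_version[old]
        self.schema_version[new] = next(self._clock)
        # Fix 2: drop the cached index of the renamed table.
        if self.fix == "clear_on_rename":
            self.cache.pop(old, None)

    def query(self, table):
        key = self._key(table)
        if key not in self.cache:
            prefix = self._prefix(table)
            self.cache[key] = sorted(p for p in self.paths if p.startswith(prefix))
        for path in self.cache[key]:
            if path not in self.paths:
                raise FileNotFoundError(path)  # stale cache entry
        return self.cache[key]


def swap_scenario(store, query_before_rename):
    """Run the reproduction steps; the optional early query is Step5."""
    store.create_and_load("t1", "a.carbondata")
    store.create_and_load("t2", "b.carbondata")
    if query_before_rename:
        store.query("t1")
    store.rename("t1", "t3")
    store.rename("t2", "t1")
    return store.query("t1")


if __name__ == "__main__":
    for fix in (None, "clear_on_rename", "versioned_key"):
        try:
            print(fix, "->", swap_scenario(ToyStore(fix), query_before_rename=True))
        except FileNotFoundError as err:
            print(fix, "-> FileNotFoundError:", err)
```

In this toy, the versioned key never evicts the stale `("t1", old_version)` entry, so it lingers in the cache, whereas clearing on rename removes it with a single `pop`. That is one possible reading of why the second option is the simpler one, though the real trade-offs in CarbonData may differ.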

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)