[GitHub] carbondata pull request #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when...


qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1479

    [CARBONDATA-1690][DATAMAP] Clear datamap when renaming table

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `NO`
     - [x] Any backward compatibility impacted?
    `NO`
     - [x] Document update required?
    `NO`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `ADDED NEW TEST`
            - How it is tested? Please attach test report.
            `TEST ON A LOCAL MACHINE`
            - Is it a performance related change? Please attach the performance test report.
            `NO`
            - Any additional information to help reviewers in testing this change.
            `NO`
   
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    `NOT RELATED`
   
    COPY FROM JIRA ISSUE
    ======
   
    # SCENARIO
   
    I encountered a query error after swapping two tables by renaming them. The steps to reproduce this bug are listed below.
   
    These steps work fine:
   
    1. CREATE TABLE `t1`;
    2. LOAD DATA TO `t1`;
    3. CREATE TABLE `t2`;
    4. LOAD DATA TO `t2`;
    5. RENAME `t1` TO `t3`;
    6. RENAME `t2` TO `t1`;
    7. QUERY `t1`;
   
    These steps fail:
   
    1. CREATE TABLE `t1`;
    2. LOAD DATA TO `t1`;
    3. CREATE TABLE `t2`;
    4. LOAD DATA TO `t2`;
    **5. QUERY `t1`;**   --- Added this step
    6. RENAME `t1` TO `t3`;
    7. RENAME `t2` TO `t1`;
    8. QUERY `t1`;   --- This step will cause failure
   
    The two scenarios differ only in that the second one adds Step 5, and the error is thrown in Step 8. The error message in the Spark SQL shell looks like this:
    ```
    Error: java.io.FileNotFoundException: File hdfs://slave1:9000/carbonstore/default/test_table/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510144676427.carbondata does not exist. (state=,code=0)
    ```
   
    # Analyze
   
    Renaming a table in CarbonData is actually done by renaming the corresponding data folder. In addition, CarbonData refreshes the metadata and its cache.
   
    From the error message above, we can see that the file name is exactly the one from before the rename operation. We suspect the problem lies in the data map.
   
    In the second scenario, when we query `t1` before renaming (Step 5), the corresponding data map is loaded and cached. Since the data map is keyed by table name, when we query `t1` again after renaming (Step 8), the previously cached data map is used. It is outdated and points to files that no longer exist under that name, which causes the `FileNotFoundException`.
   
    In the first scenario, the query on `t1` (Step 7) is the first time the data map is loaded, so the correct data is read. That is why it works.
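
The analysis above can be illustrated with a small toy model. This is only a sketch and not CarbonData code: `ToyStore`, its methods, and the file names are hypothetical, and the "data map" is reduced to a list of file paths cached per table name.

```python
class ToyStore:
    """Hypothetical stand-in for a table store; not CarbonData's API."""

    def __init__(self):
        self.folders = {}        # table name -> data file names in its folder
        self.datamap_cache = {}  # table name -> cached list of file paths

    def create_and_load(self, table, files):
        self.folders[table] = list(files)

    def rename(self, old, new):
        # Renaming only moves the data folder; the cache is left untouched.
        self.folders[new] = self.folders.pop(old)

    def query(self, table):
        # Load the data map on first access, then reuse the cached copy.
        if table not in self.datamap_cache:
            self.datamap_cache[table] = [
                f"{table}/{name}" for name in self.folders[table]
            ]
        current = {f"{table}/{name}" for name in self.folders[table]}
        for path in self.datamap_cache[table]:
            if path not in current:
                raise FileNotFoundError(path)
        return self.datamap_cache[table]


def run(query_before_rename):
    store = ToyStore()
    store.create_and_load("t1", ["part-a.carbondata"])
    store.create_and_load("t2", ["part-b.carbondata"])
    if query_before_rename:
        store.query("t1")      # the extra step from the failing scenario
    store.rename("t1", "t3")
    store.rename("t2", "t1")
    return store.query("t1")
```

In this model, `run(False)` returns the paths of the files that now live under `t1`, while `run(True)` raises `FileNotFoundError` for the path that was cached before the rename.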
   
   
   
    # Resolve
   
    There are two ways to fix this bug:
   
    1. Change the index key of the Data Map. Use `table_name + table_schema_last_update_time` instead of `table_name`.
   
    2. Clear the corresponding Data Map when performing the rename operation.
   
    I prefer the second one since it is easy to implement: just one line of code.
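
The two options above can be compared in a toy model. This is a sketch under stated assumptions, not the actual patch: `ToyStore`, the `fix` parameter, and the counter that stands in for `table_schema_last_update_time` are all hypothetical.

```python
import itertools


class ToyStore:
    """Hypothetical model of a table-name-keyed data map cache."""

    def __init__(self, fix=None):
        self.fix = fix              # None, "versioned_key" or "clear_on_rename"
        self.folders = {}           # table name -> data file names
        self.updated = {}           # table name -> stand-in for last update time
        self.datamap_cache = {}     # cache key -> cached list of file paths
        self._clock = itertools.count(1)

    def _key(self, table):
        # Option 1: include the schema update marker in the cache key.
        if self.fix == "versioned_key":
            return (table, self.updated[table])
        return table

    def create_and_load(self, table, files):
        self.folders[table] = list(files)
        self.updated[table] = next(self._clock)

    def rename(self, old, new):
        self.folders[new] = self.folders.pop(old)
        del self.updated[old]
        self.updated[new] = next(self._clock)
        # Option 2: drop the cached data map of the renamed table.
        if self.fix == "clear_on_rename":
            self.datamap_cache.pop(old, None)

    def query(self, table):
        key = self._key(table)
        if key not in self.datamap_cache:
            self.datamap_cache[key] = [f"{table}/{n}" for n in self.folders[table]]
        current = {f"{table}/{n}" for n in self.folders[table]}
        for path in self.datamap_cache[key]:
            if path not in current:
                raise FileNotFoundError(path)
        return self.datamap_cache[key]


def run(fix):
    store = ToyStore(fix)
    store.create_and_load("t1", ["part-a.carbondata"])
    store.create_and_load("t2", ["part-b.carbondata"])
    store.query("t1")
    store.rename("t1", "t3")
    store.rename("t2", "t1")
    return store.query("t1")
```

Both options make the final query succeed in this model. One difference the sketch makes visible: with the versioned key, the outdated entry stays in the cache until something else removes it, whereas clearing on rename removes it immediately.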

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata rename_table_datamap_clear

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1479
   
----
commit bdfbdbd6c0d419c57f25048ec427440dbc96fc99
Author: xuchuanyin <[hidden email]>
Date:   2017-11-09T13:25:06Z

    Clear datamap when renaming table

----


---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    @xuchuanyin  your description is very clear, and thanks for your good contribution.
   
    @ravipesala  please check this PR.


---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/934/



---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    retest this please


---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/941/



---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1549/



---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1556/



---

[GitHub] carbondata issue #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when renami...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1479
 
    LGTM


---

[GitHub] carbondata pull request #1479: [CARBONDATA-1690][DATAMAP] Clear datamap when...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1479


---