[jira] [Commented] (CARBONDATA-4044) Fix dirty data in indexfile while IUD with stale data in segment folder


Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221222#comment-17221222 ]

Yahui Liu commented on CARBONDATA-4044:
---------------------------------------

I have faced this issue and am trying to solve it. Currently we call CarbonLoaderUtil.checkAndCreateCarbonDataLocation to check for the Segment_XXX folder and create it if it does not exist, but we do not check whether stale data exists in the segment folder when the Segment_XXX folder is already there. My idea is to always remove the Segment_XXX folder before creating it again, which ensures that no stale data is left behind. Is this solution valid for all cases? Please share your thoughts. Thanks.
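
To make the proposal concrete, here is a minimal sketch of the "always remove, then recreate" idea in plain Python. It is not CarbonData code: `reset_segment_dir` and the `parent_dir` argument are hypothetical names used only for illustration, and the only thing taken from the comment above is the `Segment_XXX` folder naming.

```python
import shutil
from pathlib import Path


def reset_segment_dir(parent_dir, segment_id):
    """Return an empty Segment_<id> folder, deleting any existing one first.

    Hypothetical helper for illustration; it is not part of CarbonData.
    """
    segment_dir = Path(parent_dir) / f"Segment_{segment_id}"
    if segment_dir.exists():
        # Drop the whole folder so that no stale file can survive.
        shutil.rmtree(segment_dir)
    segment_dir.mkdir(parents=True)
    return segment_dir
```

The sketch also shows where the open question lies: everything already in the folder is deleted, so this is only safe when the folder is known to hold nothing that is still valid at the time of the call.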

> Fix dirty data in indexfile while IUD with stale data in segment folder
> -----------------------------------------------------------------------
>
>                 Key: CARBONDATA-4044
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4044
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Xingjun Hao
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> XX.mergecarbonindex and XX.segment record the list of index files of a segment. Currently, we generate XX.mergecarbonindex and XX.segment by listing all index files in the segment folder (both carbonindex and mergecarbonindex), which leads to dirty data when there is stale data in the segment folder.
> For example, suppose there is a stale index file in the segment_0 folder, "0_1603763776.carbonindex".
> While loading, a new carbonindex file "0_16037752342.carbonindex" is written. When merging carbonindex files, we expect to merge only 0_16037752342.carbonindex. But if we pick up every carbonindex file in the segment folder, both "0_1603763776.carbonindex" and 0_16037752342.carbonindex will be merged and recorded in the segment file.
>  
> The same problem occurs while updating.
>  
>  
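
The example in the issue description can be sketched in a few lines of Python. This is an illustration only, not CarbonData code: `index_files_for_load` is a hypothetical name, and the file-name pattern `<task>_<timestamp>.carbonindex` is an assumption read off the two file names quoted above.

```python
def index_files_for_load(file_names, load_timestamp):
    """Keep only the .carbonindex files written by the given load.

    Assumes names of the form <task>_<timestamp>.carbonindex, as in the
    example from the issue; the real naming scheme may differ.
    """
    suffix = f"_{load_timestamp}.carbonindex"
    return [name for name in file_names if name.endswith(suffix)]


files = ["0_1603763776.carbonindex", "0_16037752342.carbonindex"]
print(index_files_for_load(files, "16037752342"))
# prints ['0_16037752342.carbonindex']
```

Selecting files by the current load's timestamp leaves the stale "0_1603763776.carbonindex" out of the merge without deleting anything from the segment folder.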


