[
https://issues.apache.org/jira/browse/CARBONDATA-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221222#comment-17221222 ]
Yahui Liu commented on CARBONDATA-4044:
---------------------------------------
I have run into this issue and am trying to solve it. Currently we call CarbonLoaderUtil.checkAndCreateCarbonDataLocation to check for the Segment_XXX folder and create it if it does not exist, but when the Segment_XXX folder already exists we do not check whether it contains stale data. My idea is to always remove the Segment_XXX folder before creating it again, which ensures that no stale data is left behind. Is this solution valid in all cases? Please share your thoughts. Thanks.
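
To make the proposal concrete, here is a minimal sketch of the "remove, then recreate" idea. It is only an illustration: it uses plain java.nio on a local file system instead of CarbonData's own file abstraction, and the class name SegmentDirReset and the method recreate are hypothetical, not existing CarbonData code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of CarbonData.
final class SegmentDirReset {

    /** Deletes segmentDir and everything under it if it exists, then creates it empty. */
    static Path recreate(Path segmentDir) throws IOException {
        if (Files.exists(segmentDir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(segmentDir)) {
                // Reverse order so that children are deleted before their parents.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        return Files.createDirectories(segmentDir);
    }
}
```

This only covers the case where the segment folder is expected to be empty before writing. Whether it is also safe when an operation writes into a segment folder that already holds valid data is exactly the open question above.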
> Fix dirty data in indexfile while IUD with stale data in segment folder
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-4044
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4044
> Project: CarbonData
> Issue Type: Bug
> Reporter: Xingjun Hao
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> XX.mergecarbonindex and XX.segment record the list of index files of a segment. Currently we generate xx.mergeindexfile and xx.segment by picking up every index file in the segment folder (both carbonindex and mergecarbonindex), which leads to dirty data when there is stale data in the segment folder.
> For example, suppose there is a stale index file in the segment_0 folder, "0_1603763776.carbonindex".
> During loading, a new carbonindex file "0_16037752342.carbonindex" is written. When merging carbonindex files, we expect only "0_16037752342.carbonindex" to be merged, but if we pick up every carbonindex file in the segment folder, both "0_1603763776.carbonindex" and "0_16037752342.carbonindex" are merged and recorded in the segment file.
>
> The same problem occurs while updating.
>
>
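
The example in the description above can be sketched as a filter that keeps only the index files written by the current load. This is a rough illustration, assuming index file names have the form <taskNo>_<timestamp>.carbonindex as in the example; real CarbonData file names may differ, and IndexFileFilter and forCurrentLoad are hypothetical names, not existing CarbonData code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CarbonData.
final class IndexFileFilter {

    static final String INDEX_EXT = ".carbonindex";

    /** Keeps only the file names that end with "_<loadTimestamp>.carbonindex". */
    static List<String> forCurrentLoad(List<String> fileNames, String loadTimestamp) {
        String suffix = "_" + loadTimestamp + INDEX_EXT;
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            // Stale files carry an older timestamp and do not match the suffix.
            if (name.endsWith(suffix)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With the two files from the example and the load timestamp 16037752342, only "0_16037752342.carbonindex" would be passed on to the merge step, so the stale file is never recorded in the segment file.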