[jira] [Updated] (CARBONDATA-1896) Clean files operation improvement

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhatchayani updated CARBONDATA-1896:
------------------------------------
    Description:
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not take other parallel sessions into account. If another session's data load is IN_PROGRESS when a session is brought up, the running load is also changed to MARKED_FOR_DELETE, irrespective of its actual load status. Cleaning stale segments during session bring-up also increases the time it takes to bring up a session.


+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading.
While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered.
Cleaning of stale files should be removed from session bring-up. Instead, it can either be done manually on the required tables through the existing CLEAN FILES DDL, or the next load will clean them up automatically.
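
The difference between the current and the proposed cleaning behaviour can be illustrated with a small model. This is only a sketch under stated assumptions, not CarbonData code: the `Table` class, its `status` and `locks` fields, and the functions `naive_clean` and `lock_aware_clean` are hypothetical names, the tablestatus file is modelled as a dictionary, and the SEGMENT_LOCK is modelled as membership in a set that a crashed loader no longer holds.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    INSERT_IN_PROGRESS = "INSERT_IN_PROGRESS"
    INSERT_OVERWRITE_IN_PROGRESS = "INSERT_OVERWRITE_IN_PROGRESS"
    SUCCESS = "SUCCESS"
    MARKED_FOR_DELETE = "MARKED_FOR_DELETE"


IN_PROGRESS = {Status.INSERT_IN_PROGRESS, Status.INSERT_OVERWRITE_IN_PROGRESS}


@dataclass
class Table:
    """Hypothetical stand-in for one table's tablestatus file and segment locks."""
    status: dict = field(default_factory=dict)  # segment id -> Status
    locks: set = field(default_factory=set)     # segment ids whose lock is held

    def start_load(self, segment_id, overwrite=False):
        # Proposed behaviour: take the segment lock when the load starts.
        self.locks.add(segment_id)
        self.status[segment_id] = (
            Status.INSERT_OVERWRITE_IN_PROGRESS if overwrite
            else Status.INSERT_IN_PROGRESS
        )

    def crash_load(self, segment_id):
        # Assumption: a dead loader no longer holds its lock,
        # but its tablestatus entry is left behind (a stale segment).
        self.locks.discard(segment_id)

    def finish_load(self, segment_id):
        self.status[segment_id] = Status.SUCCESS
        self.locks.discard(segment_id)


def naive_clean(table):
    """Behaviour from the problem statement: only tablestatus is consulted."""
    for segment_id, status in list(table.status.items()):
        if status in IN_PROGRESS:
            table.status[segment_id] = Status.MARKED_FOR_DELETE


def lock_aware_clean(table):
    """Proposed behaviour: skip in-progress segments whose lock is still held."""
    for segment_id, status in list(table.status.items()):
        if status in IN_PROGRESS and segment_id not in table.locks:
            table.status[segment_id] = Status.MARKED_FOR_DELETE


def demo(clean):
    table = Table()
    table.start_load("0")
    table.crash_load("0")   # stale segment left by a failed load
    table.start_load("1")   # load still running in a parallel session
    clean(table)
    return table.status
```

In this model, `demo(naive_clean)` marks both segments for deletion, including the load that is still running, whereas `demo(lock_aware_clean)` marks only the stale segment "0" and leaves segment "1" as INSERT_IN_PROGRESS.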

*Impact analysis on the solution will be updated soon.*

  was:
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not take other parallel sessions into account. If another session's data load is IN_PROGRESS when a session is brought up, the running load is also changed to MARKED_FOR_DELETE, irrespective of its actual load status. Cleaning stale segments during session bring-up also increases the time it takes to bring up a session.


+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading.
While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered.
Cleaning of stale files should be removed from session bring-up and should instead be done manually on the required tables through the existing CLEAN FILES DDL.

*Impact analysis on the solution will be updated soon.*

> Clean files operation improvement
> ---------------------------------
>
>                 Key: CARBONDATA-1896
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1896
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>          Time Spent: 2.5h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)