[discuss]CarbonData update operation enhance


Linwood
*[Background]*
The update operation cleans up delta files before it runs (see
cleanUpDeltaFiles(carbonTable, false)). That cleanup traverses the metadata
path and the segment paths many times. When there are many files, the
overhead grows and the update takes longer.

*[Motivation & Goal]*
During the update process, reduce the repeated traversal, or move
cleanUpDeltaFiles out of the update path into a separate step.

*[Modification]*
There are several possible solutions:

Solution 1:

cleanUpDeltaFiles fetches files in nearly the same way more than once, for
example
updateStatusManager.getUpdateDeltaFilesList(segment,
false, CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true,
allSegmentFiles, true) and
updateStatusManager.getUpdateDeltaFilesList(segment,
false, CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true,
allSegmentFiles, true). The two calls differ only in the file type, yet the
segment path is traversed twice. We can merge them into one traversal.
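
A rough sketch of the merge idea, not actual CarbonData code: the file list of a segment is walked once and the files are grouped by extension, instead of filtering the same list once per file type. DeltaFileGrouper, groupByExtension and the ".delta"/".index" strings are hypothetical stand-ins for illustration; the real extensions come from CarbonCommonConstants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the CarbonData API.
final class DeltaFileGrouper {
    private DeltaFileGrouper() {}

    // Walks the segment file names once and buckets them by extension.
    // Files that match none of the requested extensions are ignored.
    static Map<String, List<String>> groupByExtension(
            List<String> segmentFiles, List<String> extensions) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String ext : extensions) {
            grouped.put(ext, new ArrayList<>());
        }
        for (String file : segmentFiles) {
            for (String ext : extensions) {
                if (file.endsWith(ext)) {
                    grouped.get(ext).add(file);
                    break; // each file belongs to at most one bucket
                }
            }
        }
        return grouped;
    }
}
```

With this shape, the cost of listing the segment is paid once no matter how many file types the cleanup needs to look at.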

Solution 2:

Building on Solution 1, use Spark or MapReduce to distribute the cleanup
tasks to other nodes.

Solution 3:

Submit cleanUpDeltaFiles as a separate task and run it in the early morning
or whenever the cluster is not busy.
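
One way this could look, as a minimal sketch using only the JDK: a single-threaded scheduler runs a cleanup Runnable once a day at a fixed off-peak time. OffPeakCleanupScheduler and its methods are hypothetical names, the Runnable stands in for whatever ends up calling cleanUpDeltaFiles, and daylight-saving shifts are ignored for simplicity.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler, not part of CarbonData.
final class OffPeakCleanupScheduler {
    private OffPeakCleanupScheduler() {}

    // Time to wait from 'now' until the next occurrence of 'runAt'.
    // If 'runAt' is now or already passed today, the next run is tomorrow.
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Runs 'cleanup' every 24 hours, starting at the next 'runAt'.
    // The caller is responsible for shutting the executor down.
    static ScheduledExecutorService scheduleDaily(Runnable cleanup, LocalTime runAt) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNextRun(LocalDateTime.now(), runAt).toMillis();
        executor.scheduleAtFixedRate(
                cleanup, initialDelayMillis, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

A call such as OffPeakCleanupScheduler.scheduleDaily(task, LocalTime.of(2, 0)) would then run the task at 02:00 every day. Triggering on "cluster not busy" instead of a fixed time would need a load signal that this sketch does not model.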

Solution 4:

Set up a garbage collection bin that exposes interfaces for our program to
decide when files enter the bin and how they are handled afterwards.
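
To make the idea more concrete, here is a minimal sketch of what such an interface might look like. GarbageBin, InMemoryGarbageBin and the retention-based policy are all assumptions for illustration; nothing here exists in CarbonData, and a real bin would move or delete files on storage instead of tracking paths in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: when files enter the bin and how they leave it.
interface GarbageBin {
    // Registers a file as garbage at the given time.
    void discard(String filePath, long discardedAtMillis);

    // Removes and returns every entry that has been in the bin for at
    // least the retention period.
    List<String> purgeExpired(long nowMillis);
}

// Toy implementation that only tracks paths in memory.
final class InMemoryGarbageBin implements GarbageBin {
    private final long retentionMillis;
    private final Map<String, Long> entries = new LinkedHashMap<>();

    InMemoryGarbageBin(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    @Override
    public void discard(String filePath, long discardedAtMillis) {
        entries.put(filePath, discardedAtMillis);
    }

    @Override
    public List<String> purgeExpired(long nowMillis) {
        List<String> purged = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= retentionMillis) {
                purged.add(entry.getKey());
                it.remove();
            }
        }
        return purged;
    }

    // Number of files currently waiting in the bin.
    int size() {
        return entries.size();
    }
}
```

The update path would only call discard, which is cheap, while the expensive purge can run whenever it suits the cluster.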

Please vote on these solutions.

Best Regards,
LinWood



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: [discuss]CarbonData update operation enhance

Liang Chen
Administrator
Hi

Thank you for starting this discussion.
This proposal is about improving data update performance, right?

Regards
Liang


Re: [discuss]CarbonData update operation enhance

David CaiQiang
In reply to this post by Linwood
Hi Linwood,
  1. It would be better to implement "Update feature enhancement" first; it
will create a new segment to store the new files.
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Update-feature-enhancement-td99769.html
  2. Cleaning .deletedelta files
      Currently Carbon needs to clean invalid .deletedelta files before each
update/delete. If we don't clean them, these files will become valid
.deletedelta files after the next update/delete.

      How can we avoid cleaning invalid .deletedelta files while making sure
they don't affect the data after the next update/delete operation?



-----
Best Regards
David Cai