[jira] [Updated] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-318:
--------------------------------
    Description:
The External Sorter should sort rows in memory until the buffered data reaches a configured size, then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts its rows into the sorter. The sorter decides when to spill to disk based on the total inserted size. (The JDK does not provide an API for measuring object size; a separate JIRA issue is needed to improve on this.)
2. getIterator: returns an iterator over the sorted rows, which may come from memory or from spill files.

The External Sorter depends on a FileWriterFactory to obtain a FileWriter for spilling data into files. The FileWriterFactory should be supplied through configuration. Multiple implementations are possible, such as writing into one folder or into multiple folders.

  was:
The External Sorter should sort rows in memory until the buffered data reaches a configured size, then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts its rows into the sorter.
2. getIterator: returns an iterator over the sorted rows, which may come from memory or from spill files.

The External Sorter depends on a FileWriterFactory to obtain a FileWriter for spilling data into files. The FileWriterFactory should be supplied through configuration. Multiple implementations are possible, such as writing into one folder or into multiple folders.


> Implement an ExternalSorter that makes maximum usage of memory while sorting
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> The External Sorter should sort rows in memory until the buffered data reaches a configured size, then spill to disk. It should provide the following interface:
> 1. insertRow/insertRowBatch: takes an Iterator as input and inserts its rows into the sorter. The sorter decides when to spill to disk based on the total inserted size. (The JDK does not provide an API for measuring object size; a separate JIRA issue is needed to improve on this.)
> 2. getIterator: returns an iterator over the sorted rows, which may come from memory or from spill files.
> The External Sorter depends on a FileWriterFactory to obtain a FileWriter for spilling data into files. The FileWriterFactory should be supplied through configuration. Multiple implementations are possible, such as writing into one folder or into multiple folders.
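
The behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the CarbonData implementation: the class name ExternalSorterSketch, the method newSpillFile, and the byte-size estimate are all hypothetical, rows are assumed to be single-line strings, and the factory here hands out a file path rather than a FileWriter to keep the example short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative sketch only: sorts single-line strings, spilling sorted runs to files. */
public class ExternalSorterSketch {

    /** Hypothetical shape; the issue names FileWriterFactory but does not define its methods. */
    interface FileWriterFactory {
        Path newSpillFile() throws IOException;
    }

    private final long maxBytesInMemory;
    private final FileWriterFactory factory;
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spills = new ArrayList<>();
    private long bufferedBytes = 0;

    ExternalSorterSketch(long maxBytesInMemory, FileWriterFactory factory) {
        this.maxBytesInMemory = maxBytesInMemory;
        this.factory = factory;
    }

    /** Buffers rows and spills a sorted run once the estimated size reaches the limit. */
    void insertRowBatch(Iterator<String> rows) throws IOException {
        while (rows.hasNext()) {
            String row = rows.next();
            buffer.add(row);
            // Assumption: a crude estimate of 2 bytes per char, since the JDK
            // offers no direct API for object size.
            bufferedBytes += 2L * row.length();
            if (bufferedBytes >= maxBytesInMemory) {
                spill();
            }
        }
    }

    private void spill() throws IOException {
        Collections.sort(buffer);
        Path file = factory.newSpillFile();
        Files.write(file, buffer); // one row per line
        spills.add(file);
        buffer.clear();
        bufferedBytes = 0;
    }

    /** Merges the in-memory rows and all spilled runs into one sorted iterator. */
    Iterator<String> getIterator() throws IOException {
        Collections.sort(buffer);
        List<Iterator<String>> runs = new ArrayList<>();
        runs.add(buffer.iterator());
        for (Path p : spills) {
            // The sketch does not close these streams; real code should.
            runs.add(Files.lines(p).iterator());
        }
        Comparator<Map.Entry<String, Iterator<String>>> byRow = Map.Entry.comparingByKey();
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap = new PriorityQueue<>(byRow);
        for (Iterator<String> run : runs) {
            if (run.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(run.next(), run));
            }
        }
        return new Iterator<String>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }
            @Override public String next() {
                // Take the smallest head row, then refill the heap from the same run.
                Map.Entry<String, Iterator<String>> head = heap.remove();
                Iterator<String> run = head.getValue();
                if (run.hasNext()) {
                    heap.add(new AbstractMap.SimpleEntry<>(run.next(), run));
                }
                return head.getKey();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spill");
        ExternalSorterSketch sorter =
            new ExternalSorterSketch(16, () -> Files.createTempFile(dir, "run", ".txt"));
        sorter.insertRowBatch(Arrays.asList("pear", "apple", "fig", "kiwi", "banana").iterator());
        sorter.getIterator().forEachRemaining(System.out::println);
    }
}
```

In this sketch a one-folder factory is just a lambda that creates temp files in a single directory; a multi-folder variant could rotate across several directories behind the same interface.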



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)