[ https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-318:
--------------------------------
    Description: 
External Sorter should sort in memory until it reaches the configured size, then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows from the iterator into the sorter. Some considerations:
  1) The sorter will decide when to spill to disk based on the total inserted size. (The JDK does not provide an API for object size; another JIRA issue is needed to improve on this.)
  2) Use TreeMap as the sorter's in-memory data structure, since it keeps data sorted as it is inserted online.
2. getIterator: returns an iterator over the sorted rows; the sorted rows could come from memory or from files.
External Sorter depends on FileWriterFactory to get a FileWriter to spill data into files. FileWriterFactory should be provided by configuration. Multiple implementations are possible, such as writing into one folder or into multiple folders.

  was:
External Sorter should sort in memory until it reaches the configured size, then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows from the iterator into the sorter. The sorter will decide when to spill to disk based on the total inserted size. (The JDK does not provide an API for object size; another JIRA issue is needed to improve on this.)
2. getIterator: returns an iterator over the sorted rows; the sorted rows could come from memory or from files.
External Sorter depends on FileWriterFactory to get a FileWriter to spill data into files. FileWriterFactory should be provided by configuration.
Multiple implementations are possible, such as writing into one folder or into multiple folders.


> Implement an ExternalSorter that makes maximum usage of memory while sorting
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> External Sorter should sort in memory until it reaches the configured size, then spill to disk. It should provide the following interface:
> 1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows from the iterator into the sorter. Some considerations:
>   1) The sorter will decide when to spill to disk based on the total inserted size. (The JDK does not provide an API for object size; another JIRA issue is needed to improve on this.)
>   2) Use TreeMap as the sorter's in-memory data structure, since it keeps data sorted as it is inserted online.
> 2. getIterator: returns an iterator over the sorted rows; the sorted rows could come from memory or from files.
> External Sorter depends on FileWriterFactory to get a FileWriter to spill data into files. FileWriterFactory should be provided by configuration. Multiple implementations are possible, such as writing into one folder or into multiple folders.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
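The interface described in CARBONDATA-318 can be illustrated with a minimal Java sketch. It is not the CarbonData implementation; several details are assumptions. `insertRow`, `insertRowBatch`, `getIterator` and the name `FileWriterFactory` come from the issue. The `newSpillFile()` signature, `TempDirFileWriterFactory`, `spillCount()` and `maxRowsInMemory` are hypothetical. A row count stands in for the "configured size", because the issue notes the JDK has no object-size API. Java serialization is assumed as the spill format, and a `PriorityQueue` k-way merge is assumed for combining memory and file rows.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical: the issue names FileWriterFactory, but this signature is assumed.
interface FileWriterFactory {
    File newSpillFile() throws IOException;
}

// Hypothetical single-folder implementation backed by the system temp directory.
class TempDirFileWriterFactory implements FileWriterFactory {
    public File newSpillFile() throws IOException {
        File f = Files.createTempFile("sort-spill-", ".bin").toFile();
        f.deleteOnExit();
        return f;
    }
}

class ExternalSorter<T extends Serializable> {
    private final Comparator<? super T> cmp;
    private final int maxRowsInMemory; // assumption: row count stands in for the configured size
    private final FileWriterFactory factory;
    private final List<File> spills = new ArrayList<>();
    private TreeMap<T, List<T>> buffer; // a list per key keeps rows that compare equal
    private int rowsInMemory = 0;

    ExternalSorter(Comparator<? super T> cmp, int maxRowsInMemory, FileWriterFactory factory) {
        this.cmp = cmp; this.maxRowsInMemory = maxRowsInMemory; this.factory = factory;
        this.buffer = new TreeMap<>(cmp);
    }

    void insertRow(T row) throws IOException {
        buffer.computeIfAbsent(row, k -> new ArrayList<>()).add(row);
        if (++rowsInMemory >= maxRowsInMemory) spill();
    }

    void insertRowBatch(Iterator<T> rows) throws IOException {
        while (rows.hasNext()) insertRow(rows.next());
    }

    int spillCount() { return spills.size(); }

    // The TreeMap is already sorted, so a spill is a single in-order pass.
    private void spill() throws IOException {
        File f = factory.newSpillFile();
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            out.writeInt(rowsInMemory);
            for (List<T> group : buffer.values()) for (T r : group) out.writeObject(r);
        }
        spills.add(f);
        buffer = new TreeMap<>(cmp);
        rowsInMemory = 0;
    }

    // Sorted rows may come from memory or from spill files; merge them all.
    Iterator<T> getIterator() throws IOException {
        List<Iterator<T>> sources = new ArrayList<>();
        List<T> mem = new ArrayList<>(rowsInMemory);
        for (List<T> group : buffer.values()) mem.addAll(group);
        sources.add(mem.iterator());
        for (File f : spills) sources.add(readSpill(f));
        return merge(sources);
    }

    // Lazily streams one spill file and closes it after the last row.
    @SuppressWarnings("unchecked")
    private Iterator<T> readSpill(File f) throws IOException {
        ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(f)));
        int count = in.readInt();
        return new Iterator<T>() {
            int remaining = count;
            public boolean hasNext() { return remaining > 0; }
            public T next() {
                if (remaining <= 0) throw new NoSuchElementException();
                try {
                    T row = (T) in.readObject();
                    if (--remaining == 0) in.close();
                    return row;
                } catch (IOException | ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }

    // k-way merge: the heap holds the current head row of each source.
    private Iterator<T> merge(List<Iterator<T>> sources) {
        PriorityQueue<Map.Entry<T, Iterator<T>>> heap =
            new PriorityQueue<>((a, b) -> cmp.compare(a.getKey(), b.getKey()));
        for (Iterator<T> s : sources)
            if (s.hasNext()) heap.add(new AbstractMap.SimpleEntry<>(s.next(), s));
        return new Iterator<T>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public T next() {
                Map.Entry<T, Iterator<T>> top = heap.poll();
                if (top == null) throw new NoSuchElementException();
                Iterator<T> src = top.getValue();
                if (src.hasNext()) heap.add(new AbstractMap.SimpleEntry<>(src.next(), src));
                return top.getKey();
            }
        };
    }
}
```

For example, with `maxRowsInMemory = 3`, inserting `5, 1, 4, 1, 9, 2, 6, 6` produces two spill files (`1, 4, 5` and `1, 2, 9`) and leaves `6, 6` in memory. `getIterator()` then yields `1, 1, 2, 4, 5, 6, 6, 9`. A plain `TreeMap<T, T>` would silently drop rows whose keys compare equal, which is why each key maps to a list here. This matters for sort keys that repeat across rows.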