[jira] [Commented] (CARBONDATA-318) Implement an InMemory Sorter that makes maximum usage of memory while sorting

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579084#comment-15579084 ]

ASF GitHub Bot commented on CARBONDATA-318:
-------------------------------------------

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/242

    [CARBONDATA-318] Implement an InMemory Sorter that makes maximum usage of memory for data load

    Changes:
    1. Change SortDataRows.java to keep rows in memory and sort them there if memory is sufficient; otherwise spill to disk.
    2. Change SortKeyStep and MdkeyGenStep to support both in-memory sort and merge sort.
   
    To choose between the two approaches, the user can set SORT_SIZE in the carbon properties, for example to 3 million rows:
   
    ```
    // Number of rows to keep in memory when loading data. If the number of input rows exceeds this value,
    // carbon will use merge sort instead of in-memory sort.
    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.SORT_SIZE, "3000000")
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata in-memory-sort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #242
   
----
commit 23d9fbbca33cd5bb56a0a648596ad4d61c32fa04
Author: jackylk <[hidden email]>
Date:   2016-10-15T15:53:16Z

    add in memory sort in data load

commit 45dae7c4c7b535540bafb15bcc896061cfae7ca7
Author: jackylk <[hidden email]>
Date:   2016-10-15T17:55:09Z

    fix empty row

----


> Implement an InMemory Sorter that makes maximum usage of memory while sorting
> -----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>            Assignee: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> Change SortDataRows into an external sorter: it should sort in memory until it reaches the configured size, then spill to disk. It should provide the following interface:
> 1. addRow: inserts a row into the sorter.
> 2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory or from files.
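
For illustration only, the addRow/getIterator interface described in the issue could be sketched roughly as below. This is not the code from the pull request: the class name ExternalSorterSketch, the maxRowsInMemory parameter (standing in for SORT_SIZE), the use of plain int values as rows, and the temp-file spill format are all assumptions made to keep the example small.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch of an external sorter; rows are plain ints for brevity.
final class ExternalSorterSketch {
    private final int maxRowsInMemory;                   // assumed stand-in for SORT_SIZE
    private final List<Integer> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    ExternalSorterSketch(int maxRowsInMemory) {
        if (maxRowsInMemory < 1) throw new IllegalArgumentException("maxRowsInMemory must be >= 1");
        this.maxRowsInMemory = maxRowsInMemory;
    }

    // Buffers the row; once the buffer reaches the limit, sorts it and spills it to disk.
    void addRow(int row) throws IOException {
        buffer.add(row);
        if (buffer.size() >= maxRowsInMemory) spill();
    }

    // Writes the sorted buffer to a temp file as a count followed by the values.
    private void spill() throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("sort-spill", ".bin");
        file.toFile().deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(buffer.size());
            for (int v : buffer) out.writeInt(v);
        }
        spillFiles.add(file);
        buffer.clear();
    }

    // Returns sorted rows: straight from memory if nothing was spilled,
    // otherwise a k-way merge over the spilled runs and the in-memory remainder.
    Iterator<Integer> getIterator() throws IOException {
        Collections.sort(buffer);
        if (spillFiles.isEmpty()) return buffer.iterator();

        PriorityQueue<Run> heap = new PriorityQueue<>(Comparator.comparingInt((Run r) -> r.head));
        for (Path f : spillFiles) heap.add(new Run(readRun(f)));
        if (!buffer.isEmpty()) heap.add(new Run(buffer.iterator()));

        return new Iterator<Integer>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public Integer next() {
                Run r = heap.poll();
                if (r == null) throw new NoSuchElementException();
                int v = r.head;
                if (r.rows.hasNext()) { r.head = r.rows.next(); heap.add(r); }
                return v;
            }
        };
    }

    // One sorted run plus its current smallest unread value.
    private static final class Run {
        final Iterator<Integer> rows;
        int head;
        Run(Iterator<Integer> rows) { this.rows = rows; this.head = rows.next(); }
    }

    // Lazily reads one spilled run; closes the stream after the last value.
    private static Iterator<Integer> readRun(Path file) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)));
        int count = in.readInt();
        return new Iterator<Integer>() {
            int remaining = count;
            public boolean hasNext() { return remaining > 0; }
            public Integer next() {
                if (remaining <= 0) throw new NoSuchElementException();
                try {
                    int v = in.readInt();
                    if (--remaining == 0) in.close();
                    return v;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

With a limit of 3 and eight input rows, the sketch spills two sorted runs, keeps the last two rows in memory, and getIterator merges all three sources. If the input never reaches the limit, no file is written and the iterator reads from the in-memory buffer alone, which corresponds to the pure in-memory case the issue aims for.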



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)