[jira] [Resolved] (CARBONDATA-1366) When sort_scope=global_sort, use 'StorageLevel.MEMORY_AND_DISK_SER' instead of 'StorageLevel.MEMORY_AND_DISK' for 'convertRDD' persisting to improve loading performance

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao  Zhang resolved CARBONDATA-1366.
----------------------------------------
    Resolution: Fixed

> When sort_scope=global_sort, use 'StorageLevel.MEMORY_AND_DISK_SER' instead of 'StorageLevel.MEMORY_AND_DISK' for 'convertRDD' persisting to improve loading performance
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-1366
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1366
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load, spark-integration
>    Affects Versions: 1.2.0
>            Reporter: Zhichao  Zhang
>            Assignee: Zhichao  Zhang
>            Priority: Minor
>             Fix For: 1.2.0
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> My test environment and configuration are as follows:
> Env:
> 6 executors, each with 9 GB of memory and 6 cores
> Configs:
> SINGLE_PASS=true
> SORT_SCOPE=GLOBAL_SORT
> spark.memory.fraction=0.5
> With 'convertRDD.persist(StorageLevel.MEMORY_AND_DISK_SER)' in method 'org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark.loadDataUsingGlobalSort', loading 144136697 rows (10.9 GB of Parquet files) takes about 7.2 min. With 'convertRDD.persist(StorageLevel.MEMORY_AND_DISK)', the same load takes about 9.5 min.
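The sketch below converts the two reported timings into average throughput and relative savings. It is a minimal Python calculation that uses only the figures stated in the issue (144136697 rows, 7.2 min vs 9.5 min). The helper names `rows_per_second` and `summarize` are hypothetical and not part of CarbonData. The throughput is averaged over the whole load, so it does not show which load stage benefited from serialized persistence.

```python
from typing import Dict

# Figures reported in the issue; nothing here is re-measured.
ROWS = 144_136_697  # rows loaded (10.9 GB of Parquet input)
RUNS_MIN = {
    "MEMORY_AND_DISK_SER": 7.2,  # load time in minutes
    "MEMORY_AND_DISK": 9.5,      # load time in minutes
}


def rows_per_second(rows: int, minutes: float) -> float:
    """Average load throughput in rows per second (hypothetical helper)."""
    return rows / (minutes * 60.0)


def summarize(rows: int, runs: Dict[str, float]) -> Dict[str, float]:
    """Compare the serialized and deserialized persistence runs."""
    ser = runs["MEMORY_AND_DISK_SER"]
    deser = runs["MEMORY_AND_DISK"]
    return {
        "ser_rows_per_s": rows_per_second(rows, ser),
        "deser_rows_per_s": rows_per_second(rows, deser),
        # Share of the MEMORY_AND_DISK load time saved by switching to _SER.
        "time_saved_pct": (deser - ser) / deser * 100.0,
        # How many times faster the _SER run was.
        "speedup": deser / ser,
    }


if __name__ == "__main__":
    for name, value in summarize(ROWS, RUNS_MIN).items():
        print(f"{name}: {value:,.2f}")
```

On these numbers the serialized run averages roughly 333,650 rows/s against roughly 252,871 rows/s. That is about 24% less load time, or a speedup of about 1.32x.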



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)