[jira] [Updated] (CARBONDATA-429) Remove unnecessary file name check in dictionary cache

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-429:
--------------------------------
    Description:
1. The dictionary cache currently performs many unnecessary file name checks for each column, which increases the number of HDFS getFileStatus calls.
2. In checkAndLoadDictionaryData, we fetch the meta file's mtime from HDFS on every cache.get call to check whether the local cache is still valid, because the local dictionary cache may become invalid after another job finishes loading data. This also increases the number of getFileStatus calls.

  was:
1. The dictionary cache currently performs many unnecessary file name checks for each column, which increases the number of HDFS interactions.
2. In checkAndLoadDictionaryData, we fetch the meta file's mtime from HDFS on every cache.get call to check whether the local cache is still valid, because the local dictionary cache may become invalid after another job finishes loading data.


> Remove unnecessary file name check in dictionary cache
> ------------------------------------------------------
>
>                 Key: CARBONDATA-429
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-429
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.1.1-incubating
>            Reporter: Jacky Li
>            Assignee: Ashok Kumar
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> 1. The dictionary cache currently performs many unnecessary file name checks for each column, which increases the number of HDFS getFileStatus calls.
> 2. In checkAndLoadDictionaryData, we fetch the meta file's mtime from HDFS on every cache.get call to check whether the local cache is still valid, because the local dictionary cache may become invalid after another job finishes loading data. This also increases the number of getFileStatus calls.
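
The validation pattern described in point 2 can be sketched roughly as follows. This is a minimal illustration, not CarbonData's actual code: the class MtimeValidatedCache, the mtimeLookup callback (standing in for an HDFS getFileStatus call), and the loader callback are all hypothetical names introduced here. The counters only make visible how many status lookups and reloads occur.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Sketch of an mtime-validated cache. All names are hypothetical, not CarbonData APIs. */
class MtimeValidatedCache<V> {

    /** A cached value together with the meta file mtime it was loaded under. */
    private static final class Entry<T> {
        final long mtime;
        final T value;

        Entry(long mtime, T value) {
            this.mtime = mtime;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();
    // Assumption: stands in for one HDFS getFileStatus call returning the modification time.
    private final ToLongFunction<String> mtimeLookup;
    // Assumption: stands in for reading the dictionary data for a meta file path.
    private final Function<String, V> loader;
    private int statusCalls = 0;
    private int loads = 0;

    MtimeValidatedCache(ToLongFunction<String> mtimeLookup, Function<String, V> loader) {
        this.mtimeLookup = mtimeLookup;
        this.loader = loader;
    }

    /** Returns the cached value, reloading only when the meta file's mtime has changed. */
    synchronized V get(String metaFilePath) {
        // Exactly one status lookup per get; no separate file name or existence check.
        statusCalls++;
        long currentMtime = mtimeLookup.applyAsLong(metaFilePath);

        Entry<V> cached = entries.get(metaFilePath);
        if (cached != null && cached.mtime == currentMtime) {
            return cached.value; // local copy is still valid
        }

        // Another job changed the meta file (or nothing is cached yet): reload.
        loads++;
        V fresh = loader.apply(metaFilePath);
        entries.put(metaFilePath, new Entry<>(currentMtime, fresh));
        return fresh;
    }

    int statusCalls() {
        return statusCalls;
    }

    int loads() {
        return loads;
    }
}
```

Even in this sketch every get still costs one status lookup, which is the remaining overhead that point 2 calls out; only the reloads are avoided while the mtime is unchanged.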



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)