[jira] [Created] (CARBONDATA-458) Improving carbon first time query performance

Akash R Nilugal (Jira)
kumar vishal created CARBONDATA-458:
---------------------------------------

             Summary:  Improving carbon first time query performance
                 Key: CARBONDATA-458
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-458
             Project: CarbonData
          Issue Type: Improvement
          Components: core, data-load, data-query
            Reporter: kumar vishal
            Assignee: kumar vishal


Improving carbon first time query performance

Reason:
1. When the file system cache has been cleared, files must be read from disk and cached again, which makes reading slower
2. On the first query, Carbon has to read the footer from each data file to build the B-tree
3. Carbon reads more footer data than is required (the data chunk)
4. Carbon performs many random seeks because a column's data (data page, RLE, inverted index) is not stored together
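
To illustrate reason 4, here is a minimal sketch that counts the seeks needed to read one column under two layouts. The offsets, lengths, and the helper name count_seeks are hypothetical and chosen only for illustration; they do not describe the actual CarbonData file format.

```python
def count_seeks(reads):
    """Count reads that do not start where the previous read ended.

    `reads` is a list of (offset, length) pairs; they are processed in
    offset order, as a reader scanning the file forward would do.
    """
    seeks = 0
    position = None  # file position after the previous read
    for offset, length in sorted(reads):
        if offset != position:
            seeks += 1  # the reader has to jump to a new offset
        position = offset + length
    return seeks


# Hypothetical layout A: data page, RLE and inverted index of one column
# are stored in three separate regions of the file.
scattered_col_a = [(0, 100), (300, 20), (360, 40)]

# Hypothetical layout B: the same three parts stored back to back.
colocated_col_a = [(0, 100), (100, 20), (120, 40)]

scattered_seeks = count_seeks(scattered_col_a)
colocated_seeks = count_seeks(colocated_col_a)
```

In this toy setup the scattered layout needs one seek per part of the column, while the contiguous layout needs a single seek followed by sequential reads.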

Solution:
1. Improve block loading time by removing the data chunk from blockletInfo and storing only the offset and length of the data chunk
2. Compress the presence meta bitset, stored for null values of measure columns, using Snappy
3. Store the metadata and data of a column together and read them together; this reduces random seeks and improves IO
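
Solution 1 could be sketched roughly as follows: the footer keeps only an (offset, length) reference per column, and the data chunk is read on demand. The names ChunkRef, read_chunk and blocklet_refs are assumptions made for this sketch, not CarbonData classes, and an in-memory buffer stands in for the data file.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical footer entry: where a data chunk lives, not the chunk itself."""
    offset: int
    length: int


def read_chunk(data_file, ref):
    """Read one data chunk lazily, using only its offset and length."""
    data_file.seek(ref.offset)
    return data_file.read(ref.length)


# In-memory stand-in for a data file holding three column chunks.
data_file = io.BytesIO(b"AAAABBBBBBCC")

# What the footer would carry per column in this sketch: small fixed-size
# references instead of the full data chunk metadata.
blocklet_refs = {
    "col_a": ChunkRef(offset=0, length=4),
    "col_b": ChunkRef(offset=4, length=6),
    "col_c": ChunkRef(offset=10, length=2),
}

# Only the column touched by the query is read from the file.
col_b = read_chunk(data_file, blocklet_refs["col_b"])
```

With this shape, loading the footer costs the same regardless of chunk size, and the cost of reading a chunk is paid only for columns a query actually uses.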




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)