[ https://issues.apache.org/jira/browse/CARBONDATA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal updated CARBONDATA-3006:
-------------------------------------
    Summary: Carbon Store Size Optimization and Query Performance Improvement  (was: Carbon Store Size Optimization and Scan Query Performance Improvement)

> Carbon Store Size Optimization and Query Performance Improvement
> ----------------------------------------------------------------
>
>                 Key: CARBONDATA-3006
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3006
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: kumar vishal
>            Priority: Major
>
> *String/Varchar Datatype Store Size Optimization:*
> Currently the length of each String/Varchar value is stored as a Short/Int, which increases the store size. To reduce it, adaptive encoding is applied to the length part for both String and Varchar, so processing no longer needs separate handling for the two data types.
>
> *String/Varchar datatype query processing optimization:*
> Currently, when a String/Varchar column is processed during a query, the offset (position) of each value is calculated and the data is fetched based on that position. This causes many cache line misses and degrades query performance.
> To handle this, for a full scan query with no inverted index, data is fetched linearly to avoid cache line misses.
>
> *Adaptive encoding for Global/Direct/Local dictionary columns*
> Currently Global/Direct/Local dictionary values are stored in binary format and only Snappy is applied for compression. Because these dictionary values are of Integer data type, they can be stored adaptively using a data type smaller than Integer.
> Adaptive encoding is added for global/direct dictionary columns to reduce the store size.
>
> *Method In-lining Optimization*
> The JIT will inline a method if its size is less than 325 bytes of bytecode and it is called more than 10K times (default values).
> If a method is private or static, it is easier for the JIT to inline because no type check is required. A protected/public method adds the overhead of a type check, and because of this it does not behave as an inlined method.
> Because of this, some refactoring is done for primitive no-dictionary data type columns. Earlier, ColumnPageWrapper.java handled query processing for all primitive no-dictionary data type columns. In this PR, separate classes are created to handle each data type: all hot methods are kept private, protected methods are overridden, and other methods are added in the super classes.
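
To make the adaptive encoding idea above concrete (for String/Varchar lengths and for dictionary values alike), here is a minimal Java sketch. It is not the CarbonData implementation: the class name AdaptiveIntCodec, its methods, and the byte layout are hypothetical and chosen only for illustration. It assumes non-negative integers and picks the narrowest fixed width (1 to 4 bytes) that can hold the largest value in a page.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not CarbonData code): stores non-negative ints
 * using the narrowest fixed width that fits the largest value.
 */
final class AdaptiveIntCodec {

    private AdaptiveIntCodec() { }

    /** Smallest number of bytes (1 to 4) able to hold max as an unsigned value. */
    static int widthFor(int max) {
        if (max < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        if (max <= 0xFF) return 1;
        if (max <= 0xFFFF) return 2;
        if (max <= 0xFFFFFF) return 3;
        return 4;
    }

    /** Assumed layout: [width: 1 byte][count: 4 bytes][values, big-endian, width bytes each]. */
    static byte[] encode(int[] values) {
        int max = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("values must be non-negative");
            }
            max = Math.max(max, v);
        }
        int width = widthFor(max);
        ByteBuffer out = ByteBuffer.allocate(1 + 4 + values.length * width);
        out.put((byte) width);
        out.putInt(values.length);
        for (int v : values) {
            // Write only the low `width` bytes, most significant byte first.
            for (int shift = (width - 1) * 8; shift >= 0; shift -= 8) {
                out.put((byte) (v >>> shift));
            }
        }
        return out.array();
    }

    /** Reads the header, then rebuilds each value from its `width` bytes. */
    static int[] decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int width = in.get();
        int count = in.getInt();
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < width; b++) {
                v = (v << 8) | (in.get() & 0xFF);
            }
            values[i] = v;
        }
        return values;
    }
}
```

For example, AdaptiveIntCodec.encode(new int[] {3, 12, 200}) needs one byte per value after the 5-byte header, where a fixed Int would need four bytes per value. Because the decoder reads the values front to back in one sequential pass, the same layout also suits the linear fetch described for full scan queries.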