[ https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venugopal Reddy K updated CARBONDATA-3519:
------------------------------------------
    Description:

+*Issue-1:*+

*Context:*
For a string column with local dictionary enabled, a column page of {{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}} is created for {{encodedPage}}, alongside the regular {{actualPage}} of {{UnsafeVarLengthColumnPage}}.
{{UnsafeFixLengthColumnPage}} has a *{{capacity}}* field that records the size of the {{memoryBlock}} allocated for the page. The {{ensureMemory()}} method is called while adding rows and checks whether {{totalLength + requestSize > capacity}}. If there is no room for the next row, it allocates a new memoryBlock, copies the old content (the previous rows) into it, and frees the old memoryBlock.

*Problem:*
When the {{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}} is created for {{encodedPage}}, the *{{capacity}}* field is not assigned the size of the allocated memory block. Hence, for each row added to tablePage, the *ensureMemory() check always fails*: a new column page memoryBlock is allocated, the old content (previous rows) is copied, and the old memoryBlock is freed. This *allocation of a new memoryBlock and freeing of the old memoryBlock happens on every row addition* for string columns with local dictionary.
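The effect described in Issue-1 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not CarbonData's code: {{ToyPage}}, {{reallocationsFor}} and the grow-to-exact-fit policy are hypothetical names and choices made for illustration, and a heap {{byte[]}} stands in for the off-heap memoryBlock. It counts how often the capacity check triggers a reallocation when {{capacity}} is left unassigned versus when it is set to the allocated block size.

```java
import java.util.Arrays;

/** Toy model of a fixed-length page; NOT CarbonData's UnsafeFixLengthColumnPage. */
final class ToyPage {
  private byte[] block;     // stands in for the allocated memoryBlock
  private int capacity;     // bytes the page believes it owns
  private int totalLength;  // bytes written so far
  int reallocations;        // number of allocate + copy + free cycles

  ToyPage(int blockSize, boolean assignCapacity) {
    block = new byte[blockSize];
    // Leaving capacity at 0 models the unassigned field reported in Issue-1.
    capacity = assignCapacity ? blockSize : 0;
  }

  private void ensureMemory(int requestSize) {
    if (totalLength + requestSize > capacity) {
      // Assumption: grow to an exact fit. The real growth policy may differ.
      int newSize = totalLength + requestSize;
      block = Arrays.copyOf(block, Math.max(newSize, block.length)); // new block + copy of old rows
      capacity = newSize;
      reallocations++;
    }
  }

  void putBytes(byte[] row) {
    ensureMemory(row.length);
    System.arraycopy(row, 0, block, totalLength, row.length);
    totalLength += row.length;
  }

  /** Adds `rows` rows of `rowSize` bytes and reports how many reallocations occurred. */
  static int reallocationsFor(int rows, int rowSize, boolean assignCapacity) {
    ToyPage page = new ToyPage(rows * rowSize, assignCapacity);
    for (int i = 0; i < rows; i++) {
      page.putBytes(new byte[rowSize]);
    }
    return page.reallocations;
  }

  public static void main(String[] args) {
    System.out.println("capacity unassigned: " + reallocationsFor(10, 4, false));
    System.out.println("capacity assigned:   " + reallocationsFor(10, 4, true));
  }
}
```

In this model the block is large enough from the start in both cases; only the bookkeeping differs, which is why assigning {{capacity}} at construction removes the extra allocations.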
+*Issue-2:*+

*Context:*
In {{VarLengthColumnPageBase}}, we have a {{rowOffset}} column page of {{UnsafeFixLengthColumnPage}} with datatype {{INT}} that maintains the data offset of each row of a variable-length column. This {{rowOffset}} page is allocated to hold as many rows as the data page.

*Problem:*
If we have 10 rows in the page, we need 11 rows in its rowOffset page, because we always store 0 as the offset of the 1st row. So an additional row is required for the rowOffset page (the code pasted below shows the reference). Otherwise, the *ensureMemory() check always fails for the last row* of data (the 10th row in this case), which *allocates a new rowOffset page memoryBlock, copies the old content (previous rows), and frees the old memoryBlock*. This *can happen for string columns with local dictionary, direct dictionary columns, and global dictionary columns*.

{code:java}
public abstract class VarLengthColumnPageBase extends ColumnPage {
  ...
  @Override
  public void putBytes(int rowId, byte[] bytes) {
    ...
    if (rowId == 0) {
      rowOffset.putInt(0, 0); // offset to 1st row is 0
    }
    rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
    putBytesAtRow(rowId, bytes);
    totalLength += bytes.length;
  }
  ...
}
{code}

> A new column page MemoryBlock is allocated at each row addition to table page if having string column with local dictionary enabled.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3519
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3519
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core
>            Reporter: Venugopal Reddy K
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
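The off-by-one described in Issue-2 can be sketched with a toy offset table. This is a simplified model under stated assumptions, not CarbonData's implementation: {{ToyRowOffsets}} and {{reallocationsFor}} are hypothetical names, and a heap {{int[]}} stands in for the rowOffset page. It follows the same {{putInt(rowId + 1, ...)}} pattern as the {{putBytes}} snippet in the description and counts how often the table has to grow.

```java
import java.util.Arrays;

/** Toy model of the rowOffset page; NOT CarbonData's implementation. */
final class ToyRowOffsets {
  private int[] offsets;  // stands in for the INT rowOffset page
  int reallocations;      // number of allocate + copy + free cycles

  ToyRowOffsets(int slots) {
    offsets = new int[slots];
  }

  private void putInt(int index, int value) {
    if (index >= offsets.length) {
      // No room for this slot: allocate a larger array and copy the old offsets.
      offsets = Arrays.copyOf(offsets, index + 1);
      reallocations++;
    }
    offsets[index] = value;
  }

  /** Mirrors the offset bookkeeping in putBytes: row i ends at offsets[i + 1]. */
  void putBytes(int rowId, byte[] bytes) {
    if (rowId == 0) {
      putInt(0, 0); // offset to 1st row is 0
    }
    putInt(rowId + 1, offsets[rowId] + bytes.length);
  }

  int endOffset(int rowId) {
    return offsets[rowId + 1];
  }

  /** Writes `rows` rows of 3 bytes into a table with `slots` entries. */
  static int reallocationsFor(int rows, int slots) {
    ToyRowOffsets table = new ToyRowOffsets(slots);
    for (int i = 0; i < rows; i++) {
      table.putBytes(i, new byte[3]);
    }
    return table.reallocations;
  }

  public static void main(String[] args) {
    System.out.println("slots = rows:     " + reallocationsFor(10, 10));
    System.out.println("slots = rows + 1: " + reallocationsFor(10, 11));
  }
}
```

With 10 rows, the write for the last row targets index 10, so a table of 10 slots must grow once while a table of 11 slots never does.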