[jira] [Updated] (CARBONDATA-3519) A new column page MemoryBlock is allocated at each row addition to table page if having string column with local dictionary enabled.


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venugopal Reddy K updated CARBONDATA-3519:
------------------------------------------
    Description:
 +*Issue-1:*+

{color:#0747a6}*Context:*{color}

For a string column with local dictionary enabled, a column page of `{color:#de350b}UnsafeFixLengthColumnPage{color}` with datatype `{color:#de350b}{{DataTypes.BYTE_ARRAY}}{color}` is created for `{color:#de350b}{{encodedPage}}{color}`, along with the regular `{color:#de350b}{{actualPage}}{color}` of `{color:#de350b}{{UnsafeVarLengthColumnPage}}{color}`.

`{color:#de350b}{{UnsafeFixLengthColumnPage}}{color}` has a `{color:#de350b}*{{capacity}}*{color}` field that holds the size of the allocated `{color:#de350b}{{memoryBlock}}{color}` for the page. While rows are added, the `{{{color:#de350b}ensureMemory{color}()}}` method checks whether `{color:#de350b}{{totalLength + requestSize > capacity}}{color}`. If there is no room for the next row, it allocates a new memoryBlock, copies the old contents (previous rows) into it, and frees the old memoryBlock.
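The allocate, copy and free sequence described above can be modelled in plain Java. This is a simplified sketch, not CarbonData's implementation: a heap {{byte[]}} stands in for the unsafe {{memoryBlock}}, and the class {{GrowablePage}}, its {{addRow()}}/{{reallocations()}} helpers and the exact-fit growth policy (the new block is sized to just fit the request) are hypothetical.

{code:java}
/** Hypothetical model of a column page backed by one growable memory block. */
public class GrowablePage {
  private byte[] memoryBlock; // stands in for the unsafe MemoryBlock
  private int capacity;       // size of the block the page believes it has
  private int totalLength;    // bytes written so far
  private int reallocations;  // times ensureMemory() had to grow the block

  public GrowablePage(int initialBytes) {
    memoryBlock = new byte[initialBytes];
    capacity = initialBytes; // capacity matches the allocated block
  }

  /** If the request does not fit: allocate a new block, copy previous rows, drop the old block. */
  private void ensureMemory(int requestSize) {
    if (totalLength + requestSize > capacity) {
      int newSize = totalLength + requestSize; // assumed exact-fit growth
      byte[] newBlock = new byte[newSize];
      System.arraycopy(memoryBlock, 0, newBlock, 0, totalLength); // copy previous rows
      memoryBlock = newBlock; // old block is left to the GC (the "free")
      capacity = newSize;
      reallocations++;
    }
  }

  public void addRow(byte[] row) {
    ensureMemory(row.length);
    System.arraycopy(row, 0, memoryBlock, totalLength, row.length);
    totalLength += row.length;
  }

  public int reallocations() { return reallocations; }

  public int totalLength() { return totalLength; }

  public static void main(String[] args) {
    GrowablePage page = new GrowablePage(8); // room for two 4-byte rows
    for (int i = 0; i < 3; i++) {
      page.addRow(new byte[] {1, 2, 3, 4});
    }
    System.out.println(page.reallocations()); // 1: only the third row overflows
    System.out.println(page.totalLength());   // 12
  }
}
{code}

In this model a row that fits within {{capacity}} is written in place; only a row that overflows it pays for the allocation and the copy.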

{color:#0747a6} *Problem:*{color}

However, when the `{color:#de350b}{{UnsafeFixLengthColumnPage}}{color}` with datatype `{color:#de350b}{{DataTypes.BYTE_ARRAY}}{color}` is created for `{color:#de350b}{{encodedPage}}{color}`, the *`{color:#de350b}{{capacity}}{color}`* field is not assigned the size of the allocated memory block, so it stays 0. Hence, for each row added to the table page, the *ensureMemory() check always fails*: a new column page memoryBlock is allocated, the old contents (previous rows) are copied into it, and the old memoryBlock is freed. This *allocation of a new memoryBlock and freeing of the old one happens for every row addition* for string columns with local dictionary enabled.
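To see why an unassigned {{capacity}} costs one reallocation per row, the hypothetical model below allocates a block large enough for the whole page but, when {{assignCapacity}} is false, leaves {{capacity}} at 0, mirroring the missing assignment described above. It assumes exact-fit growth; CarbonData's real growth policy may differ, so only the contrast between the two cases is meant to carry over. The names {{CapacityBugDemo}}, {{Page}} and {{reallocationsFor()}} are illustrative.

{code:java}
/** Hypothetical model of Issue-1: the block is pre-allocated but capacity may be left at 0. */
public class CapacityBugDemo {
  static final class Page {
    private byte[] memoryBlock;
    private int capacity; // stays 0 when assignCapacity is false (the reported bug)
    private int totalLength;
    int reallocations;

    Page(int allocatedBytes, boolean assignCapacity) {
      memoryBlock = new byte[allocatedBytes];
      if (assignCapacity) {
        capacity = allocatedBytes; // record the allocated size
      }
    }

    private void ensureMemory(int requestSize) {
      if (totalLength + requestSize > capacity) {
        int newSize = totalLength + requestSize; // assumed exact-fit growth
        byte[] newBlock = new byte[newSize];
        System.arraycopy(memoryBlock, 0, newBlock, 0, totalLength); // copy previous rows
        memoryBlock = newBlock; // old block is dropped
        capacity = newSize;
        reallocations++;
      }
    }

    void putBytes(byte[] row) {
      ensureMemory(row.length);
      System.arraycopy(row, 0, memoryBlock, totalLength, row.length);
      totalLength += row.length;
    }
  }

  /** Adds rows of rowSize bytes to a page pre-sized for all of them; returns the reallocation count. */
  static int reallocationsFor(int rows, int rowSize, boolean assignCapacity) {
    Page page = new Page(rows * rowSize, assignCapacity);
    for (int i = 0; i < rows; i++) {
      page.putBytes(new byte[rowSize]);
    }
    return page.reallocations;
  }

  public static void main(String[] args) {
    System.out.println(reallocationsFor(1000, 4, false)); // 1000: one per row
    System.out.println(reallocationsFor(1000, 4, true));  // 0
  }
}
{code}

In the buggy case of this model every reallocation also copies all previous rows, so the copying work grows quadratically with the number of rows; with {{capacity}} recorded at allocation time, the pre-sized block is used as intended.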

 

+*Issue-2:*+

{color:#0747a6}*Context:*{color}

In `{color:#de350b}VarLengthColumnPageBase{color}`, a `{color:#de350b}rowOffset{color}` column page (an `{color:#de350b}UnsafeFixLengthColumnPage{color}` of datatype `{color:#de350b}INT{color}`) maintains the data offset of each row of variable-length columns. This `{color:#de350b}rowOffset{color}` page is allocated with the size of the page, i.e. one entry per row.
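A rough sketch of how an offset page locates variable-length rows: values are stored back to back, and row {{i}} occupies the range {{[rowOffset[i], rowOffset[i + 1])}}. This is an illustrative model only; the {{String}} data and the helpers {{buildOffsets()}} and {{readRow()}} are hypothetical, and the offset array here is sized for one entry per row plus the leading 0.

{code:java}
import java.util.Arrays;

/** Hypothetical sketch of variable-length storage addressed through a row-offset array. */
public class RowOffsetLayout {
  /** Returns offsets with one entry per row plus the leading 0. */
  static int[] buildOffsets(String[] rows) {
    int[] rowOffset = new int[rows.length + 1];
    rowOffset[0] = 0; // offset to the 1st row is always 0
    for (int rowId = 0; rowId < rows.length; rowId++) {
      // end offset of row rowId == start offset of row rowId + 1
      rowOffset[rowId + 1] = rowOffset[rowId] + rows[rowId].length();
    }
    return rowOffset;
  }

  /** Reads row rowId back from the concatenated data using its start and end offsets. */
  static String readRow(String data, int[] rowOffset, int rowId) {
    return data.substring(rowOffset[rowId], rowOffset[rowId + 1]);
  }

  public static void main(String[] args) {
    String[] rows = {"ab", "cde", "f"};
    String data = String.join("", rows); // stands in for the variable-length data page
    int[] rowOffset = buildOffsets(rows);
    System.out.println(Arrays.toString(rowOffset)); // [0, 2, 5, 6]
    System.out.println(readRow(data, rowOffset, 1)); // cde
  }
}
{code}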

{color:#0747a6} *Problem:*{color}

If the page has 10 rows, its rowOffset page needs 11 entries: the offset to the 1st row is always kept as 0, and {{putBytes()}} stores the end offset of each row at index {{rowId + 1}}, so the last row (rowId 9) writes index 10. One additional entry is therefore required for the rowOffset page (see the pasted code below for reference). Otherwise, the *ensureMemory() check always fails for the last row* (the 10th row in this case), which *allocates a new rowOffset page memoryBlock, copies the old contents (previous rows) and frees the old memoryBlock*. This *can happen for string columns with local dictionary, direct dictionary columns and global dictionary columns*.

 
{code:java}
public abstract class VarLengthColumnPageBase extends ColumnPage {
  // ...
  @Override
  public void putBytes(int rowId, byte[] bytes) {
    // ...
    if (rowId == 0) {
      rowOffset.putInt(0, 0); // offset to the 1st row is always 0
    }
    // the end offset of row rowId is stored at index rowId + 1
    rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
    putBytesAtRow(rowId, bytes);
    totalLength += bytes.length;
  }
  // ...
}
{code}
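The off-by-one can be reproduced with a hypothetical growable INT page that is filled exactly as {{putBytes()}} fills {{rowOffset}} (index {{rowId + 1}} for each row). The class names and the exact-fit growth in {{ensureCapacity()}} are assumptions made for this sketch, not CarbonData code.

{code:java}
/** Hypothetical model of Issue-2: a growable INT page used as rowOffset. */
public class RowOffsetSizingDemo {
  static final class IntPage {
    private int[] block;
    private int capacity; // number of int entries the block can hold
    int reallocations;

    IntPage(int entries) {
      block = new int[entries];
      capacity = entries;
    }

    private void ensureCapacity(int index) {
      if (index >= capacity) {
        int newSize = index + 1; // assumed exact-fit growth
        int[] newBlock = new int[newSize];
        System.arraycopy(block, 0, newBlock, 0, capacity); // copy previous entries
        block = newBlock;
        capacity = newSize;
        reallocations++;
      }
    }

    void putInt(int index, int value) {
      ensureCapacity(index);
      block[index] = value;
    }

    int getInt(int index) { return block[index]; }
  }

  /** Writes offsets for pageSize rows of rowLength bytes; returns the reallocation count. */
  static int reallocationsFor(int pageSize, int allocatedEntries, int rowLength) {
    IntPage rowOffset = new IntPage(allocatedEntries);
    for (int rowId = 0; rowId < pageSize; rowId++) {
      if (rowId == 0) {
        rowOffset.putInt(0, 0); // offset to the 1st row is always 0
      }
      rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + rowLength);
    }
    return rowOffset.reallocations;
  }

  public static void main(String[] args) {
    int pageSize = 10;
    System.out.println(reallocationsFor(pageSize, pageSize, 3));     // 1: the last row writes index 10
    System.out.println(reallocationsFor(pageSize, pageSize + 1, 3)); // 0
  }
}
{code}

Sizing the offset page for {{pageSize + 1}} entries removes the final reallocation in this model.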
 



> A new column page MemoryBlock is allocated at each row addition to table page if having string column with local dictionary enabled.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3519
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3519
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core
>            Reporter: Venugopal Reddy K
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)