[jira] [Created] (CARBONDATA-3593) total_blocklets in query statistic always the same with valid_blocklets

Akash R Nilugal (Jira)

Hong Shen created CARBONDATA-3593:
-------------------------------------

Summary: total_blocklets in query statistic always the same with valid_blocklets
Key: CARBONDATA-3593
URL: https://issues.apache.org/jira/browse/CARBONDATA-3593
Project: CarbonData
Issue Type: Improvement
Components: core
Reporter: Hong Shen

When I run SQL on a CarbonData table with "enable.query.statistics=true", total_blocklets in the query statistics is always the same as valid_blocklets.
The tables test_table_hdfs_sort_city and test_table_hdfs_no_sort hold the same data; the only difference is that test_table_hdfs_sort_city has SORT_COLUMN='city_name', while test_table_hdfs_no_sort has no sort column.
```
carbon.sql("select * from test_table_hdfs_sort_city where city_name='city1' ")

|scan_blocks_num|total_blocklets|valid_blocklets|total_pages|scanned_pages|valid_pages|
|              1|              1|              1|        193|            4|          4|

carbon.sql("select * from test_table_hdfs_no_sort where city_name='city1' ")

|scan_blocks_num|total_blocklets|valid_blocklets|total_pages|scanned_pages|valid_pages|
|              1|              3|              3|        193|          193|        193|
```

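After reading the code, I found that both TOTAL_BLOCKLET_NUM and VALID_SCAN_BLOCKLET_NUM are incremented by 1 at the same points: in BlockletFilterScanner.executeFilter(), BlockletFilterScanner.executeFilterForPages(), and BlockletFullScanner.scanBlocklet(). Because the two counters are always updated together, they can never differ.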

I think total_blocklets should be the total number of blocklets, and valid_blocklets should be the number of blocklets that remain after filtering. If this needs to be changed, I will provide a patch, since I have already modified it locally.
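
To illustrate the difference between the two counting schemes, here is a minimal, self-contained sketch. It is not CarbonData code. The class `BlockletCounterSketch`, the `Counters` holder, and both methods are hypothetical names made up for this example. The idea that non-matching blocklets are skipped before they reach the scanner is an assumption about why the two counters stay equal, not something confirmed from the source.

```java
import java.util.Arrays;
import java.util.List;

public class BlockletCounterSketch {

    /** Hypothetical stand-in for the per-query counters; not CarbonData's statistics class. */
    static final class Counters {
        long totalBlocklets;
        long validBlocklets;
    }

    /**
     * Models the reported behaviour: both counters are incremented at the same
     * point, so they always end up equal.
     * ASSUMPTION: blocklets that do not match the filter never reach this point.
     */
    static Counters countAsReported(List<Boolean> blockletMatchesFilter) {
        Counters c = new Counters();
        for (boolean matches : blockletMatchesFilter) {
            if (!matches) {
                continue; // assumed: skipped before the scanner updates any counter
            }
            c.totalBlocklets++;
            c.validBlocklets++;
        }
        return c;
    }

    /**
     * Models the semantics proposed in the report: every blocklet counts towards
     * the total, and only blocklets that pass the filter count as valid.
     */
    static Counters countAsProposed(List<Boolean> blockletMatchesFilter) {
        Counters c = new Counters();
        for (boolean matches : blockletMatchesFilter) {
            c.totalBlocklets++;
            if (matches) {
                c.validBlocklets++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Illustrative input only: three blocklets, one of which matches the filter.
        List<Boolean> blocklets = Arrays.asList(true, false, false);

        Counters reported = countAsReported(blocklets);
        Counters proposed = countAsProposed(blocklets);

        // prints "reported: total=1 valid=1"
        System.out.println("reported: total=" + reported.totalBlocklets
                + " valid=" + reported.validBlocklets);
        // prints "proposed: total=3 valid=1"
        System.out.println("proposed: total=" + proposed.totalBlocklets
                + " valid=" + proposed.validBlocklets);
    }
}
```

With the proposed scheme, the gap between the two numbers shows how many blocklets the filter eliminated, which is the information the statistic is meant to convey.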

--
This message was sent by Atlassian Jira
(v8.3.4#803005)