[ https://issues.apache.org/jira/browse/CARBONDATA-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Shen updated CARBONDATA-3593:
----------------------------------

Description:

When I run SQL on a CarbonData table with "enable.query.statistics=true", total_blocklets in the query statistics is always the same as valid_blocklets. Below is an example.

Tables test_table_hdfs_sort_city and test_table_hdfs_no_sort contain the same data. The only difference is that test_table_hdfs_sort_city has SORT_COLUMN='city_name', while test_table_hdfs_no_sort has no sort column.

{code}
carbon.sql("select * from test_table_hdfs_sort_city where city_name='city1' ")
{code}

|scan_blocks_num|total_blocklets|valid_blocklets|total_pages|scanned_pages|valid_pages|
|1|1|1|193|4|4|

{code}
carbon.sql("select * from test_table_hdfs_no_sort where city_name='city1' ")
{code}

|scan_blocks_num|total_blocklets|valid_blocklets|total_pages|scanned_pages|valid_pages|
|1|3|3|193|193|193|

After reading the code, I found that both TOTAL_BLOCKLET_NUM and VALID_SCAN_BLOCKLET_NUM are incremented by 1 in BlockletFilterScanner.executeFilter(), BlockletFilterScanner.executeFilterForPages, and BlockletFullScanner.scanBlocklet.

I think total_blocklets should be the total number of blocklets, and valid_blocklets should be the number of blocklets that remain after filtering. If this needs to be changed, I will provide a patch, since I have already modified it locally.

> total_blocklets in query statistic always the same with valid_blocklets
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-3593
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3593
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Reporter: Hong Shen
> Priority: Major
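
The distinction proposed in the description, between counting every blocklet and counting only those that survive filtering, can be sketched as follows. This is a minimal, self-contained illustration and not CarbonData's code: the class `BlockletStatsSketch`, the `Blocklet` type, the min/max check, and the sample values are all hypothetical. The first method models the reported symptom, where both counters are incremented at the same point; the second models the behavior the reporter suggests.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT CarbonData code: it only illustrates the two counting schemes.
final class BlockletStatsSketch {

    /** Hypothetical blocklet summary holding the min and max of the filter column. */
    static final class Blocklet {
        final String min;
        final String max;

        Blocklet(String min, String max) {
            this.min = min;
            this.max = max;
        }

        /** True if the value lies within [min, max], so the blocklet cannot be skipped. */
        boolean mayContain(String value) {
            return min.compareTo(value) <= 0 && value.compareTo(max) <= 0;
        }
    }

    /** Models the reported symptom: both counters grow together. Returns {total, valid}. */
    static int[] countCoupled(List<Blocklet> blocklets, String value) {
        int total = 0;
        int valid = 0;
        for (Blocklet b : blocklets) {
            if (b.mayContain(value)) {
                total++; // incremented at the same point as valid,
                valid++; // so the two numbers can never differ
            }
        }
        return new int[] {total, valid};
    }

    /** Models the suggested behavior: total counts every blocklet, valid only the survivors. */
    static int[] countSeparately(List<Blocklet> blocklets, String value) {
        int total = 0;
        int valid = 0;
        for (Blocklet b : blocklets) {
            total++; // every blocklet that was considered
            if (b.mayContain(value)) {
                valid++; // only blocklets that pass the filter
            }
        }
        return new int[] {total, valid};
    }

    public static void main(String[] args) {
        // Made-up sample data: three blocklets with disjoint city_name ranges.
        List<Blocklet> blocklets = Arrays.asList(
                new Blocklet("city1", "city3"),
                new Blocklet("city4", "city6"),
                new Blocklet("city7", "city9"));

        int[] coupled = countCoupled(blocklets, "city1");
        int[] separate = countSeparately(blocklets, "city1");
        System.out.println("coupled:  total=" + coupled[0] + " valid=" + coupled[1]);
        System.out.println("separate: total=" + separate[0] + " valid=" + separate[1]);
    }
}
```

With this sample data the coupled scheme reports total=1 and valid=1, while the separate scheme reports total=3 and valid=1, which is the kind of gap between the two statistics that the description asks for.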