Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213673#comment-17213673 ]

Prasanna Ravichandran commented on CARBONDATA-3807:
---------------------------------------------------

Sample plan showing the bloom index details (the screenshot could not be attached):

== CarbonData Profiler ==
Table Scan on uniqdata
 - total: 2 blocks, 2 blocklets
 - filter: (cust_name <> null and cust_name = CUST_NAME_00000)
 - pruned by Main Index
    - skipped: 0 blocks, 0 blocklets
 *- pruned by CG Index*
    *- name: datamapuniq_b1*
    *- provider: bloomfilter*
    - skipped: 0 blocks, 0 blocklets

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- HashAggregate(keys=[], functions=[count(1)])
   +- Exchange SinglePartition, true, [id=#129]
      +- HashAggregate(keys=[], functions=[partial_count(1)])
         +- Project
            +- Scan carbondata default.uniqdata[] PushedFilters: [IsNotNull(cust_name), EqualTo(cust_name,CUST_NAME_00000)], ReadSchema: struct<cust_name:string>

> Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
> -------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: bloom-filtercolumn-plan.png, bloom-show index.png
>
>
> Filter queries and projection queries on bloom columns are not hitting the bloom datamap.
> According to the query plan, the bloom datamap is not used, even though it has been created.
> Test queries:
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_00000"; -- not hitting
> explain select cust_name from uniqdata; -- not hitting
>
>
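For context on the `BLOOM_SIZE` and `BLOOM_FPP` properties in the `create datamap` statement above: the CarbonData bloom index documentation describes `BLOOM_SIZE` as the expected number of insertions into each per-blocklet bloom filter and `BLOOM_FPP` as its false-positive probability. Assuming the textbook bloom-filter sizing formulas apply (an assumption; CarbonData's internal computation may round differently), the values used in this report (n = 640000, p = 0.00001) work out roughly as follows:

```latex
m = -\frac{n \ln p}{(\ln 2)^2}
  = -\frac{640000 \cdot \ln(10^{-5})}{(\ln 2)^2}
  \approx \frac{640000 \cdot 11.513}{0.4805}
  \approx 1.53 \times 10^{7}\ \text{bits} \quad (\approx 1.9\ \text{MB per blocklet filter})

k = \frac{m}{n} \ln 2 \approx 23.96 \cdot 0.693 \approx 16.6 \;\Rightarrow\; 17\ \text{hash functions}
```

A bloom filter answers only "definitely absent" or "possibly present", so it can prune a blocklet only when the filter value is definitely absent from that blocklet.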
