[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213673#comment-17213673 ]

Prasanna Ravichandran commented on CARBONDATA-3807:
---------------------------------------------------

Sample plan with bloom index details (the screenshot could not be attached):

== CarbonData Profiler ==
Table Scan on uniqdata
 - total: 2 blocks, 2 blocklets
 - filter: (cust_name <> null and cust_name = CUST_NAME_00000)
 - pruned by Main Index
    - skipped: 0 blocks, 0 blocklets
 *- pruned by CG Index*
    *- name: datamapuniq_b1*
    *- provider: bloomfilter*
    - skipped: 0 blocks, 0 blocklets

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- HashAggregate(keys=[], functions=[count(1)])
   +- Exchange SinglePartition, true, [id=#129]
      +- HashAggregate(keys=[], functions=[partial_count(1)])
         +- Project
            +- Scan carbondata default.uniqdata[] PushedFilters: [IsNotNull(cust_name), EqualTo(cust_name,CUST_NAME_00000)], ReadSchema: struct<cust_name:string>
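
For anyone who wants to check such a plan without reading it by eye, here is a minimal sketch in Python that scans the profiler text for a "pruned by CG Index" section and collects its name, provider and skipped counts. The function name cg_index_hits and the parsing rules are assumptions derived only from the output shown above; this is not a CarbonData API, and other versions may print the profiler section differently.

```python
# Hypothetical helper: parses the "== CarbonData Profiler ==" text as it
# appears in this message. The line format is assumed from that sample only.
PROFILER_OUTPUT = """\
== CarbonData Profiler ==
Table Scan on uniqdata
 - total: 2 blocks, 2 blocklets
 - filter: (cust_name <> null and cust_name = CUST_NAME_00000)
 - pruned by Main Index
 - skipped: 0 blocks, 0 blocklets
 *- pruned by CG Index*
 *- name: datamapuniq_b1*
 *- provider: bloomfilter*
 - skipped: 0 blocks, 0 blocklets
"""


def cg_index_hits(profiler_text):
    """Return one dict per 'pruned by CG Index' section found in the text."""
    hits = []
    current = None
    for raw in profiler_text.splitlines():
        # Drop surrounding whitespace, Jira bold markers (*...*) and the
        # leading list dash, so "*- name: x*" becomes "name: x".
        line = raw.strip().strip("*").strip().lstrip("-").strip()
        if line.startswith("=="):
            # A new "== ... ==" header ends whatever section was open.
            current = None
            continue
        if line.startswith("pruned by"):
            current = {"index": line[len("pruned by"):].strip()}
            if current["index"] == "CG Index":
                hits.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, value = line.split(":", 1)
        current[key.strip()] = value.strip()
    return hits


hits = cg_index_hits(PROFILER_OUTPUT)
for hit in hits:
    print(hit["name"], hit["provider"], hit["skipped"])
```

An empty result would mean that no CG index section was printed for the query. In the plan above the section is present, although its skipped count shows that the bloom index did not prune any blocks or blocklets for this filter.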

> Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
> -------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3807
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
>             Project: CarbonData
>          Issue Type: Bug
>         Environment: Ant cluster - opensource
>            Reporter: Prasanna Ravichandran
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: bloom-filtercolumn-plan.png, bloom-show index.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
> According to the plan, the bloom datamap is not used even though it has been created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_00000"; -- not hitting the bloom datamap
> explain select cust_name from uniqdata; -- not hitting the bloom datamap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)