GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2323

[CARBONDATA-2495][Doc][BloomDataMap] Add document for bloomfilter datamap

add document for bloomfilter datamap

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0519_bloom_dm_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2323

----
commit f86013276ad1e2b5c9ff7c872627a793b912dd82
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-05-19T14:33:43Z

    Add document for bloomfilter datamap

    add document for bloomfilter datamap

----

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2323

@jackylk @chenliang613 Can you review this?

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2323

SDV Build Success. Please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/5003/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2323

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5980/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2323

Build Success with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4822/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2323#discussion_r189477676

--- Diff: docs/datamap/bloomfilter-datamap-guide.md ---
@@ -0,0 +1,94 @@
+# CarbonData BloomFilter DataMap (Alpha feature in 1.4.0)
+
+* [DataMap Management](#datamap-management)
+* [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-bloomfilter-datamap)
+
+#### DataMap Management
+Creating BloomFilter DataMap
+ ```
+ CREATE DATAMAP [IF NOT EXISTS] datamap_name
+ ON TABLE main_table
+ USING 'bloomfilter'
+ DMPROPERTIES ('index_columns'='city, name', 'BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001')
+ ```
+
+Dropping specified datamap
+ ```
+ DROP DATAMAP [IF EXISTS] datamap_name
+ ON TABLE main_table
+ ```
+
+Showing all DataMaps on this table
+ ```
+ SHOW DATAMAP
+ ON TABLE main_table
+ ```
+It will show all DataMaps created on main table.
+
+
+## BloomFilter DataMap Introduction
+A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
+Carbondata introduce BloomFilter as an index datamap to enhance the performance of querying with precise value.
+Internally, CarbonData maintains a BloomFilter per blocklet for each index column to indicate that whether a value of the column is in this blocklet.
--- End diff --

Please add a hint about which scenarios suit this datamap, for example queries that do precise matches on a high-cardinality column.

---
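To make the reviewed paragraph concrete, here is a minimal sketch of how one Bloom filter per blocklet could let a precise-match query skip blocklets. It is not CarbonData's implementation: the `BloomFilter` class, the `prune_blocklets` helper, the SHA-256 double hashing, and the `expected_items`/`fpp` parameters are all hypothetical, and any correspondence with the `BLOOM_SIZE` and `BLOOM_FPP` properties in the DDL is an assumption.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter (hypothetical; not CarbonData's implementation)."""

    def __init__(self, expected_items: int, fpp: float):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def prune_blocklets(filters, value):
    """Return indices of blocklets that may hold `value`; the rest can be skipped."""
    return [i for i, f in enumerate(filters) if f.might_contain(value)]


# One filter per blocklet for an assumed 'city' index column.
blocklets = [["shenzhen", "beijing"], ["hangzhou", "wuhan"], ["chengdu", "xian"]]
filters = []
for rows in blocklets:
    bloom = BloomFilter(expected_items=1000, fpp=0.00001)
    for city in rows:
        bloom.add(city)
    filters.append(bloom)

# Equivalent in spirit to: SELECT ... WHERE city = 'wuhan'
candidates = prune_blocklets(filters, "wuhan")
print(candidates)
```

A Bloom filter never reports a stored value as absent, so the blocklet that holds the value always stays in the candidate list; other blocklets are dropped except for occasional false positives. This only helps equality predicates, and it pays off most when the column has many distinct values, which is the scenario the review comment asks the guide to mention.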
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2323#discussion_r189791768

--- Diff: docs/datamap/bloomfilter-datamap-guide.md ---
@@ -0,0 +1,94 @@
+# CarbonData BloomFilter DataMap (Alpha feature in 1.4.0)
+
+* [DataMap Management](#datamap-management)
+* [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-bloomfilter-datamap)
+
+#### DataMap Management
+Creating BloomFilter DataMap
+ ```
+ CREATE DATAMAP [IF NOT EXISTS] datamap_name
+ ON TABLE main_table
+ USING 'bloomfilter'
+ DMPROPERTIES ('index_columns'='city, name', 'BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001')
+ ```
+
+Dropping specified datamap
+ ```
+ DROP DATAMAP [IF EXISTS] datamap_name
+ ON TABLE main_table
+ ```
+
+Showing all DataMaps on this table
+ ```
+ SHOW DATAMAP
+ ON TABLE main_table
+ ```
+It will show all DataMaps created on main table.
+
+
+## BloomFilter DataMap Introduction
+A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
+Carbondata introduce BloomFilter as an index datamap to enhance the performance of querying with precise value.
+Internally, CarbonData maintains a BloomFilter per blocklet for each index column to indicate that whether a value of the column is in this blocklet.
--- End diff --

OK

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2323

Build Success with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4883/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2323

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6042/

---