[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap


[GitHub] carbondata pull request #2215: [CARBONDATA-2206]add documentation for lucene...

qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2215#discussion_r189435741
 
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -0,0 +1,133 @@
    +# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
    +  
    +* [DataMap Management](#datamap-management)
    +* [Lucene Datamap](#lucene-datamap-introduction)
    +* [Loading Data](#loading-data)
    +* [Querying Data](#querying-data)
    +* [Data Management](#data-management-with-pre-aggregate-tables)
    --- End diff --
   
    This anchor is incorrect:
    `data-management-with-pre-aggregate-tables`
    It should be:
    `data-management-with-lucene-datamap`
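The reviewer's point can be checked mechanically. Below is a minimal Python sketch that derives anchors from headings and reports table-of-contents links with no matching heading. It assumes an approximation of GitHub's heading-to-anchor rule (lowercase, drop punctuation, spaces become hyphens), not the official algorithm, and it assumes the target heading reads "Data Management with Lucene DataMap", which is not shown in the quoted diff. The helper names `github_anchor` and `broken_toc_links` are hypothetical.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule (an assumption):
    lowercase, keep only word characters, hyphens and spaces,
    then turn each space into a hyphen."""
    text = heading.strip().lstrip("#").strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_toc_links(toc_targets, headings):
    """Return the TOC targets that do not match any heading's anchor."""
    anchors = {github_anchor(h) for h in headings}
    return [t for t in toc_targets if t.lstrip("#") not in anchors]


headings = [
    "#### DataMap Management",
    "## Lucene DataMap Introduction",
    "## Loading data",
    "## Querying data",
    "## Data Management with Lucene DataMap",  # assumed heading, not in the quoted diff
]
toc = [
    "#datamap-management",
    "#lucene-datamap-introduction",
    "#loading-data",
    "#querying-data",
    "#data-management-with-pre-aggregate-tables",
]

print(broken_toc_links(toc, headings))  # prints ['#data-management-with-pre-aggregate-tables']
```

Only the last TOC entry fails to resolve, which matches the correction suggested above.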


---

[GitHub] carbondata pull request #2215: [CARBONDATA-2206]add documentation for lucene...

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2215#discussion_r189435815
 
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -0,0 +1,133 @@
    +# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
    +  
    +* [DataMap Management](#datamap-management)
    +* [Lucene Datamap](#lucene-datamap-introduction)
    +* [Loading Data](#loading-data)
    +* [Querying Data](#querying-data)
    +* [Data Management](#data-management-with-pre-aggregate-tables)
    --- End diff --
   
    @jackylk I think it's better to add a separate document describing the common operations for index datamaps, since the descriptions of `Data Management`, `REBUILD DATAMAP`, and `WITH DEFERRED REBUILD` are the same for `BloomFilterDataMap` and `LuceneDataMap`.


---

[GitHub] carbondata pull request #2215: [CARBONDATA-2206]add documentation for lucene...

Github user akashrn5 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2215#discussion_r189507570
 
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -0,0 +1,133 @@
    +# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
    +  
    +* [DataMap Management](#datamap-management)
    +* [Lucene Datamap](#lucene-datamap-introduction)
    +* [Loading Data](#loading-data)
    +* [Querying Data](#querying-data)
    +* [Data Management](#data-management-with-pre-aggregate-tables)
    +
    +#### DataMap Management
    +Lucene DataMap can be created using the following DDL:
    +  ```
    +  CREATE DATAMAP [IF NOT EXISTS] datamap_name
    +  ON TABLE main_table
    +  USING "lucene"
    +  DMPROPERTIES ('index_columns'='city, name', ...)
    +  ```
    +
    +DataMap can be dropped using the following DDL:
    +  ```
    +  DROP DATAMAP [IF EXISTS] datamap_name
    +  ON TABLE main_table
    +  ```
    +To show all DataMaps created, use:
    +  ```
    +  SHOW DATAMAP
    +  ON TABLE main_table
    +  ```
    +It will show all DataMaps created on the main table.
    +
    +
    +## Lucene DataMap Introduction
    +  Lucene is a high-performance, full-featured text search engine. Lucene is integrated into CarbonData as
    +  an index datamap and is managed along with the main tables by CarbonData. Users can create a Lucene
    +  datamap to improve query performance on string columns that hold long text content.
    +  
    +  For instance, consider a main table called **datamap_test**, defined as:
    +  
    +  ```
    +  CREATE TABLE datamap_test (
    +    name string,
    +    age int,
    +    city string,
    +    country string)
    +  STORED BY 'carbondata'
    +  ```
    +  
    +  Users can create a Lucene datamap using the CREATE DATAMAP DDL:
    +  
    +  ```
    +  CREATE DATAMAP dm
    +  ON TABLE datamap_test
    +  USING "lucene"
    +  DMPROPERTIES ('INDEX_COLUMNS' = 'name, country')
    +  ```
    +
    +## Loading data
    +When data is loaded into the main table, Lucene index files are generated for all the
    +index_columns (string columns) given in DMPROPERTIES; these files contain information about the data
    +location of the index_columns. The index files are written inside a folder named after the datamap,
    +within each segment folder.
    +
    +A system-level configuration, carbon.lucene.compression.mode, controls the compression of Lucene
    +index files. The default value is speed, which favors faster index writing. If the value is
    +compression, the index files are compressed to reduce their size.
    +
    +## Querying data
    +Lucene indexes are a query-acceleration technique and cannot be queried directly.
    +Queries must be made on the main table. When a query with TEXT_MATCH('name:c10') or
    +TEXT_MATCH_WITH_LIMIT('name:n10',10) is fired (the second parameter is the number of results to
    +return; if it is not specified, all results are returned without any limit), two jobs are launched.
    +The first job writes temporary files containing Lucene's search results into a folder created at
    +table level, and the second job reads these files to return results faster. These temporary files
    +are cleared once the query finishes.
    +
    +Users can verify whether a query leverages the Lucene datamap by executing the `EXPLAIN`
    +command, which shows the transformed logical plan; from it, the user can check whether the
    +TEXT_MATCH() filter is applied to the query.
    +
    +Note: The filter columns in TEXT_MATCH or TEXT_MATCH_WITH_LIMIT must always be in lower case, and
    +filter operators such as 'AND' and 'OR' must be in upper case.
    +
    +Ex:
    +```
    +select * from datamap_test where TEXT_MATCH('name:*10 AND name:*n*')
    +```
    +
    +The LIKE queries below can be converted to TEXT_MATCH queries as follows:
    +```
    +select * from datamap_test where name='n10'
    +
    +select * from datamap_test where name like 'n1%'
    +
    +select * from datamap_test where name like '%10'
    +
    +select * from datamap_test where name like '%n%'
    +
    +select * from datamap_test where name like '%10' and name not like '%n%'
    +```
    +Equivalent Lucene TEXT_MATCH queries:
    +```
    +select * from datamap_test where TEXT_MATCH('name:n10')
    +
    +select * from datamap_test where TEXT_MATCH('name:n1*')
    +
    +select * from datamap_test where TEXT_MATCH('name:*10')
    +
    +select * from datamap_test where TEXT_MATCH('name:*n*')
    +
    +select * from datamap_test where TEXT_MATCH('name:*10 -name:*n*')
    --- End diff --
   
    Added a link that provides details on all of these queries.
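The LIKE-to-TEXT_MATCH pairs listed in the quoted diff follow a simple pattern: `%` becomes Lucene's `*` wildcard and `NOT LIKE` becomes a `-` prohibited clause. Below is a minimal Python sketch of that rewrite. It is not part of CarbonData; the helpers `like_to_lucene` and `to_text_match` are hypothetical. It assumes the TEXT_MATCH string is parsed with Lucene's classic query syntax, where space-separated clauses default to OR, so multiple positive clauses are joined with an explicit upper-case `AND`, as the guide's note requires. `_` wildcards and escaping of Lucene special characters are out of scope.

```python
def like_to_lucene(column: str, pattern: str) -> str:
    """Translate one SQL LIKE pattern (or a plain '=' value) into a
    Lucene field:term clause. Only the '%' wildcard is handled."""
    if "_" in pattern or "\\" in pattern:
        # '_' and escape sequences have no direct mapping in this sketch
        raise ValueError("only '%' wildcards are supported in this sketch")
    # The guide requires lower-case column names inside TEXT_MATCH
    return f"{column.lower()}:{pattern.replace('%', '*')}"


def to_text_match(includes, excludes=()):
    """Build a TEXT_MATCH(...) filter from (column, pattern) pairs.

    includes: LIKE / '=' conditions combined with AND
    excludes: NOT LIKE conditions, emitted as '-' prohibited clauses
    """
    if not includes:
        # A purely negative Lucene query matches no documents
        raise ValueError("at least one positive clause is required")
    positive = " AND ".join(like_to_lucene(c, p) for c, p in includes)
    negative = "".join(" -" + like_to_lucene(c, p) for c, p in excludes)
    return f"TEXT_MATCH('{positive}{negative}')"


# name like '%10' and name not like '%n%'
print(f"select * from datamap_test where {to_text_match([('name', '%10')], [('name', '%n%')])}")
# prints: select * from datamap_test where TEXT_MATCH('name:*10 -name:*n*')
```

For a single condition, the output matches the pairs in the guide exactly (for example, `name like 'n1%'` becomes `TEXT_MATCH('name:n1*')`). Joining with `AND` rather than a space keeps SQL's conjunction semantics when more than one positive clause is present.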


---

[GitHub] carbondata pull request #2215: [CARBONDATA-2206]add documentation for lucene...

Github user akashrn5 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2215#discussion_r189507649
 
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -0,0 +1,133 @@
    +# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
    +  
    +* [DataMap Management](#datamap-management)
    +* [Lucene Datamap](#lucene-datamap-introduction)
    +* [Loading Data](#loading-data)
    +* [Querying Data](#querying-data)
    +* [Data Management](#data-management-with-pre-aggregate-tables)
    --- End diff --
   
    Yes, I think the same. I'm also not sure how refresh works, so this PR will be specific to Lucene.


---

[GitHub] carbondata issue #2215: [CARBONDATA-2206]add documentation for lucene datama...

Github user akashrn5 commented on the issue:

    https://github.com/apache/carbondata/pull/2215
 
    @xuchuanyin and @jackylk please review


---

[GitHub] carbondata issue #2215: [CARBONDATA-2206]add documentation for lucene datama...

Github user akashrn5 commented on the issue:

    https://github.com/apache/carbondata/pull/2215
 
    @chenliang613 please review and merge


---

[GitHub] carbondata issue #2215: [CARBONDATA-2206]add documentation for lucene datama...

Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/2215
 
    LGTM


---

[GitHub] carbondata pull request #2215: [CARBONDATA-2206]add documentation for lucene...

Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2215


---

[GitHub] carbondata issue #2215: [CARBONDATA-2206]add documentation for lucene datama...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2215
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6005/



---

[GitHub] carbondata issue #2215: [CARBONDATA-2206]add documentation for lucene datama...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2215
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4846/



---