Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2215
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4776/
---
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r186898007
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,213 @@
+# CarbonData Lucene DataMap
+
+* [Quick Example](#quick-example)
+* [DataMap Management](#datamap-management)
+* [Lucene Datamap](#lucene-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
+## Quick example
+1. Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME.
+
+2. Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars.
+   ```shell
+   mvn clean package -DskipTests -Pspark-2.2
+   ```
+
+3. Start spark-shell in new terminal, type :paste, then copy and run the following code.
+   ```scala
+   import java.io.File
+   import org.apache.spark.sql.{CarbonEnv, SparkSession}
+   import org.apache.spark.sql.CarbonSession._
+   import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
+   import org.apache.carbondata.core.util.path.CarbonStorePath
+
--- End diff --
Please remove the example code, because PR 2268 already provides an executable example. Example code should be maintained under the examples module, not inside the document.
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r186900225
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,213 @@
+# CarbonData Lucene DataMap
+
+* [Quick Example](#quick-example)
+* [DataMap Management](#datamap-management)
+* [Lucene Datamap](#lucene-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
+## Quick example
+1. Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME.
+
+2. Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars.
+   ```shell
+   mvn clean package -DskipTests -Pspark-2.2
+   ```
+
+3. Start spark-shell in new terminal, type :paste, then copy and run the following code.
+   ```scala
+   import java.io.File
+   import org.apache.spark.sql.{CarbonEnv, SparkSession}
+   import org.apache.spark.sql.CarbonSession._
+   import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
+   import org.apache.carbondata.core.util.path.CarbonStorePath
+
+   val warehouse = new File("./warehouse").getCanonicalPath
+   val metastore = new File("./metastore").getCanonicalPath
+
+   val spark = SparkSession
+     .builder()
+     .master("local")
+     .appName("luceneDatamapExample")
+     .config("spark.sql.warehouse.dir", warehouse)
+     .getOrCreateCarbonSession(warehouse, metastore)
+
+   spark.sparkContext.setLogLevel("ERROR")
+
+   // drop table if exists previously
+   spark.sql(s"DROP TABLE IF EXISTS datamap_test")
+
+   // Create main table
+   spark.sql(
+     s"""
+        |CREATE TABLE datamap_test (
+        |name string,
+        |age int,
+        |city string,
+        |country string)
+        |STORED BY 'carbondata'
+     """.stripMargin)
+
+   // Create lucene datamap on the main table
+   spark.sql(
+     s"""
+        |CREATE DATAMAP dm
+        |ON TABLE datamap_test
+        |USING "lucene"
+        |DMPROPERTIES ('TEXT_COLUMNS' = 'name,country')
+     """.stripMargin)
+
+   import spark.implicits._
+   import org.apache.spark.sql.SaveMode
+   import scala.util.Random
+
+   // Load data to the main table, if
+   // lucene index writing fails, the datamap
+   // will be disabled in query
+   val r = new Random()
+   spark.sparkContext.parallelize(1 to 10)
+     .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60))
+     .toDF("name", "age", "city", "country")
+     .write
+     .format("carbondata")
+     .option("tableName", "datamap_test")
+     .option("compress", "true")
+     .mode(SaveMode.Append)
+     .save()
+
+   spark.sql(
+     s"""
+        |SELECT *
+        |from datamap_test where
+        |TEXT_MATCH('name:c10')
+     """.stripMargin).show
+
+   spark.sql(
+     s"""
+        |SELECT *
+        |from datamap_test where
+        |TEXT_MATCH_WITH_LIMIT('name:c10', 10)
+     """.stripMargin).show
+
+   spark.stop
+   ```
+
+#### DataMap Management
+Lucene DataMap can be created using following DDL
+  ```
+  CREATE DATAMAP [IF NOT EXISTS] datamap_name
+  ON TABLE main_table
+  USING "lucene"
+  DMPROPERTIES ('index_columns'='city, name', ...)
+  ```
+
+DataMap can be dropped using following DDL:
+  ```
+  DROP DATAMAP [IF EXISTS] datamap_name
+  ON TABLE main_table
+  ```
+To show all DataMaps created, use:
+  ```
+  SHOW DATAMAP
+  ON TABLE main_table
+  ```
+It will show all DataMaps created on main table.
+
+
+## Lucene DataMap Introduction
+  Lucene is a high performance, full featured text search engine. Lucene is integrated to carbon as
+  index datamap and managed along with main tables by CarbonData.User can create lucene datamaps
+  to improve query performance on string columns.
+
+  For instance, main table called **datamap_test** which is defined as:
+
+  ```
+  CREATE TABLE datamap_test (
+    name string,
+    age int,
+    city string,
+    country string)
+  STORED BY 'carbondata'
+  ```
+
+  User can create Lucene datamap using the Create DataMap DDL:
+
+  ```
+  CREATE DATAMAP dm
+  ON TABLE datamap_test
+  USING "lucene"
+  DMPROPERTIES ('TEXT_COLUMNS' = 'name, country')
+  ```
+
+## Loading data
+When loading data to main table, lucene index files will be generated for all the
+index_columns(String Columns) given in DMProperties which contains information about the data
+location of index_columns. These index files will be written inside a folder named with datamap name
+inside each segment folders.
+
+A system level configuration carbon.lucene.compression.mode can be added for best compression of
+lucene index files. The default value is speed, where the index writing speed will be more. If the
+value is compression, the index file size will be compressed.
+
+## Querying data
+As a technique for query acceleration, Lucene indexes cannot be queried directly.
+Queries are to be made on main table. when a query with TEXT_MATCH('name:c10') or
+TEXT_MATCH_WITH_LIMIT('name:n10',10)[the second parameter represents the number of result to be
+returned, if user does not specify this value, all results will be returned without any limit] is
+fired, two jobs are fired.The first job writes the temporary files in folder created at table level
+which contains lucene's seach results and these files will be read in second job to give faster
+results. These temporary files will be cleared once the query finishes.
+
+User can verify whether a query can leverage Lucene datamap or not by executing `EXPLAIN`
+command, which will show the transformed logical plan, and thus user can check whether TEXT_MATCH()
+filter is applied on query or not.
+
+Below like queries can be converted to text_match queries as following:
+```
+select * from datamap_test where name='n10'
+
+select * from datamap_test where name like 'n1%'
+
+select * from datamap_test where name like '%10'
--- End diff --
I tested this, and the results differ for the two queries below. Please double check:
select * from datamap_test where name like '%10'
select * from datamap_test where TEXT_MATCH('name:*10')
---
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189227705
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,213 @@
[...]
+3. Start spark-shell in new terminal, type :paste, then copy and run the following code.
+   ```scala
+   import java.io.File
+   import org.apache.spark.sql.{CarbonEnv, SparkSession}
+   import org.apache.spark.sql.CarbonSession._
+   import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
+   import org.apache.carbondata.core.util.path.CarbonStorePath
+
--- End diff --
ok
---
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189228188
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,213 @@
[...]
+Below like queries can be converted to text_match queries as following:
+```
+select * from datamap_test where name='n10'
+
+select * from datamap_test where name like 'n1%'
+
+select * from datamap_test where name like '%10'
--- End diff --
I have tested this, and we also have a UT for it; it is working fine. We cannot compare all LIKE query results with TEXT_MATCH results, because Lucene searches differently from a LIKE query.
---
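One way to see why the two queries in this exchange can return different rows is the following minimal sketch. It is not CarbonData or Lucene code: it assumes the index stores each value as lowercase tokens (roughly what Lucene's StandardAnalyzer produces) and that a wildcard such as `*10` is matched against individual tokens, whereas `LIKE '%10'` is matched against the whole column value. The names `likeEndsWith`, `tokenize` and `termEndsWith` are hypothetical helpers introduced only for this illustration.

```scala
// Simplified model only: this is NOT CarbonData or Lucene code.
object LikeVsTextMatch {
  // SQL LIKE '%<suffix>': the whole column value must end with the suffix.
  def likeEndsWith(value: String, suffix: String): Boolean =
    value.endsWith(suffix)

  // Hypothetical stand-in for an analyzer: lowercase, then split on
  // anything that is not a letter or digit.
  def tokenize(value: String): Seq[String] =
    value.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  // Term-level wildcard '*<suffix>': it is enough that any single token
  // ends with the suffix.
  def termEndsWith(value: String, suffix: String): Boolean =
    tokenize(value).exists(_.endsWith(suffix))

  def main(args: Array[String]): Unit = {
    // Sample values are made up for the illustration.
    val rows = Seq("n10", "n10 extra", "city n10 north")
    rows.foreach { v =>
      println(s"'$v' like=${likeEndsWith(v, "10")} term=${termEndsWith(v, "10")}")
    }
  }
}
```

Under these assumptions a multi-word value such as `n10 extra` fails the LIKE predicate but satisfies the term-level wildcard, so the two result sets only coincide when every value is a single token.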
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189232106
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
+# CarbonData Lucene DataMap
--- End diff --
Please mark the Lucene feature as Alpha for 1.4.0.
---
Github user KanakaKumar commented on the issue:
https://github.com/apache/carbondata/pull/2215
LGTM
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2215
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4806/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2215
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4809/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2215
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4989/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2215
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5964/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2215
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4990/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2215
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5967/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2215
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4993/
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189421502
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
+# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
+
+* [DataMap Management](#datamap-management)
+* [Lucene Datamap](#lucene-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
+#### DataMap Management
+Lucene DataMap can be created using following DDL
+  ```
+  CREATE DATAMAP [IF NOT EXISTS] datamap_name
+  ON TABLE main_table
+  USING "lucene"
+  DMPROPERTIES ('index_columns'='city, name', ...)
+  ```
+
+DataMap can be dropped using following DDL:
+  ```
+  DROP DATAMAP [IF EXISTS] datamap_name
+  ON TABLE main_table
+  ```
+To show all DataMaps created, use:
+  ```
+  SHOW DATAMAP
+  ON TABLE main_table
+  ```
+It will show all DataMaps created on main table.
+
+
+## Lucene DataMap Introduction
+  Lucene is a high performance, full featured text search engine. Lucene is integrated to carbon as
+  an index datamap and managed along with main tables by CarbonData.User can create lucene datamap
+  to improve query performance on string columns which has content of more length.
+
+  For instance, main table called **datamap_test** which is defined as:
+
+  ```
+  CREATE TABLE datamap_test (
+    name string,
+    age int,
+    city string,
+    country string)
+  STORED BY 'carbondata'
+  ```
+
+  User can create Lucene datamap using the Create DataMap DDL:
+
+  ```
+  CREATE DATAMAP dm
+  ON TABLE datamap_test
+  USING "lucene"
+  DMPROPERTIES ('INDEX_COLUMNS' = 'name, country')
+  ```
+
+## Loading data
+When loading data to main table, lucene index files will be generated for all the
+index_columns(String Columns) given in DMProperties which contains information about the data
+location of index_columns. These index files will be written inside a folder named with datamap name
+inside each segment folders.
+
+A system level configuration carbon.lucene.compression.mode can be added for best compression of
+lucene index files. The default value is speed, where the index writing speed will be more. If the
+value is compression, the index file size will be compressed.
+
+## Querying data
+As a technique for query acceleration, Lucene indexes cannot be queried directly.
+Queries are to be made on main table. when a query with TEXT_MATCH('name:c10') or
+TEXT_MATCH_WITH_LIMIT('name:n10',10)[the second parameter represents the number of result to be
+returned, if user does not specify this value, all results will be returned without any limit] is
+fired, two jobs are fired.The first job writes the temporary files in folder created at table level
+which contains lucene's seach results and these files will be read in second job to give faster
+results. These temporary files will be cleared once the query finishes.
+
+User can verify whether a query can leverage Lucene datamap or not by executing `EXPLAIN`
+command, which will show the transformed logical plan, and thus user can check whether TEXT_MATCH()
+filter is applied on query or not.
+
+Note: The filter columns in TEXT_MATCH or TEXT_MATCH_WITH_LIMIT must be always in lower case and
+filter condition like 'AND','OR' must be in upper case.
+
--- End diff --
Add a limitation description here: in this version, only one TEXT_MATCH UDF is supported per relation, so the user should put the AND/OR logic inside that UDF instead of writing separate UDFs. For example:
`select * from T where TEXT_MATCH('col1:a AND col2:b')` is supported
`select * from T where TEXT_MATCH('col1:a') and TEXT_MATCH('col2:b')` is not supported
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189421589
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
+# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
+
+* [DataMap Management](#datamap-management)
+* [Lucene Datamap](#lucene-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-pre-aggregate-tables)
--- End diff --
Can you add a section describing the `REBUILD DATAMAP` and `WITH DEFERRED REBUILD` features used when creating a datamap?
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189421639
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
[...]
+Note: The filter columns in TEXT_MATCH or TEXT_MATCH_WITH_LIMIT must be always in lower case and
+filter condition like 'AND','OR' must be in upper case.
+
+Ex: ```
+  select * from datamap_test where TEXT_MATCH('name:*10 AND name:*n*')
+  ```
+
+Below like queries can be converted to text_match queries as following:
+```
+select * from datamap_test where name='n10'
+
+select * from datamap_test where name like 'n1%'
+
+select * from datamap_test where name like '%10'
+
+select * from datamap_test where name like '%n%'
+
+select * from datamap_test where name like '%10' and name not like '%n%'
+```
+Lucene TEXT_MATCH Queries:
+```
+select * from datamap_test where TEXT_MATCH('name:n10')
+
+select * from datamap_test where TEXT_MATCH('name:n1*')
+
+select * from datamap_test where TEXT_MATCH('name:*10')
+
+select * from datamap_test where TEXT_MATCH('name:*n*')
+
+select * from datamap_test where TEXT_MATCH('name:*10 -name:*n*')
--- End diff --
For all these queries, please describe what each one does, since users may not be familiar with Lucene syntax. Also provide a link to the Lucene query syntax for users to refer to.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189421681
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
[...]
+  User can create Lucene datamap using the Create DataMap DDL:
+
+  ```
+  CREATE DATAMAP dm
+  ON TABLE datamap_test
+  USING "lucene"
+  DMPROPERTIES ('INDEX_COLUMNS' = 'name, country')
+  ```
+
--- End diff --
More DMPROPERTIES were introduced in PR 2275; please add them here.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189421956
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
[...]
+## Lucene DataMap Introduction
+  Lucene is a high performance, full featured text search engine. Lucene is integrated to carbon as
+  an index datamap and managed along with main tables by CarbonData.User can create lucene datamap
+  to improve query performance on string columns which has content of more length.
--- End diff --
Please rephrase to explain that this datamap is intended for text content in which the user wants to search for tokenized words or patterns.
---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2215#discussion_r189422398
--- Diff: docs/datamap/lucene-datamap-guide.md ---
@@ -0,0 +1,133 @@
+# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
+
+* [DataMap Management](#datamap-management)
+* [Lucene Datamap](#lucene-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
+#### DataMap Management
+Lucene DataMap can be created using following DDL
+  ```
+  CREATE DATAMAP [IF NOT EXISTS] datamap_name
+  ON TABLE main_table
+  USING "lucene"
--- End diff --
Is it `"` or `'`?
---