GitHub user chenliang613 opened a pull request:
https://github.com/apache/carbondata/pull/2268

[CARBONDATA-2434] Add ExternalTableExample and LuceneDataMapExample

For preparing 1.4.0 release.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [X] Any interfaces changed? NA
- [X] Any backward compatibility impacted? NA
- [X] Document update required? NA
- [X] Testing done DONE
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenliang613/carbondata external_example

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2268

----
commit 3d3b19fbb78421ab59fd2b20cbe0b7bfb693b6c4
Author: chenliang613 <chenliang613@...>
Date: 2018-05-04T02:35:52Z

    [CARBONDATA-2434] Add ExternalTableExample and LuceneDataMapExample

----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5627/

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4467/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5674/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4514/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2268

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4751/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4517/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5677/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2268

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4754/

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2268#discussion_r186705671

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/LuceneDataMapExample.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.{SaveMode, SparkSession}
+
+import org.apache.carbondata.examples.util.ExampleUtils
+
+
+/**
+ * This example is for lucene datamap.
+ */
+
+object LuceneDataMapExample {
+
+  def main(args: Array[String]) {
+    val spark = ExampleUtils.createCarbonSession("LuceneDataMapExample")
+    exampleBody(spark)
+    spark.close()
+  }
+
+  def exampleBody(spark : SparkSession): Unit = {
+
+    // build the test data, please increase the data for more obvious comparison.
+    // if set the data is larger than 100M, it will take 10+ mins.
+    import scala.util.Random
+
+    import spark.implicits._
+    val r = new Random()
+    val df = spark.sparkContext.parallelize(1 to 10 * 10 * 1000)
+      .map(x => ("which test" + r.nextInt(10000) + " good" + r.nextInt(10),
+        "who and name" + x % 8, "city" + x % 50, x % 60))
+      .toDF("id", "name", "city", "age")
+
+    spark.sql("DROP TABLE IF EXISTS personTable")
+    df.write.format("carbondata")
+      .option("tableName", "personTable")
+      .option("compress", "true")
+      .mode(SaveMode.Overwrite).save()
+
+    // create lucene datamap on personTable
+    spark.sql(
+      s"""
+         | CREATE DATAMAP IF NOT EXISTS dm ON TABLE personTable
+         | USING 'lucene'
+         | DMProperties('INDEX_COLUMNS'='id , name')
+      """.stripMargin)
+
+    spark.sql("refresh datamap dm ON TABLE personTable")
+
+    // 1. Compare the performance:
+
+    def time(code: => Unit): Double = {
+      val start = System.currentTimeMillis()
+      code
+      // return time in second
+      (System.currentTimeMillis() - start).toDouble / 1000
+    }
+
+    val time_without_lucenedatamap = time {
+
+      spark.sql(
+        s"""
+           | SELECT count(*)
+           | FROM personTable where id like '% test1 %'
+        """.stripMargin).show()
+
+    }
+
+    val time_with_lucenedatamap = time {
+
+      spark.sql(
+        s"""
+           | SELECT count(*)
+           | FROM personTable where TEXT_MATCH('id:test1')
+        """.stripMargin).show()
+
+    }
+
+    // scalastyle:off
+    println("time for query on table with lucene datamap table:" + time_with_lucenedatamap.toString)
+    println("time for query on table without lucene datamap table:" + time_without_lucenedatamap.toString)
+    // scalastyle:on
+
+    // 2. Search for word "test1" and not "good" in the id field
+    spark.sql(
+      s"""
+         | SELECT id,name
+         | FROM personTable where TEXT_MATCH('id:test1 -id:good1')
+      """.stripMargin).show(100)
+
+    // 3. TEXT_MATCH_WITH_LIMIT usage:
+//    spark.sql(
+//      s"""
+//         | SELECT id,name
+//         | FROM personTable where TEXT_MATCH_WITH_LIMIT('id:test1,10')
+//      """.stripMargin).show()
--- End diff --

Uncomment it and change to `TEXT_MATCH_WITH_LIMIT('id:test1',10)` to work

---
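For readers following the review comment above, here is a minimal sketch of what the corrected query text might look like once the limit is moved outside the quoted Lucene expression. The object `TextMatchWithLimitSketch` and the helper `textMatchWithLimitQuery` are hypothetical names introduced only for illustration and are not part of the PR; the sketch only builds the SQL string, so it runs without Spark or CarbonData, and it assumes the `personTable` and `id` column from the example under review.

```scala
object TextMatchWithLimitSketch {

  // Hypothetical helper (not part of the PR): builds the query text only.
  // The limit is passed as a separate second argument, outside the quoted
  // Lucene expression, as the review comment suggests.
  def textMatchWithLimitQuery(table: String, luceneQuery: String, limit: Int): String =
    s"""
       | SELECT id,name
       | FROM $table where TEXT_MATCH_WITH_LIMIT('$luceneQuery',$limit)
     """.stripMargin.trim

  def main(args: Array[String]): Unit = {
    val query = textMatchWithLimitQuery("personTable", "id:test1", 10)
    // In the example under review, this string would be handed to
    // spark.sql(query).show(); here it is only printed.
    println(query)
  }
}
```

The only difference from the commented-out version in the diff is the position of the closing quote: `'id:test1,10'` treats the limit as part of the search text, while `'id:test1',10` passes it as a separate argument.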
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5737/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2268

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4800/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2268

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4579/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2268

LGTM

---