Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2268: [CARBONDATA-2434] Add ExternalTableExample an...

GitHub user chenliang613 opened a pull request:

https://github.com/apache/carbondata/pull/2268

[CARBONDATA-2434] Add ExternalTableExample and LuceneDataMapExample

In preparation for the 1.4.0 release.

Be sure to complete every item in the following checklist to help us incorporate
your contribution quickly and easily:

- [X] Any interfaces changed?
NA
- [X] Any backward compatibility impacted?
NA
- [X] Document update required?
NA
- [X] Testing done
DONE

- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenliang613/carbondata external_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2268

----
commit 3d3b19fbb78421ab59fd2b20cbe0b7bfb693b6c4
Author: chenliang613 <chenliang613@...>
Date: 2018-05-04T02:35:52Z

[CARBONDATA-2434] Add ExternalTableExample and LuceneDataMapExample

----

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build failed with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5627/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build failed with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4467/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build failed with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5674/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build failed with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4514/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2268

SDV build succeeded. Please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/4751/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build succeeded with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4517/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build succeeded with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5677/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2268

SDV build succeeded. Please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/4754/

---

[GitHub] carbondata pull request #2268: [CARBONDATA-2434] Add ExternalTableExample an...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2268#discussion_r186705671

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/LuceneDataMapExample.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.{SaveMode, SparkSession}
+
+import org.apache.carbondata.examples.util.ExampleUtils
+
+
+/**
+ * This example demonstrates the Lucene datamap.
+ */
+
+object LuceneDataMapExample {
+
+  def main(args: Array[String]) {
+    val spark = ExampleUtils.createCarbonSession("LuceneDataMapExample")
+    exampleBody(spark)
+    spark.close()
+  }
+
+  def exampleBody(spark : SparkSession): Unit = {
+
+    // Build the test data; increase the data size for a more obvious comparison.
+    // If the data is set larger than 100M, it will take 10+ minutes.
+    import scala.util.Random
+
+    import spark.implicits._
+    val r = new Random()
+    val df = spark.sparkContext.parallelize(1 to 10 * 10 * 1000)
+      .map(x => ("which test" + r.nextInt(10000) + " good" + r.nextInt(10),
+      "who and name" + x % 8, "city" + x % 50, x % 60))
+      .toDF("id", "name", "city", "age")
+
+    spark.sql("DROP TABLE IF EXISTS personTable")
+    df.write.format("carbondata")
+      .option("tableName", "personTable")
+      .option("compress", "true")
+      .mode(SaveMode.Overwrite).save()
+
+    // Create a Lucene datamap on personTable
+    spark.sql(
+      s"""
+         | CREATE DATAMAP IF NOT EXISTS dm ON TABLE personTable
+         | USING 'lucene'
+         | DMProperties('INDEX_COLUMNS'='id , name')
+      """.stripMargin)
+
+    spark.sql("refresh datamap dm ON TABLE personTable")
+
+    // 1. Compare the performance:
+
+    def time(code: => Unit): Double = {
+      val start = System.currentTimeMillis()
+      code
+      // Return the elapsed time in seconds
+      (System.currentTimeMillis() - start).toDouble / 1000
+    }
+
+    val time_without_lucenedatamap = time {
+
+      spark.sql(
+        s"""
+           | SELECT count(*)
+           | FROM personTable where id like '% test1 %'
+        """.stripMargin).show()
+
+    }
+
+    val time_with_lucenedatamap = time {
+
+      spark.sql(
+        s"""
+           | SELECT count(*)
+           | FROM personTable where TEXT_MATCH('id:test1')
+        """.stripMargin).show()
+
+    }
+
+    // scalastyle:off
+    println("time for query on table with lucene datamap:" + time_with_lucenedatamap.toString)
+    println("time for query on table without lucene datamap:" + time_without_lucenedatamap.toString)
+    // scalastyle:on
+
+    // 2. Search for the word "test1" but not "good1" in the id field
+    spark.sql(
+      s"""
+         | SELECT id,name
+         | FROM personTable where TEXT_MATCH('id:test1 -id:good1')
+      """.stripMargin).show(100)
+
+    // 3. TEXT_MATCH_WITH_LIMIT usage:
+//    spark.sql(
+//      s"""
+//         | SELECT id,name
+//         | FROM personTable where TEXT_MATCH_WITH_LIMIT('id:test1,10')
+//      """.stripMargin).show()
--- End diff --
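Step 1 of the example times `id like '% test1 %'` against `TEXT_MATCH('id:test1')`, and the comparison is only fair if both predicates select the same rows. The pure-Scala sketch below (no Spark or CarbonData needed) models both predicates, plus the step 2 query `id:test1 -id:good1`, on ids shaped like the generated `"which test<n> good<m>"` values. It assumes Lucene tokenization can be approximated by splitting on whitespace, which is a simplification; the object and method names are hypothetical.

```scala
// Pure-Scala model of the predicates in steps 1 and 2 (no Spark needed).
// Assumption: Lucene tokenization is approximated by splitting on whitespace.
object PredicateSemanticsSketch {

  // SQL `id LIKE '% test1 %'`: '%' matches any run of characters (including
  // none), so the value must contain " test1 " with a space on each side
  def likeTest1(id: String): Boolean = id.contains(" test1 ")

  // Lucene-style term query `id:<term>`: true if the exact token is present
  def hasTerm(id: String, term: String): Boolean =
    id.split("\\s+").contains(term)

  // `id:test1 -id:good1`: token test1 required, token good1 prohibited
  def test1NotGood1(id: String): Boolean =
    hasTerm(id, "test1") && !hasTerm(id, "good1")

  // Every id shape the example's generator can produce:
  // "which test<n> good<m>" with n in [0, 10000) and m in [0, 10)
  val allIds: IndexedSeq[String] =
    for (n <- 0 until 10000; m <- 0 until 10) yield s"which test$n good$m"

  // (LIKE matches, term matches, test1-and-not-good1 matches) over allIds
  def counts(): (Int, Int, Int) =
    (allIds.count(likeTest1), allIds.count(hasTerm(_, "test1")), allIds.count(test1NotGood1))

  def main(args: Array[String]): Unit = {
    val (like, term, notGood1) = counts()
    println(s"LIKE: $like, id:test1: $term, id:test1 -id:good1: $notGood1")
  }
}
```

Under the whitespace assumption the two step-1 predicates agree on every id the generator can produce (only `test1` followed by a space matches, never `test12` or `test1234`), so the timing comparison counts the same rows.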

Uncomment the `TEXT_MATCH_WITH_LIMIT` block and change the predicate to `TEXT_MATCH_WITH_LIMIT('id:test1',10)` so that it works.
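The point of this review comment is that the limit is a separate second argument of `TEXT_MATCH_WITH_LIMIT`, not part of the quoted Lucene query string. A minimal pure-Scala sketch of building the corrected predicate, assuming the two-argument form given in the comment; the `textMatchWithLimit` helper is hypothetical and not part of CarbonData.

```scala
// Hypothetical helper: builds the TEXT_MATCH_WITH_LIMIT predicate with the
// Lucene query and the result limit as two separate SQL arguments.
object TextMatchWithLimitSketch {

  def textMatchWithLimit(luceneQuery: String, limit: Int): String = {
    // Keep the sketch simple: reject single quotes instead of escaping them
    require(!luceneQuery.contains("'"), "query must not contain single quotes")
    require(limit > 0, "limit must be positive")
    s"TEXT_MATCH_WITH_LIMIT('$luceneQuery', $limit)"
  }

  def main(args: Array[String]): Unit = {
    val sql =
      s"""
         | SELECT id, name
         | FROM personTable WHERE ${textMatchWithLimit("id:test1", 10)}
      """.stripMargin
    println(sql)
  }
}
```

Taking the limit as an `Int` parameter means it cannot end up inside the quoted query, which is the mistake in the commented-out `'id:test1,10'`.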

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build succeeded with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5737/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2268

SDV build succeeded. Please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/4800/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2268

Build succeeded with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4579/

---

[GitHub] carbondata issue #2268: [CARBONDATA-2434] Add ExternalTableExample and Lucen...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2268

LGTM

---

[GitHub] carbondata pull request #2268: [CARBONDATA-2434] Add ExternalTableExample an...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2268

---