[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user aniketadnaik commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138813373
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/streaming/CarbonStreamingIngestFileSourceExample.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.commons.lang.RandomStringUtils
    +import org.apache.spark.sql.{SaveMode, SparkSession}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.carbondata.examples.utils.StreamingCleanupUtil
    +
    +object CarbonStreamingIngestFileSourceExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val csvDataDir = s"$rootPath/examples/spark2/resources/csvDataDir"
    +    // val csvDataFile = s"$csvDataDir/sampleData.csv"
    +    // val csvDataFile = s"$csvDataDir/sample.csv"
    +    val streamTableName = s"_carbon_file_stream_table_"
    +    val stremTablePath = s"$storeLocation/default/$streamTableName"
    +    val ckptLocation = s"$rootPath/examples/spark2/resources/ckptDir"
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")
    +
    +    // cleanup any residual files
    +    StreamingCleanupUtil.main(Array(csvDataDir, ckptLocation))
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonFileStreamingExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .getOrCreateCarbonSession(storeLocation, metastoredb)
    +
    +    spark.sparkContext.setLogLevel("ERROR")
    +
    +    // Writes Dataframe to CarbonData file:
    +    import spark.implicits._
    +    import org.apache.spark.sql.types._
    +
    +    // Generate random data
    +    val dataDF = spark.sparkContext.parallelize(1 to 10)
    +      .map(id => (id, "name_ABC", "city_XYZ", 10000.00*id)).
    +      toDF("id", "name", "city", "salary")
    +
    +    // drop table if exists previously
    +    spark.sql(s"DROP TABLE IF EXISTS ${streamTableName}")
    +
    +    // Create Carbon Table
    +    // Saves dataframe to carbondata file
    +    dataDF.write
    --- End diff --
   
    Yes, I will add more comments.
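
    For illustration, the save step could be annotated roughly like this (a sketch only, not the final wording):

        // Create the Carbon table by saving the DataFrame as a CarbonData file;
        // it is used as the streaming target later in the example
        dataDF.write
          .format("carbondata")                  // CarbonData as the data source
          .option("tableName", streamTableName)  // table to create/overwrite
          .option("compress", "true")            // enable compression
          .option("tempCSV", "false")            // load directly, without a temp CSV
          .mode(SaveMode.Overwrite)              // replace any existing data
          .save()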


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user aniketadnaik commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138813473
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/utils/StreamingCleanupUtil.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples.utils
    +
    +import java.io.IOException
    +
    +import scala.tools.nsc.io.Path
    +
    +// scalastyle:off println
    +object StreamingCleanupUtil {
    --- End diff --
   
    I agree, CSV data generation can be part of the utility object. I'll change it.
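
    A minimal sketch of how the generation could move into a utility object (the object and method names here are hypothetical, not the final code):

        package org.apache.carbondata.examples.utils

        import org.apache.commons.lang.RandomStringUtils
        import org.apache.spark.sql.{DataFrame, SparkSession}

        // Hypothetical helper that produces the random (id, name, city, salary)
        // rows used by the streaming examples
        object StreamingExampleDataUtil {
          def randomData(spark: SparkSession, ids: Range): DataFrame = {
            import spark.implicits._
            spark.sparkContext.parallelize(ids)
              .map(id => (id,
                s"name_${RandomStringUtils.randomAlphabetic(4).toUpperCase}",
                s"city_${RandomStringUtils.randomAlphabetic(2).toUpperCase}",
                10000.00 * id))
              .toDF("id", "name", "city", "salary")
          }
        }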


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user aniketadnaik commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138813548
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/streaming/CarbonStreamingIngestFileSourceExample.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.commons.lang.RandomStringUtils
    +import org.apache.spark.sql.{SaveMode, SparkSession}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.carbondata.examples.utils.StreamingCleanupUtil
    +
    +object CarbonStreamingIngestFileSourceExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val csvDataDir = s"$rootPath/examples/spark2/resources/csvDataDir"
    +    // val csvDataFile = s"$csvDataDir/sampleData.csv"
    +    // val csvDataFile = s"$csvDataDir/sample.csv"
    +    val streamTableName = s"_carbon_file_stream_table_"
    +    val stremTablePath = s"$storeLocation/default/$streamTableName"
    +    val ckptLocation = s"$rootPath/examples/spark2/resources/ckptDir"
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")
    +
    +    // cleanup any residual files
    +    StreamingCleanupUtil.main(Array(csvDataDir, ckptLocation))
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonFileStreamingExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .getOrCreateCarbonSession(storeLocation, metastoredb)
    +
    +    spark.sparkContext.setLogLevel("ERROR")
    +
    +    // Writes Dataframe to CarbonData file:
    +    import spark.implicits._
    +    import org.apache.spark.sql.types._
    +
    +    // Generate random data
    +    val dataDF = spark.sparkContext.parallelize(1 to 10)
    +      .map(id => (id, "name_ABC", "city_XYZ", 10000.00*id)).
    +      toDF("id", "name", "city", "salary")
    +
    +    // drop table if exists previously
    +    spark.sql(s"DROP TABLE IF EXISTS ${streamTableName}")
    +
    +    // Create Carbon Table
    +    // Saves dataframe to carbondata file
    +    dataDF.write
    +      .format("carbondata")
    +      .option("tableName", streamTableName)
    +      .option("compress", "true")
    +      .option("tempCSV", "false")
    +      .mode(SaveMode.Overwrite)
    +      .save()
    +
    +    spark.sql(s""" SELECT * FROM ${streamTableName} """).show()
    +
    +    // Create csv data frame file
    +    val csvDataDF = spark.sparkContext.parallelize(11 to 30)
    +      .map(id => (id,
    +        s"name_${RandomStringUtils.randomAlphabetic(4).toUpperCase}",
    +        s"city_${RandomStringUtils.randomAlphabetic(2).toUpperCase}",
    +        10000.00*id)).toDF("id", "name", "city", "salary")
    +
    +    // write data into csv file ( It will be used as a stream source)
    +    csvDataDF.write.
    +      format("com.databricks.spark.csv").
    --- End diff --
   
    Sure, `df.write.csv()` will make it more readable, since it's native to Spark 2.x.
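
    Something like the following, using the CSV writer built into Spark 2.x (same DataFrame and path as in the example):

        // write data into a CSV directory (it will be used as a stream source)
        csvDataDF.write
          .option("header", "true")
          .csv(csvDataDir)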


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user aniketadnaik commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138813656
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/streaming/CarbonStreamingIngestFileSourceExample.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.commons.lang.RandomStringUtils
    +import org.apache.spark.sql.{SaveMode, SparkSession}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.carbondata.examples.utils.StreamingCleanupUtil
    +
    +object CarbonStreamingIngestFileSourceExample {
    --- End diff --
   
    Sure, I'll add the description.


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138836834
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/streaming/CarbonStreamingIngestFileSourceExample.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.commons.lang.RandomStringUtils
    +import org.apache.spark.sql.{SaveMode, SparkSession}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.carbondata.examples.utils.StreamingCleanupUtil
    +
    +object CarbonStreamingIngestFileSourceExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val csvDataDir = s"$rootPath/examples/spark2/resources/csvDataDir"
    +    // val csvDataFile = s"$csvDataDir/sampleData.csv"
    +    // val csvDataFile = s"$csvDataDir/sample.csv"
    +    val streamTableName = s"_carbon_file_stream_table_"
    +    val stremTablePath = s"$storeLocation/default/$streamTableName"
    +    val ckptLocation = s"$rootPath/examples/spark2/resources/ckptDir"
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")
    +
    +    // cleanup any residual files
    +    StreamingCleanupUtil.main(Array(csvDataDir, ckptLocation))
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonFileStreamingExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .getOrCreateCarbonSession(storeLocation, metastoredb)
    +
    +    spark.sparkContext.setLogLevel("ERROR")
    +
    +    // Writes Dataframe to CarbonData file:
    +    import spark.implicits._
    +    import org.apache.spark.sql.types._
    +
    +    // Generate random data
    +    val dataDF = spark.sparkContext.parallelize(1 to 10)
    +      .map(id => (id, "name_ABC", "city_XYZ", 10000.00*id)).
    +      toDF("id", "name", "city", "salary")
    +
    +    // drop table if exists previously
    +    spark.sql(s"DROP TABLE IF EXISTS ${streamTableName}")
    +
    +    // Create Carbon Table
    +    // Saves dataframe to carbondata file
    +    dataDF.write
    +      .format("carbondata")
    +      .option("tableName", streamTableName)
    +      .option("compress", "true")
    +      .option("tempCSV", "false")
    +      .mode(SaveMode.Overwrite)
    +      .save()
    +
    +    spark.sql(s""" SELECT * FROM ${streamTableName} """).show()
    +
    +    // Create csv data frame file
    +    val csvDataDF = spark.sparkContext.parallelize(11 to 30)
    +      .map(id => (id,
    +        s"name_${RandomStringUtils.randomAlphabetic(4).toUpperCase}",
    +        s"city_${RandomStringUtils.randomAlphabetic(2).toUpperCase}",
    +        10000.00*id)).toDF("id", "name", "city", "salary")
    +
    +    // write data into csv file ( It will be used as a stream source)
    +    csvDataDF.write.
    +      format("com.databricks.spark.csv").
    +      option("header", "true").
    +      save(csvDataDir)
    +
    +    // define custom schema
    +    val inputSchema = new StructType().
    +      add("id", "integer").
    +      add("name", "string").
    +      add("city", "string").
    +      add("salary", "float")
    +
    +    // Read csv data file as a streaming source
    +    val csvReadDF = spark.readStream.
    --- End diff --
   
    Agreed.


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138838653
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -205,19 +220,188 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
      * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
      */
       def prepareWrite(
    --- End diff --
   
    missing `override` keyword?
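
    That is, presumably the declaration should read as follows (shown with the one-line body this diff replaces; `override` also lets the compiler verify the signature against the parent trait):

        override def prepareWrite(
            sparkSession: SparkSession,
            job: Job,
            options: Map[String, String],
            dataSchema: StructType): OutputWriterFactory =
          new CarbonStreamingOutputWriterFactory()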


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138838848
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -205,19 +220,188 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
      * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
      */
       def prepareWrite(
    -    sparkSession: SparkSession,
    -    job: Job,
    -    options: Map[String, String],
    -    dataSchema: StructType): OutputWriterFactory = new CarbonStreamingOutputWriterFactory()
    +      sparkSession: SparkSession,
    +      job: Job,
    +      options: Map[String, String],
    +      dataSchema: StructType): OutputWriterFactory = {
     
    -/**
    - * When possible, this method should return the schema of the given `files`.  When the format
    - * does not support inference, or no valid files are given should return None.  In these cases
    - * Spark will require that user specify the schema manually.
    - */
    +    // Check if table with given path exists
    +    validateTable(options.get("path").get)
    +
    +    // Check id streaming data schema matches with carbon table schema
    +    // Data from socket source does not have schema attached to it,
    +    // Following check is to ignore schema validation for socket source.
    +    if (!(dataSchema.size.equals(1) &&
    +      dataSchema.fields(0).dataType.equals(StringType))) {
    +      val tablePath = options.get("path")
    +      val path: String = tablePath match {
    +        case Some(value) => value
    +        case None => ""
    +      }
    +      val meta: CarbonMetastore = new CarbonMetastore(sparkSession.conf, path)
    +      val schemaPath = path + "/Metadata/schema"
    +      val schema: TableInfo = meta.readSchemaFile(schemaPath)
    +      val isSchemaValid = validateSchema(schema, dataSchema)
    +
    +      if(!isSchemaValid) {
    +        LOGGER.error("Schema Validation Failed: streaming data schema"
    +          + "does not match with carbon table schema")
    +        throw new InvalidSchemaException("Schema Validation Failed : " +
    +          "streaming data schema does not match with carbon table schema")
    +      }
    +    }
    +    new CarbonStreamingOutputWriterFactory()
    +  }
    +
    +  /**
    +   * Read schema from existing carbon table
    +   * @param sparkSession
    +   * @param tablePath carbon table path
    +   * @return true if schema validation is successful else false
    +   */
    +  private def getTableSchema(sparkSession: SparkSession, tablePath: String): TableInfo = {
    +    val meta: CarbonMetastore = new CarbonMetastore(sparkSession.conf, tablePath)
    +    val schemaPath = tablePath + "/Metadata/schema"
    +    val schema: TableInfo = meta.readSchemaFile(schemaPath)
    +    schema
    +  }
    +
    +  /**
    +   * Validates streamed schema against existing table schema
    +   * @param schema existing carbon table schema
    +   * @param dataSchema streamed data schema
    +   * @return true if schema validation is successful else false
    +   */
    +  private def validateSchema(schema: TableInfo, dataSchema: StructType): Boolean = {
    +    val factTable: TableSchema = schema.getFact_table
    +
    +    import scala.collection.mutable.ListBuffer
    +    import scala.collection.JavaConverters._
    +    var columnnSchemaValues = factTable.getTable_columns.asScala.sortBy(_.schemaOrdinal)
    +
    +    var columnDataTypes = new ListBuffer[String]()
    +    for(columnDataType <- columnnSchemaValues) {
    +      columnDataTypes.append(columnDataType.data_type.toString)
    +    }
    +    val tableColumnDataTypeList = columnDataTypes.toList
    +
    +    var streamSchemaDataTypes = new ListBuffer[String]()
    +    for(i <- 0 until dataSchema.size) {
    +      streamSchemaDataTypes
    +        .append(
    +          mapStreamingDataTypeToString(dataSchema.fields(i).dataType.toString))
    +    }
    +    val streamedDataTypeList = streamSchemaDataTypes.toList
    +
    +    val isValid = tableColumnDataTypeList == streamedDataTypeList
    +    isValid
    +  }
    +
    +  /**
    +   * Parses streamed datatype according to carbon datatype
    +   * @param dataType
    +   * @return String
    +   */
    +  def mapStreamingDataTypeToString(dataType: String): String = {
    +    dataType match {
    +      case "IntegerType" => DataType.INT.toString
    +      case "StringType" => DataType.STRING.toString
    +      case "DateType" => DataType.DATE.toString
    +      case "DoubleType" => DataType.DOUBLE.toString
    +      case "FloatType" => DataType.DOUBLE.toString
    +      case "LongType" => DataType.LONG.toString
    +      case "ShortType" => DataType.SHORT.toString
    +      case "TimestampType" => DataType.TIMESTAMP.toString
    +    }
    +  }
    +
    +  /**
    +   * Validates if given table exists or throws exception
    +   * @param String existing carbon table path
    +   * @return None
    +   */
    +  private def validateTable(tablePath: String): Unit = {
    +
    +    val formattedTablePath = tablePath.replace('\\', '/')
    +    val names = formattedTablePath.split("/")
    +    if (names.length < 3) {
    +      throw new IllegalArgumentException("invalid table path: " + tablePath)
    +    }
    +    val tableName : String = names(names.length - 1)
    +    val dbName : String = names(names.length - 2)
    +    val storePath = formattedTablePath.substring(0,
    +      formattedTablePath.lastIndexOf
    +      (((dbName.concat(CarbonCommonConstants.FILE_SEPARATOR).toString)
    +        .concat(tableName)).toString) - 1)
    +    val absoluteTableIdentifier: AbsoluteTableIdentifier =
    +      new AbsoluteTableIdentifier(storePath,
    +        new CarbonTableIdentifier(dbName, tableName,
    +          UUID.randomUUID().toString))
    +
    +    if (!checkIfTableExists(absoluteTableIdentifier)) {
    +      throw new NoSuchTableException(dbName, tableName)
    +    }
    +  }
    +
    +  /**
    +   * Checks if table exists by checking its schema file
    +   * @param absoluteTableIdentifier
    +   * @return Boolean
    +   */
    +  private def checkIfTableExists(absoluteTableIdentifier: AbsoluteTableIdentifier): Boolean = {
    +    val carbonTablePath: CarbonTablePath = CarbonStorePath
    +      .getCarbonTablePath(absoluteTableIdentifier)
    +    val schemaFilePath: String = carbonTablePath.getSchemaFilePath
    +    FileFactory.isFileExist(schemaFilePath, FileFactory.FileType.LOCAL) ||
    +      FileFactory.isFileExist(schemaFilePath, FileFactory.FileType.HDFS) ||
    +      FileFactory.isFileExist(schemaFilePath, FileFactory.FileType.VIEWFS)
    +  }
    +
    +  /**
    +   * If use wants to stream data from carbondata table source
    +   * and if following conditions are true:
    +   *    1. No schema provided by the user in readStream()
    +   *    2. spark.sql.streaming.schemaInference is set to true
    +   * carbondata can infer a table schema from a valid table path
    +   * The schema inference is not mandatory, but good have.
    +   * When possible, this method should return the schema of the given `files`.  When the format
    +   * does not support inference, or no valid files are given should return None.  In these cases
    +   * Spark will require that user specify the schema manually.
    +   */
       def inferSchema(
    --- End diff --
   
    missing `override` keyword?


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138839229
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -205,19 +220,188 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
      * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
      */
       def prepareWrite(
    --- End diff --
   
    And there is incorrect indentation at line 217


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r138840023
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -205,19 +220,188 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
      * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
      */
       def prepareWrite(
    --- End diff --
   
    I have rebased the streaming_ingest branch onto the latest master; please rebase this PR.


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r139074844
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingCommitInfo.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.streaming;
    +
    +/**
    + * Commit info for streaming writes
    + * The commit info can be used to recover valid offset in the file
    + * in the case of write failure.
    + */
    +public class CarbonStreamingCommitInfo {
    +
    +  private String dataBase;
    +
    +  private String table;
    --- End diff --
   
    We should use tableID instead of the table name.


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user aniketadnaik commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r139085031
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingCommitInfo.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.streaming;
    +
    +/**
    + * Commit info for streaming writes
    + * The commit info can be used to recover valid offset in the file
    + * in the case of write failure.
    + */
    +public class CarbonStreamingCommitInfo {
    +
    +  private String dataBase;
    +
    +  private String table;
    --- End diff --
   
    This file is from a previous PR. I guess my recent rebase has moved the HEAD of the streaming_ingest branch in my private repo; this needs to be corrected. Please hold off on your review. On a side note, this class is just meant to provide a framework for storing commit info; the actual values can be changed at implementation time.


---

[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1352
 
    Please hold off on the review. The rebase has caused some issues, and I'll need to fix them.


---

[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1352
 
    Please rebase the "streaming_ingest" branch from the latest "master" to take care of the Presto test failures.


---

[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1352
 
    * Please merge this PR (Branch: **StreamIngest-1174**) into the "**streaming_ingest**" branch.
   
    * The build and test report follows:
    $>mvn -Pspark-2.1 -Dspark.version=2.1.0 clean verify
    .
   
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Apache CarbonData :: Parent ........................ SUCCESS [  1.829 s]
    [INFO] Apache CarbonData :: Common ........................ SUCCESS [  4.067 s]
    [INFO] Apache CarbonData :: Core .......................... SUCCESS [ 46.276 s]
    [INFO] Apache CarbonData :: Processing .................... SUCCESS [ 13.728 s]
    [INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [ 12.789 s]
    [INFO] Apache CarbonData :: Spark Common .................. SUCCESS [ 29.121 s]
    [INFO] Apache CarbonData :: Spark2 ........................ SUCCESS [03:02 min]
    [INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [08:54 min]
    [INFO] Apache CarbonData :: Assembly ...................... SUCCESS [  1.730 s]
    [INFO] Apache CarbonData :: Hive .......................... SUCCESS [ 13.845 s]
    [INFO] Apache CarbonData :: presto ........................ SUCCESS [ 34.772 s]
    [INFO] Apache CarbonData :: Spark2 Examples ............... SUCCESS [  9.362 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 14:44 min
    [INFO] Finished at: 2017-09-20T18:47:44-07:00
    [INFO] Final Memory: 174M/1827M
    [INFO] ------------------------------------------------------------------------
   
   
   
    $>mvn clean verify
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Apache CarbonData :: Parent ........................ SUCCESS [  1.839 s]
    [INFO] Apache CarbonData :: Common ........................ SUCCESS [  3.985 s]
    [INFO] Apache CarbonData :: Core .......................... SUCCESS [ 46.732 s]
    [INFO] Apache CarbonData :: Processing .................... SUCCESS [ 13.321 s]
    [INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [ 13.376 s]
    [INFO] Apache CarbonData :: Spark Common .................. SUCCESS [ 29.065 s]
    [INFO] Apache CarbonData :: Spark2 ........................ SUCCESS [03:02 min]
    [INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [08:45 min]
    [INFO] Apache CarbonData :: Assembly ...................... SUCCESS [  1.772 s]
    [INFO] Apache CarbonData :: Hive .......................... SUCCESS [ 12.770 s]
    [INFO] Apache CarbonData :: presto ........................ SUCCESS [ 38.861 s]
    [INFO] Apache CarbonData :: Spark2 Examples ............... SUCCESS [  9.452 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 14:39 min
    [INFO] Finished at: 2017-09-20T18:30:11-07:00
    [INFO] Final Memory: 173M/1628M
    [INFO] ------------------------------------------------------------------------


---

[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1352
 
    Please review and merge this PR (Branch: StreamIngest-1174) into the "streaming_ingest" branch.


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r141342069
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -179,14 +192,13 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
         }
       }
     
    -  /**
    -   * Returns the path of the table
    -   *
    -     * @param sparkSession
    -   * @param dbName
    -   * @param tableName
    -   * @return
    -   */
    +/**
    + * Returns the path of the table
    + * @param sparkSession
    + * @param dbName
    + * @param tableName
    + * @return
    + */
    --- End diff --
   
    please correct the indentation
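
    That is, restore the two-space alignment used by the surrounding members:

          /**
           * Returns the path of the table
           *
           * @param sparkSession
           * @param dbName
           * @param tableName
           * @return
           */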


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r141343067
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -217,22 +229,212 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
        * be put here.  For example, user defined output committer can be configured here
        * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
        */
    -  def prepareWrite(
    +  override def prepareWrite(
    +      sparkSession: SparkSession,
    +      job: Job,
    +      options: Map[String, String],
    +      dataSchema: StructType): OutputWriterFactory = {
    +
    +    // Check if table with given path exists
    +    validateTable(options.get("path").get)
    --- End diff --
   
    change `options.get("path").get` to `options("path")`
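
    For reference, `Map.apply` throws the same `NoSuchElementException` that `Option.get` would when the key is absent, so the shorter form behaves identically here:

        // options("path") is equivalent to options.get("path").get,
        // minus the Option detour
        validateTable(options("path"))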


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r141343541
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -217,22 +229,212 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
        * be put here.  For example, user defined output committer can be configured here
        * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
        */
    -  def prepareWrite(
    +  override def prepareWrite(
    +      sparkSession: SparkSession,
    +      job: Job,
    +      options: Map[String, String],
    +      dataSchema: StructType): OutputWriterFactory = {
    +
    +    // Check if table with given path exists
    +    validateTable(options.get("path").get)
    +
    +    /* Check id streaming data schema matches with carbon table schema
    +     * Data from socket source does not have schema attached to it,
    +     * Following check is to ignore schema validation for socket source.
    +     */
    +    if (!(dataSchema.size.equals(1) &&
    +      dataSchema.fields(0).dataType.equals(StringType))) {
    +      val path = options.get("path")
    +      val tablePath: String = path match {
    +        case Some(value) => value
    +        case None => ""
    +      }
    +
    +      val carbonTableSchema: org.apache.carbondata.format.TableSchema =
    +        getTableSchema(sparkSession: SparkSession, tablePath: String)
    +      val isSchemaValid = validateSchema(carbonTableSchema, dataSchema)
    +
    +      if(!isSchemaValid) {
    +        LOGGER.error("Schema Validation Failed: streaming data schema"
    +          + "does not match with carbon table schema")
    +        throw new InvalidSchemaException("Schema Validation Failed : " +
    +          "streaming data schema does not match with carbon table schema")
    +      }
    +    }
    +    new CarbonStreamingOutputWriterFactory()
    +  }
    +
    +  /**
    +   * Read schema from existing carbon table
    +   * @param sparkSession
    +   * @param tablePath carbon table path
    +   * @return true if schema validation is successful else false
    +   */
    +  private def getTableSchema(
         sparkSession: SparkSession,
    -    job: Job,
    -    options: Map[String, String],
    -    dataSchema: StructType): OutputWriterFactory = new CarbonStreamingOutputWriterFactory()
    +    tablePath: String): org.apache.carbondata.format.TableSchema = {
    +
    +    val formattedTablePath = tablePath.replace('\\', '/')
    +    val names = formattedTablePath.split("/")
    +    if (names.length < 3) {
    +      throw new IllegalArgumentException("invalid table path: " + tablePath)
    +    }
    +    val tableName : String = names(names.length - 1)
    +    val dbName : String = names(names.length - 2)
    +    val storePath = formattedTablePath.substring(0,
    +      formattedTablePath.lastIndexOf
    +      (((dbName.concat(CarbonCommonConstants.FILE_SEPARATOR).toString)
    --- End diff --
   
    remove the unnecessary brackets and `toString` calls
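
    A sketch of the same computation with the redundant grouping and `toString` calls dropped:

        // strip "/<dbName>/<tableName>" from the end to get the store path
        val storePath = formattedTablePath.substring(0,
          formattedTablePath.lastIndexOf(
            dbName + CarbonCommonConstants.FILE_SEPARATOR + tableName) - 1)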


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r141344066
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -217,22 +229,212 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
        * be put here.  For example, user defined output committer can be configured here
        * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
        */
    -  def prepareWrite(
    +  override def prepareWrite(
    +      sparkSession: SparkSession,
    +      job: Job,
    +      options: Map[String, String],
    +      dataSchema: StructType): OutputWriterFactory = {
    +
    +    // Check if table with given path exists
    +    validateTable(options.get("path").get)
    +
    +    /* Check id streaming data schema matches with carbon table schema
    +     * Data from socket source does not have schema attached to it,
    +     * Following check is to ignore schema validation for socket source.
    +     */
    +    if (!(dataSchema.size.equals(1) &&
    +      dataSchema.fields(0).dataType.equals(StringType))) {
    +      val path = options.get("path")
    +      val tablePath: String = path match {
    +        case Some(value) => value
    +        case None => ""
    +      }
    +
    +      val carbonTableSchema: org.apache.carbondata.format.TableSchema =
    +        getTableSchema(sparkSession: SparkSession, tablePath: String)
    +      val isSchemaValid = validateSchema(carbonTableSchema, dataSchema)
    +
    +      if(!isSchemaValid) {
    +        LOGGER.error("Schema Validation Failed: streaming data schema"
    +          + "does not match with carbon table schema")
    +        throw new InvalidSchemaException("Schema Validation Failed : " +
    +          "streaming data schema does not match with carbon table schema")
    +      }
    +    }
    +    new CarbonStreamingOutputWriterFactory()
    +  }
    +
    +  /**
    +   * Read schema from existing carbon table
    +   * @param sparkSession
    +   * @param tablePath carbon table path
    +   * @return true if schema validation is successful else false
    +   */
    +  private def getTableSchema(
         sparkSession: SparkSession,
    -    job: Job,
    -    options: Map[String, String],
    -    dataSchema: StructType): OutputWriterFactory = new CarbonStreamingOutputWriterFactory()
    +    tablePath: String): org.apache.carbondata.format.TableSchema = {
    +
    +    val formattedTablePath = tablePath.replace('\\', '/')
    +    val names = formattedTablePath.split("/")
    +    if (names.length < 3) {
    +      throw new IllegalArgumentException("invalid table path: " + tablePath)
    +    }
    +    val tableName : String = names(names.length - 1)
    +    val dbName : String = names(names.length - 2)
    +    val storePath = formattedTablePath.substring(0,
    +      formattedTablePath.lastIndexOf
    +      (((dbName.concat(CarbonCommonConstants.FILE_SEPARATOR).toString)
    +        .concat(tableName)).toString) - 1)
    +
    +    val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore
    +    val thriftTableInfo: org.apache.carbondata.format.TableInfo =
    +      metastore.getThriftTableInfo(new CarbonTablePath(storePath, dbName, tableName))(sparkSession)
    +
    +    val factTable: org.apache.carbondata.format.TableSchema = thriftTableInfo.getFact_table
    +    factTable
    +  }
     
       /**
    +   * Validates streamed schema against existing table schema
    +   * @param carbonTableSchema existing carbon table schema
    +   * @param dataSchema streamed data schema
    +   * @return true if schema validation is successful else false
    +   */
    +  private def validateSchema(
    +      carbonTableSchema: org.apache.carbondata.format.TableSchema,
    +      dataSchema: StructType): Boolean = {
    +
    +    import scala.collection.mutable.ListBuffer
    +    val columnnSchemaValues = carbonTableSchema.getTable_columns.asScala.sortBy(_.schemaOrdinal)
    +
    +    var columnDataTypes = new ListBuffer[String]()
    +    for(columnDataType <- columnnSchemaValues) {
    --- End diff --
   
    use `columnnSchemaValues.foreach` instead of a `for` loop
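
    For example, the buffer and loop collapse into a single `map` (behavior unchanged):

        // collect the table's column data types in schema order
        val tableColumnDataTypeList =
          columnnSchemaValues.map(_.data_type.toString).toList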


---

[GitHub] carbondata pull request #1352: [CARBONDATA-1174] Streaming Ingestion - schem...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1352#discussion_r141344187
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala ---
    @@ -217,22 +229,212 @@ class CarbonSource extends CreatableRelationProvider with RelationProvider
        * be put here.  For example, user defined output committer can be configured here
        * by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
        */
    -  def prepareWrite(
    +  override def prepareWrite(
    +      sparkSession: SparkSession,
    +      job: Job,
    +      options: Map[String, String],
    +      dataSchema: StructType): OutputWriterFactory = {
    +
    +    // Check if table with given path exists
    +    validateTable(options.get("path").get)
    +
    +    /* Check id streaming data schema matches with carbon table schema
    +     * Data from socket source does not have schema attached to it,
    +     * Following check is to ignore schema validation for socket source.
    +     */
    +    if (!(dataSchema.size.equals(1) &&
    +      dataSchema.fields(0).dataType.equals(StringType))) {
    +      val path = options.get("path")
    +      val tablePath: String = path match {
    +        case Some(value) => value
    +        case None => ""
    +      }
    +
    +      val carbonTableSchema: org.apache.carbondata.format.TableSchema =
    +        getTableSchema(sparkSession: SparkSession, tablePath: String)
    +      val isSchemaValid = validateSchema(carbonTableSchema, dataSchema)
    +
    +      if(!isSchemaValid) {
    +        LOGGER.error("Schema Validation Failed: streaming data schema"
    +          + "does not match with carbon table schema")
    +        throw new InvalidSchemaException("Schema Validation Failed : " +
    +          "streaming data schema does not match with carbon table schema")
    +      }
    +    }
    +    new CarbonStreamingOutputWriterFactory()
    +  }
    +
    +  /**
    +   * Read schema from existing carbon table
    +   * @param sparkSession
    +   * @param tablePath carbon table path
    +   * @return true if schema validation is successful else false
    +   */
    +  private def getTableSchema(
         sparkSession: SparkSession,
    -    job: Job,
    -    options: Map[String, String],
    -    dataSchema: StructType): OutputWriterFactory = new CarbonStreamingOutputWriterFactory()
    +    tablePath: String): org.apache.carbondata.format.TableSchema = {
    +
    +    val formattedTablePath = tablePath.replace('\\', '/')
    +    val names = formattedTablePath.split("/")
    +    if (names.length < 3) {
    +      throw new IllegalArgumentException("invalid table path: " + tablePath)
    +    }
    +    val tableName : String = names(names.length - 1)
    +    val dbName : String = names(names.length - 2)
    +    val storePath = formattedTablePath.substring(0,
    +      formattedTablePath.lastIndexOf
    +      (((dbName.concat(CarbonCommonConstants.FILE_SEPARATOR).toString)
    +        .concat(tableName)).toString) - 1)
    +
    +    val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore
    +    val thriftTableInfo: org.apache.carbondata.format.TableInfo =
    +      metastore.getThriftTableInfo(new CarbonTablePath(storePath, dbName, tableName))(sparkSession)
    +
    +    val factTable: org.apache.carbondata.format.TableSchema = thriftTableInfo.getFact_table
    +    factTable
    +  }
     
       /**
    +   * Validates streamed schema against existing table schema
    +   * @param carbonTableSchema existing carbon table schema
    +   * @param dataSchema streamed data schema
    +   * @return true if schema validation is successful else false
    +   */
    +  private def validateSchema(
    +      carbonTableSchema: org.apache.carbondata.format.TableSchema,
    +      dataSchema: StructType): Boolean = {
    +
    +    import scala.collection.mutable.ListBuffer
    +    val columnnSchemaValues = carbonTableSchema.getTable_columns.asScala.sortBy(_.schemaOrdinal)
    +
    +    var columnDataTypes = new ListBuffer[String]()
    +    for(columnDataType <- columnnSchemaValues) {
    +      columnDataTypes.append(columnDataType.data_type.toString)
    +    }
    +    val tableColumnDataTypeList = columnDataTypes.toList
    +
    +    var streamSchemaDataTypes = new ListBuffer[String]()
    +    for(i <- 0 until dataSchema.size) {
    --- End diff --
   
    use `foreach` instead of the `for` loop here as well
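
    Similarly here, a `map` over `dataSchema.fields` can replace the indexed loop (sketch):

        // convert each streamed field's Spark type to its Carbon name, in order
        val streamedDataTypeList = dataSchema.fields
          .map(field => mapStreamingDataTypeToString(field.dataType.toString))
          .toList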


---