GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2198

[CARBONDATA-2369] Add a document for Non Transactional table with SDK writer guide

As per PR#2131 [CARBONDATA-2313]

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? Yes, updated
- [ ] Testing done -- NA
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2198.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2198

----
commit 4506e44f75f7723a9e1b18c110a9b68bdbe0582d
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-04-20T11:06:37Z

[CARBONDATA-2369] Add a document for Non Transactional table with SDK writer guide
----
--- |
Github user sgururajshetty commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183031656 --- Diff: docs/sdk-writer-guide.md --- @@ -0,0 +1,140 @@ +# SDK Writer Guide +In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar. +This SDK writer, writes carbondata file and carbonindex file at a given path. +External client can make use of this writer to convert other format data or live data to create carbondata and index files. +These SDK writer output contains just a carbondata and carbonindex files. No metadata folder will be present. + +## Quick example + +```scala + import java.io.IOException; + + import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; + import org.apache.carbondata.core.metadata.datatype.DataTypes; + import org.apache.carbondata.sdk.file.CarbonWriter; + import org.apache.carbondata.sdk.file.CarbonWriterBuilder; + import org.apache.carbondata.sdk.file.Field; + import org.apache.carbondata.sdk.file.Schema; + + public class TestSdk { + + public static void main(String[] args) throws IOException, InvalidLoadOptionException { + testSdkWriter(); + } + + public static void testSdkWriter() throws IOException, InvalidLoadOptionException { + String path ="/home/root1/Documents/ab/temp"; + + Field[] fields =new Field[2]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("age", DataTypes.INT); + + Schema schema =new Schema(fields); + + CarbonWriterBuilder builder = CarbonWriter.builder() + .withSchema(schema) + .outputPath(path); + + CarbonWriter writer = builder.buildWriterForCSVInput(); + + int rows = 5; + for (int i = 0; i < rows; i++) { + writer.write(new String[]{"robot" + (i % 10), String.valueOf(i)}); + } + writer.close(); + } + } +``` + +## Datatypes Mapping +Each of SQL data types are mapped into data types of SDK. Following are the mapping: +| SQL DataTypes | Mapped SDK DataTypes | --- End diff -- Table formatting has issue, please check --- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183035777 --- Diff: docs/sdk-writer-guide.md --- (quotes the same diff context as the previous comment, ending at:) +| SQL DataTypes | Mapped SDK DataTypes | --- End diff -- ok. Fixed it. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5239/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4060/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4065/ --- |
Github user sgururajshetty commented on the issue:
https://github.com/apache/carbondata/pull/2198 LGTM --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2198 retest this please --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2198 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4501/ --- |
Github user sgururajshetty commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183615513 --- Diff: docs/data-management-on-carbondata.md --- @@ -174,6 +174,50 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` +## CREATE EXTERNAL TABLE + This function allows user to create external table by specifying location. + ``` + CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name + STORED BY 'carbondata' LOCATION ‘$FilesPath’ + ``` + +### Create external table on managed table data location. + Managed table data location provided will have both FACT and Metadata folder. + This data can be generated by creating a normal carbon table and use this path as $FilesPath in the above syntax. + + **Example:** + ``` + sql("CREATE TABLE origin(key INT, value STRING) STORED BY 'carbondata'") + sql("INSERT INTO origin select 100,'spark'") + sql("INSERT INTO origin select 200,'hive'") + // creates a table in $storeLocation/origin + + sql(s""" + |CREATE EXTERNAL TABLE source + |STORED BY 'carbondata' + |LOCATION '$storeLocation/origin' + """.stripMargin) + checkAnswer(sql("SELECT count(*) from source"), sql("SELECT count(*) from origin")) + ``` + +### Create external table on Non-Transactional table data location. --- End diff -- There > there --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4190/ --- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183625844 --- Diff: docs/data-management-on-carbondata.md --- (quotes the same diff context as the previous comment, ending at:) +### Create external table on Non-Transactional table data location. --- End diff -- done. Modified. --- |
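The quoted section stops at the non-transactional heading. For that case the same DDL simply points at an SDK writer output folder, which contains only carbondata and carbonindex files and no Metadata folder. A minimal sketch (the table name and path below are illustrative, not taken from the PR):

```
CREATE EXTERNAL TABLE sdk_output
STORED BY 'carbondata'
LOCATION '/path/to/sdk/writer/output'
```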
Github user sgururajshetty commented on the issue:
https://github.com/apache/carbondata/pull/2198 LGTM --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4195/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2198 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4506/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5354/ --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183789400 --- Diff: docs/sdk-writer-guide.md --- @@ -0,0 +1,172 @@ (quotes the same guide text as the earlier comments, continuing with the datatypes mapping:) +Following are the mapping: + +| SQL DataTypes | Mapped SDK DataTypes | +|---------------|----------------------| +| BOOLEAN | DataTypes.BOOLEAN | +| SMALLINT | DataTypes.SHORT | +| INTEGER | DataTypes.INT | +| BIGINT | DataTypes.LONG | +| DOUBLE | DataTypes.DOUBLE | +| VARCHAR | DataTypes.STRING | +| DATE | DataTypes.DATE | +| TIMESTAMP | DataTypes.TIMESTAMP | +| STRING | DataTypes.STRING | +| DECIMAL | DataTypes.createDecimalType(precision, scale) | + + +## API List +``` --- End diff -- Add these methods under class CarbonWriterBuilder --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183791062 --- Diff: docs/sdk-writer-guide.md --- (quotes the same diff context as the previous comment, continuing into the API list:) +/** +* prepares the builder with the schema provided +* @param schema is instance of Schema +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder withSchema(Schema schema); --- End diff -- 1. Add details of the methods in class CarbonWriter 2. Add methods buildWriterForCSVInput and buildWriterForAvroInput 3. Add examples of writing a CSV record and writing an Avro record. --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183792089 --- Diff: docs/sdk-writer-guide.md --- (quotes the same diff context as the previous comments, continuing:) +``` + +``` +/** +* Sets the output path of the writer builder +* @param path is the absolute path where output files are written +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder outputPath(String path); +``` + +``` +/** +* If set false, writes the carbondata and carbonindex files in a flat folder structure --- End diff -- What is the behaviour when set to true, and what are the default values for all the optional parameters? --- |
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2198#discussion_r183793026 --- Diff: docs/sdk-writer-guide.md --- (quotes the same diff context as the previous comments, continuing:) +* @param isTransactionalTable is a boolean value; if set to false then writes +* the carbondata and carbonindex files in a flat folder structure +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable); +``` + +``` +/** +* to set the timestamp in the carbondata and carbonindex index files +* @param UUID is a timestamp to be used in the carbondata +* and carbonindex index files +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder uniqueIdentifier(long UUID); +``` + +``` +/** +* To set the carbondata file size in MB between 1MB-2048MB +* @param blockSize is size in MB between 1MB to 2048 MB +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder withBlockSize(int blockSize); +``` + +``` +/** +* To set the blocklet size of carbondata file +* @param blockletSize is blocklet size in MB +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder withBlockletSize(int blockletSize); +``` + +``` +/** +* sets the list of columns that needs to be in sorted order +* @param sortColumns is a string array of columns that needs to be sorted. +* If it is null, all dimensions are selected for sorting +* If it is empty array, no columns are sorted +* @return updated CarbonWriterBuilder +*/ +public CarbonWriterBuilder sortBy(String[] sortColumns); +``` + +``` +/** +* If set, creates a schema file in metadata folder. --- End diff -- What is the default value, and what is the effect of setting isTransactionalTable(true/false)? --- |
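Taken together, the builder methods quoted in this review chain into a single writer construction. The sketch below is untested and uses only the signatures quoted in the diff; the path, sort column, and row values are illustrative, and the defaults of the optional parameters are exactly what this review comment asks to be documented, so none are assumed here.

```java
// Sketch combining the quoted CarbonWriterBuilder methods.
// Requires the carbondata-store-sdk jar on the classpath.
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class BuilderSketch {
  public static void main(String[] args) throws Exception {
    Field[] fields = new Field[2];
    fields[0] = new Field("name", DataTypes.STRING);
    fields[1] = new Field("age", DataTypes.INT);

    CarbonWriter writer = CarbonWriter.builder()
        .withSchema(new Schema(fields))
        .outputPath("/tmp/sdk-output")     // illustrative path
        .isTransactionalTable(false)       // flat folder output, no metadata folder
        .sortBy(new String[]{"name"})      // sort only on the name column
        .buildWriterForCSVInput();

    writer.write(new String[]{"robot0", "0"});
    writer.close();
  }
}
```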