GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2054

[CARBONDATA-2224][File Level Reader Support] External file level reader support

The file level reader reads any carbondata file placed in any external file path. Reading can be done through three methods:

a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table:
"CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'"
For more details please refer to org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.

b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java is used for reading the files. The create table syntax for this is:
"CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'"
For more details please refer to org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.

c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.

A hedged usage sketch of the two SQL read paths is included after the checklist below.

**Limitation**: This implementation depends on the SDK writer output following the static path layout table_name/Fact/Part0/Segment_null. The reader and writer should be independent of this static path; because of this, the reader currently also does not work with standard partitioning. This will be handled in future PRs.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
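The following is a minimal usage sketch, not code from this PR: it only illustrates how the two SQL read paths above might be exercised from a Spark application. The output path, application name, and second table name are placeholders; the DDL strings follow the syntax quoted in the description, and in practice the STORED BY variant is expected to run through a CarbonSession as in the referenced test suites.

```scala
import org.apache.spark.sql.SparkSession

object CarbonFileLevelReaderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CarbonFileLevelReaderExample")
      .getOrCreate()

    // Placeholder: directory holding the carbondata and index files produced by the SDK writer
    val writerOutputPath = "/tmp/sdk_writer_output"

    // a) Spark datasource table, read through CarbonFileLevelFormat
    spark.sql(s"CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputPath'")

    // b) External table, read through CarbonFileInputFormat
    spark.sql(s"CREATE EXTERNAL TABLE sdkOutputTable2 STORED BY 'carbondatafileformat' " +
      s"LOCATION '$writerOutputPath'")

    spark.sql("SELECT * FROM sdkOutputTable").show()
    spark.sql("SELECT * FROM sdkOutputTable2").show()

    spark.stop()
  }
}
```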
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata file_level_reader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2054.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2054

----

commit 5765fc4007c8514ccb20c6a98c7f4463483275fc (sounakr <sounakr@...>, 2018-02-24T02:25:14Z)
    File Format Reader

commit 685214f2537cdca75ecb58196e4b2a168e6c9cbb (sounakr <sounakr@...>, 2018-02-26T11:58:47Z)
    File Format Phase 2

commit b1070c2322e5fdd0fce83fe36b083611b0b60bf6 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T06:06:56Z)
    * File Format Phase 2 (cleanup code)

commit 9bb51e9f6d475f58409815434edf089b60795584 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T06:36:28Z)
    * File Format Phase 2 (cleanup code)

commit 69d85aa1a869e6018cf25728f326328de027085a (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T09:54:43Z)
    * File Format Phase 2 (cleanup code and adding testCase)

commit f092f86a2ff033a5e1e7798cf8ed2658f8cb888d (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T11:58:37Z)
    * File Format Phase 2 (filter issue fix)

commit 13a97acb1562f1a8dfa8830cc0a872c5b6361961 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T12:20:46Z)
    * File Format Phase 2 (filter issue fix return value)

commit d146e1c2e5c67d3251ac99e7853351bd498b4b6a (sounakr <sounakr@...>, 2018-02-27T13:55:16Z)
    Clear DataMap Cache

commit eb97736e3cdd46b62f7f7203c10e2ac86fbea375 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-27T14:02:35Z)
    * File Format Phase 2 (test cases)

commit b192fe886be21b3d137944929cf45dd1c931bd65 (sounakr <sounakr@...>, 2018-02-28T03:18:45Z)
    Refactor CarbonFileInputFormat

commit 5916a476b215a44e4e580b870093182ef7ca5183 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-28T10:02:08Z)
    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit db65fcb48158eec6f8e02a528f07f72eae1b3d4a (Ajantha-Bhat <ajanthabhat@...>, 2018-02-28T10:02:08Z)
    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit ec1870763e28c48b7796ab090a911c55228cb614 (Ajantha-Bhat <ajanthabhat@...>, 2018-02-28T10:02:08Z)
    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit 08508f0a0c5ab0c43b568bda84c2602f38ae3f3c (sounakr <sounakr@...>, 2018-03-01T11:23:39Z)
    Map Reduce Test Case for CarbonInputFileFormat

commit fe56389b55227bb287f2b8cffaf1a6da8b567fa8 (Ajantha-Bhat <ajanthabhat@...>, 2018-03-01T11:41:03Z)
    * fixed the issues
      Existing external table flow got impacted
      Added a new storage(provider) carbondatafileformat for external table creation

commit 83784c00487cdf76b724d31218fdf57c241e7901 (Ajantha-Bhat <ajanthabhat@...>, 2018-03-01T15:32:07Z)
    * Bug fixes CarbonFileInputFormat flow 3 issue fixes.
      a. schema ordinal
      b. table path problem in absolute identifier
      c. drop of external table fix
      d. unwanted code cleanup

commit 866807a01eb4c9617f36e141b19ccb6a94de6aca (sounakr <sounakr@...>, 2018-03-02T05:09:45Z)
    Review Code

commit 729fb7ea629bcec3afdf5f933309bc2db15663fd (Ajantha-Bhat <ajanthabhat@...>, 2018-03-05T11:07:10Z)
    merge conflict fix

commit 5767275d5788ea38b5f75920c84fd0a315932e4d (Ajantha-Bhat <ajanthabhat@...>, 2018-03-06T10:08:20Z)
    * Fixed the test script failure for spark 2.1

commit ecf8b339d3402b482e71fc7f970f581bda5c4aff (Ajantha-Bhat <ajanthabhat@...>, 2018-03-06T11:58:32Z)
    * Fixed the test script failure for spark 2.1, 2.2

commit da45328f111cd02b07783bfa340015bec64452dc (Ajantha-Bhat <ajanthabhat@...>, 2018-03-12T12:46:10Z)
    * Fix the compilation errors after rebase to master.

commit 13d40503e6ed559b80ec3465e85bf7ac3d2cf407 (Ajantha-Bhat <ajanthabhat@...>, 2018-03-12T12:59:00Z)
    * Fixing the test case of this requirement

----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2964/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173841169

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -826,6 +826,12 @@ public boolean isExternalTable() {
     return external != null && external.equalsIgnoreCase("true");
   }

+  public boolean isFileLevelExternalTable() {

--- End diff --

why is this property required?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173841691

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
@@ -28,7 +28,8 @@
 import org.apache.carbondata.core.metadata.schema.table.TableInfo;
 import org.apache.carbondata.core.util.CarbonUtil;
 import org.apache.carbondata.core.util.path.CarbonTablePath;
-import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+

--- End diff --

remove empty line

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842122

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
@@ -79,4 +81,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier)
         carbonTableIdentifier.getTableName(), identifier.getTablePath());
   }
+
+
+  public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier)

--- End diff --

Can the input param change to `String tablePath`?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842755

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
@@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
   private void writeFilesAndVerify(Schema schema, String path) {
     try {
-      CarbonWriter writer = CarbonWriter.builder()
-          .withSchema(schema)
-          .outputPath(path)
-          .buildWriterForCSVInput();
+      CarbonWriter writer =
+          CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();

--- End diff --

do not modify the code style

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842791

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
@@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
   private void writeFilesAndVerify(Schema schema, String path) {
     try {
-      CarbonWriter writer = CarbonWriter.builder()
-          .withSchema(schema)
-          .outputPath(path)
-          .buildWriterForCSVInput();
+      CarbonWriter writer =
+          CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();
       for (int i = 0; i < 100; i++) {
-        writer.write(new String[]{"robot" + i, String.valueOf(i), String.valueOf((double) i / 2)});
+        writer
+            .write(new String[] { "robot" + i, String.valueOf(i), String.valueOf((double) i / 2) });

--- End diff --

do not modify it since no change

---
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2054

There are some binary files, please delete them

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4210/

---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173885274

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -826,6 +826,12 @@ public boolean isExternalTable() {
     return external != null && external.equalsIgnoreCase("true");
   }

+  public boolean isFileLevelExternalTable() {

--- End diff --

**stored by 'carbondatafileformat' is mapped to _filelevelexternal.** So when a MapReduce or Hadoop service invokes CarbonScanRDD, the new file level reader [CarbonFileInputFormat] is called based on _filelevelexternal. An external table can be table level (stored by 'carbondata') or file level (stored by 'carbondatafileformat'). **This property is used to identify a file level external table.**

---
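A minimal illustration of the dispatch described above, not the project's actual CarbonScanRDD code: the helper name `pickReader` is hypothetical, and only the decision on the new property is shown.

```scala
import org.apache.carbondata.core.metadata.schema.table.CarbonTable

// Sketch: choose the reader name based on the new table property.
def pickReader(table: CarbonTable): String = {
  if (table.isFileLevelExternalTable) {
    // created with STORED BY 'carbondatafileformat'
    "CarbonFileInputFormat"   // file level reader introduced by this PR
  } else {
    // created with STORED BY 'carbondata'
    "CarbonTableInputFormat"  // existing table level reader
  }
}
```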
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2054

@jackylk : Binary files [carbondata and index files (SDK writer output)] are intentionally added for the test cases of this requirement. The test cases will fail if we remove them.

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2054

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3858/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4218/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2973/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2054

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3867/

---