GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2055 [CARBONDATA-2224][File Level Reader Support] External file level reader support

The file level reader reads any CarbonData file placed in an external file path. Reading can be done through three methods:

a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table: "CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'". For more details, please refer to the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.

b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java is used for reading the files. The create table syntax for this is: "CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'". For more details, see org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.

c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.

Limitation: this implementation depends on the writer SDK file path following the layout table_name/Fact/Part0/Segment_null. The reader and writer should be independent of this static path; because of it, the reader currently does not work with standard partitioning either. This will be handled in future PRs.

 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done
        Please provide details on
        - Whether new unit test cases have been added or why no new tests are required?
        - How it is tested? Please attach test report.
        - Is it a performance related change? Please attach the performance test report.
        - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
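Taken together, the two Spark-side read paths (a) and (b) boil down to the following DDL, as exercised in the referenced test suites. The LOCATION paths here are illustrative placeholders for the SDK writer's output directory, not values from this PR:

```sql
-- a) Spark datasource table, read via CarbonFileLevelFormat
CREATE TABLE sdkOutputTable
USING CarbonDataFileFormat
LOCATION '/tmp/carbon_sdk_output';

-- b) Spark SQL external table, read via CarbonFileInputFormat
CREATE EXTERNAL TABLE sdkOutputTable2
STORED BY 'carbondatafileformat'
LOCATION '/tmp/carbon_sdk_output';
```

After either statement, the table can be queried with ordinary SELECT statements against the files the SDK wrote at that location.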
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata file_level_reader_master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2055.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2055

----

commit 5e65f3f97749571a74b6c04a05f5b09aec709787
Author: sounakr <sounakr@...>
Date:   2018-02-24T02:25:14Z

    File Format Reader

commit bcb8f64d61e19787fb3303a00d59cb61a6ebce32
Author: sounakr <sounakr@...>
Date:   2018-02-26T11:58:47Z

    File Format Phase 2

commit 35b09072d7d75677f473e9d54b3a5db0ff1b64dc
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T06:06:56Z

    * File Format Phase 2 (cleanup code)

commit 466abfad2fdcc50d69dbbf32791466b7fc4836d1
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T06:36:28Z

    * File Format Phase 2 (cleanup code)

commit 5b2ad29bc9402e223af22124cc6d3d91962e72f4
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T09:54:43Z

    * File Format Phase 2 (cleanup code and adding testCase)

commit 994372f0d2c7e8c528f9900c7b17ff8c8a857698
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T11:58:37Z

    * File Format Phase 2 (filter issue fix)

commit e3160888dcac715928f9d18febd33b22177513a0
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T12:20:46Z

    * File Format Phase 2 (filter issue fix return value)

commit 949e6a97680f46a91808be094505a519340a1a53
Author: sounakr <sounakr@...>
Date:   2018-02-27T13:55:16Z

    Clear DataMap Cache

commit 7fdccc3885ab1c731d7066e36a2237372198ae22
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T14:02:35Z

    * File Format Phase 2 (test cases)

commit 528e8120527a712308adee4b91d516a9891975ea
Author: sounakr <sounakr@...>
Date:   2018-02-28T03:18:45Z

    Refactor CarbonFileInputFormat

commit 0a2b2249ea8486d2a217ff245b2311bb96936d64
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit fdfe2f405a2bb8ca122a785919290bc82a72c01c
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit 64627d2f2953779a9ee32f23be0b552b6b18f1d9
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit 8871e3140afa008794dfa0e8e2df58f5b29f46bd
Author: sounakr <sounakr@...>
Date:   2018-03-01T11:23:39Z

    Map Reduce Test Case for CarbonInputFileFormat

commit 51403245ce250625de7a0bd20e369d3011f2eeb9
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-01T11:41:03Z

    * fixed the issues
      Existing external table flow got impacted
      Added a new storage (provider) carbondatafileformat for external table creation

commit 1f89d92c947e4b4a1248493552187b70d1f51dba
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-01T15:32:07Z

    * Bug fixes in CarbonFileInputFormat flow: 3 issue fixes.
      a. schema ordinal
      b. table path problem in absolute identifier
      c. drop of external table fix
      d. unwanted code cleanup

commit e1e2ae5019c863d1d43d91d8f5f6852c6d92be29
Author: sounakr <sounakr@...>
Date:   2018-03-02T05:09:45Z

    Review Code

commit 1e374feadd7dd86848b31fed113cf234f0ddb542
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-05T11:07:10Z

    merge conflict fix

commit 97d90a1d2bf461dea0259153ab9b28247c2a75ab
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-06T10:08:20Z

    * Fixed the test script failure for spark 2.1

commit b3dc89c278b6f89ce9c63ea9f3597124f6916543
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-06T11:58:32Z

    * Fixed the test script failure for spark 2.1, 2.2

commit eca6617089702b246dcfb9b039be04d61ede5c6b
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-12T12:46:10Z

    * Fix the compilation errors after rebase to master.

commit 761a7ba32b7a4fc990f80e4ed6dc4e0294d7747c
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-12T12:59:00Z

    * Fixing the test case of this requirement

commit 16745af45b0683d2121a40272dde92cc07275c93
Author: sounakr <sounakr@...>
Date:   2018-03-12T18:45:19Z

    Review Comments

----
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055 Retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2975/ --- |
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055 Retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4221/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2976/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3868/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055 retest this, please ... --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4239/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2995/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055 retest this, please... --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2998/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4242/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3881/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3883/ --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208296 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java --- @@ -0,0 +1,678 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.hadoop.api; + +import java.io.ByteArrayInputStream; +import java.io.DataInputStream; +import java.io.IOException; +import java.io.Serializable; +import java.lang.reflect.Constructor; +import java.util.ArrayList; +import java.util.BitSet; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datamap.DataMapChooser; +import org.apache.carbondata.core.datamap.DataMapLevel; +import org.apache.carbondata.core.datamap.Segment; +import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.exception.InvalidConfigurationException; +import org.apache.carbondata.core.indexstore.ExtendedBlocklet; +import org.apache.carbondata.core.indexstore.PartitionSpec; +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory; 
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.ColumnarFormatVersion; +import org.apache.carbondata.core.metadata.schema.PartitionInfo; +import org.apache.carbondata.core.metadata.schema.partition.PartitionType; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.mutate.UpdateVO; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.SingleTableProvider; +import org.apache.carbondata.core.scan.filter.TableProvider; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.stats.QueryStatistic; +import org.apache.carbondata.core.stats.QueryStatisticsConstants; +import org.apache.carbondata.core.stats.QueryStatisticsRecorder; +import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeConverter; +import org.apache.carbondata.core.util.DataTypeConverterImpl; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.CarbonRecordReader; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; +import org.apache.carbondata.hadoop.readsupport.impl.DictionaryDecodeReadSupport; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import 
org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.hadoop.util.SchemaReader; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.hadoop.mapreduce.security.TokenCache; + +/** + * Input format of CarbonData file. + * + * @param <T> + */ +public class CarbonFileInputFormat<T> extends FileInputFormat<Void, T> implements Serializable { --- End diff -- Please annotate this class using InterfaceAudience.User and InterfaceStability.Evolving --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208602 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java --- @@ -0,0 +1,193 @@ +/* --- End diff -- There are some binary files in this PR, please remove them --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208671 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.mapred; + + +import java.io.BufferedReader; +import java.io.BufferedWriter; +import java.io.File; +import java.io.FileFilter; +import java.io.FileReader; +import java.io.FileWriter; +import java.io.IOException; +import java.util.List; +import java.util.UUID; + +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.conf.Configured; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.mapred.FileInputFormat; +import org.apache.hadoop.mapred.JobClient; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; +import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; +import org.junit.Assert; +import org.junit.Test; + +public class TestMapReduceCarbonFileInputFormat { + + private static final Log LOG = LogFactory.getLog(TestMapReduceCarbonFileInputFormat.class); + + private int countTheLines(String outPath) throws Exception { + File file = new File(outPath); + if (file.exists()) { + BufferedReader reader = new BufferedReader(new FileReader(file)); + int i = 0; + while (reader.readLine() != null) { + i++; + } + reader.close(); + return i; + } + return 0; + } + + private int countTheColumns(String outPath) 
throws Exception { + File file = new File(outPath); + if (file.exists()) { + BufferedReader reader = new BufferedReader(new FileReader(file)); + String[] split = reader.readLine().split(","); + reader.close(); + return split.length; + } + return 0; + } + + --- End diff -- remove empty lines --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208978 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java --- @@ -79,4 +79,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier) carbonTableIdentifier.getTableName(), identifier.getTablePath()); } + + + public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier) --- End diff -- rename to `inferSchema`, and can you pass the tablePath only --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174211199 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala --- @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, CarbonEnv} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException + +class TestCreateTableUsingCarbonFileLevelFormat extends QueryTest with BeforeAndAfterAll { --- End diff -- This suite is fine, but can you add one suite using SparkSession instead of CarbonSession? --- |