GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/3019
[WIP] Carbon Presto hive metastore

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata presto-hive-metastore

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/3019.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #3019

----
commit 50f7a330e797bcdd71598fbf3145a71fa5df7215
Author: ravipesala <ravi.pesala@...>
Date: 2018-12-19T16:30:57Z

    Create carbon table as hive metastore table

commit 8621a302ff2f26470b19d31b24a21ff03e4a2831
Author: ravipesala <ravi.pesala@...>
Date: 2018-12-19T15:49:41Z

    Added Hive Metastore to Carbon Presto
----
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1915/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10169/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2126/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1920/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1921/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2131/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10174/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1924/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2134/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10177/
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3019#discussion_r243821704 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java --- @@ -43,63 +44,81 @@ import static org.apache.carbondata.presto.Types.checkType; +import com.facebook.presto.hive.HdfsEnvironment; +import com.facebook.presto.hive.HiveClientConfig; +import com.facebook.presto.hive.HiveColumnHandle; +import com.facebook.presto.hive.HivePageSourceFactory; +import com.facebook.presto.hive.HivePageSourceProvider; +import com.facebook.presto.hive.HiveRecordCursorProvider; +import com.facebook.presto.hive.HiveSplit; import com.facebook.presto.spi.ColumnHandle; import com.facebook.presto.spi.ConnectorPageSource; import com.facebook.presto.spi.ConnectorSession; import com.facebook.presto.spi.ConnectorSplit; -import com.facebook.presto.spi.connector.ConnectorPageSourceProvider; +import com.facebook.presto.spi.SchemaTableName; import com.facebook.presto.spi.connector.ConnectorTransactionHandle; +import com.facebook.presto.spi.type.TypeManager; import com.google.common.collect.ImmutableList; import com.google.inject.Inject; import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.TaskAttemptContextImpl; import org.apache.hadoop.mapred.TaskAttemptID; import org.apache.hadoop.mapreduce.TaskType; -import static com.google.common.base.Preconditions.checkArgument; import static com.google.common.base.Preconditions.checkNotNull; - /** * Provider Class for Carbondata Page Source class. */ -public class CarbondataPageSourceProvider implements ConnectorPageSourceProvider { +public class CarbondataPageSourceProvider extends HivePageSourceProvider { - private String connectorId; private CarbonTableReader carbonTableReader; private String queryId ; - - @Inject public CarbondataPageSourceProvider(CarbondataConnectorId connectorId, + private HdfsEnvironment hdfsEnvironment; + + @Inject public CarbondataPageSourceProvider( + HiveClientConfig hiveClientConfig, + HdfsEnvironment hdfsEnvironment, + Set<HiveRecordCursorProvider> cursorProviders, + Set<HivePageSourceFactory> pageSourceFactories, + TypeManager typeManager, CarbonTableReader carbonTableReader) { - this.connectorId = requireNonNull(connectorId, "connectorId is null").toString(); + super(hiveClientConfig, hdfsEnvironment, cursorProviders, pageSourceFactories, typeManager); this.carbonTableReader = requireNonNull(carbonTableReader, "carbonTableReader is null"); + this.hdfsEnvironment = hdfsEnvironment; } @Override public ConnectorPageSource createPageSource(ConnectorTransactionHandle transactionHandle, ConnectorSession session, ConnectorSplit split, List<ColumnHandle> columns) { - this.queryId = ((CarbondataSplit)split).getQueryId(); + HiveSplit carbonSplit = + checkType(split, HiveSplit.class, "split is not class HiveSplit"); + if (carbonSplit.getSchema().getProperty("queryId") == null) { + return super.createPageSource(transactionHandle, session, split, columns); + } + this.queryId = carbonSplit.getSchema().getProperty("queryId"); + Configuration configuration = this.hdfsEnvironment.getConfiguration( + new HdfsEnvironment.HdfsContext(session, carbonSplit.getDatabase(), carbonSplit.getTable()), + new Path(carbonSplit.getSchema().getProperty("tablePath"))); + configuration = carbonTableReader.updateS3Properties(configuration); CarbonDictionaryDecodeReadSupport readSupport = new CarbonDictionaryDecodeReadSupport(); PrestoCarbonVectorizedRecordReader 
carbonRecordReader = - createReader(split, columns, readSupport); + createReader(carbonSplit, columns, readSupport, configuration); return new CarbondataPageSource(carbonRecordReader, columns); } /** - * @param split + * @param carbonSplit * @param columns * @param readSupport
--- End diff --
The updated Javadoc is missing a `@param` entry for the new configuration parameter.
---
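For illustration, a minimal sketch of the Javadoc fix being requested. The parameter names follow the diff above; the exact parameter types and the wording of the descriptions are assumptions, not the author's code:

```java
/**
 * Creates a vectorized Carbon record reader for the given split.
 *
 * @param carbonSplit   Hive split carrying the Carbon query metadata (type assumed from the diff)
 * @param columns       columns projected by the query
 * @param readSupport   dictionary-decode read support used while reading
 * @param configuration Hadoop configuration resolved for the table path (the entry the reviewer notes is missing)
 */
private PrestoCarbonVectorizedRecordReader createReader(HiveSplit carbonSplit,
    List<ColumnHandle> columns, CarbonDictionaryDecodeReadSupport readSupport,
    Configuration configuration) {
  // Body omitted here; unchanged from the pull request.
  throw new UnsupportedOperationException("sketch only");
}
```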
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3019#discussion_r243821888 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala --- @@ -72,69 +74,107 @@ object CarbonSessionExample { val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" // scalastyle:off - spark.sql( - s""" - | LOAD DATA LOCAL INPATH '$path' - | INTO TABLE source - | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') - """.stripMargin) - // scalastyle:on - - spark.sql( - s""" - | SELECT charField, stringField, intField - | FROM source - | WHERE stringfield = 'spark' AND decimalField > 40 - """.stripMargin).show() - - spark.sql( - s""" - | SELECT * - | FROM source WHERE length(stringField) = 5 - """.stripMargin).show() - - spark.sql( - s""" - | SELECT * - | FROM source WHERE date_format(dateField, "yyyy-MM-dd") = "2015-07-23" - """.stripMargin).show() - - spark.sql("SELECT count(stringField) FROM source").show() - - spark.sql( - s""" - | SELECT sum(intField), stringField - | FROM source - | GROUP BY stringField - """.stripMargin).show() - - spark.sql( - s""" - | SELECT t1.*, t2.* - | FROM source t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - - spark.sql( - s""" - | WITH t1 AS ( - | SELECT * FROM source - | UNION ALL - | SELECT * FROM source - | ) - | SELECT t1.*, t2.* - | FROM t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - - spark.sql( - s""" - | SELECT * - | FROM source - | WHERE stringField = 'spark' and floatField > 2.8 - """.stripMargin).show() +// spark.sql( --- End diff -- please revert the changes in this file --- |
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3019#discussion_r243822523
--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java ---
@@ -113,7 +132,7 @@ private PrestoCarbonVectorizedRecordReader createReader(ConnectorSplit split, PrestoCarbonVectorizedRecordReader reader = new PrestoCarbonVectorizedRecordReader(queryExecutor, queryModel, (AbstractDetailQueryResultIterator) iterator, readSupport); - reader.setTaskId(carbondataSplit.getIndex()); + reader.setTaskId(Long.parseLong(carbonSplit.getSchema().getProperty("index")));
--- End diff --
I think it's better to catch the NumberFormatException.
---
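A brief sketch of the guarded parse the reviewer is suggesting. The property name "index" and the setTaskId call come from the diff; the choice of wrapping exception and message is an assumption:

```java
// Parse the task index defensively, as suggested in the review.
// Long.parseLong also throws NumberFormatException when the property is absent (null).
String index = carbonSplit.getSchema().getProperty("index");
try {
  reader.setTaskId(Long.parseLong(index));
} catch (NumberFormatException e) {
  throw new IllegalArgumentException(
      "Invalid or missing 'index' property in split schema: " + index, e);
}
```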
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1931/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10184/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1932/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2141/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3019
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10185/
---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3019#discussion_r243849816 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataModule.java --- @@ -17,62 +17,150 @@ package org.apache.carbondata.presto; -import javax.inject.Inject; +import java.util.function.Supplier; import static java.util.Objects.requireNonNull; -import org.apache.carbondata.presto.impl.CarbonTableConfig; import org.apache.carbondata.presto.impl.CarbonTableReader; +import com.facebook.presto.hive.CoercionPolicy; +import com.facebook.presto.hive.DirectoryLister; +import com.facebook.presto.hive.FileFormatDataSourceStats; +import com.facebook.presto.hive.GenericHiveRecordCursorProvider; +import com.facebook.presto.hive.HadoopDirectoryLister; +import com.facebook.presto.hive.HdfsConfiguration; +import com.facebook.presto.hive.HdfsConfigurationUpdater; +import com.facebook.presto.hive.HdfsEnvironment; +import com.facebook.presto.hive.HiveClientConfig; +import com.facebook.presto.hive.HiveClientModule; +import com.facebook.presto.hive.HiveCoercionPolicy; +import com.facebook.presto.hive.HiveConnectorId; +import com.facebook.presto.hive.HiveEventClient; +import com.facebook.presto.hive.HiveFileWriterFactory; +import com.facebook.presto.hive.HiveHdfsConfiguration; +import com.facebook.presto.hive.HiveLocationService; +import com.facebook.presto.hive.HiveMetadataFactory; +import com.facebook.presto.hive.HiveNodePartitioningProvider; +import com.facebook.presto.hive.HivePageSinkProvider; +import com.facebook.presto.hive.HivePageSourceFactory; +import com.facebook.presto.hive.HivePartitionManager; +import com.facebook.presto.hive.HiveRecordCursorProvider; +import com.facebook.presto.hive.HiveSessionProperties; +import com.facebook.presto.hive.HiveSplitManager; +import com.facebook.presto.hive.HiveTableProperties; +import com.facebook.presto.hive.HiveTransactionManager; +import com.facebook.presto.hive.HiveTypeTranslator; +import com.facebook.presto.hive.HiveWriterStats; +import com.facebook.presto.hive.LocationService; +import com.facebook.presto.hive.NamenodeStats; +import com.facebook.presto.hive.OrcFileWriterConfig; +import com.facebook.presto.hive.OrcFileWriterFactory; +import com.facebook.presto.hive.PartitionUpdate; +import com.facebook.presto.hive.RcFileFileWriterFactory; +import com.facebook.presto.hive.TableParameterCodec; +import com.facebook.presto.hive.TransactionalMetadata; +import com.facebook.presto.hive.TypeTranslator; +import com.facebook.presto.hive.orc.DwrfPageSourceFactory; +import com.facebook.presto.hive.orc.OrcPageSourceFactory; +import com.facebook.presto.hive.parquet.ParquetPageSourceFactory; +import com.facebook.presto.hive.parquet.ParquetRecordCursorProvider; +import com.facebook.presto.hive.rcfile.RcFilePageSourceFactory; +import com.facebook.presto.spi.connector.ConnectorNodePartitioningProvider; +import com.facebook.presto.spi.connector.ConnectorPageSinkProvider; import com.facebook.presto.spi.connector.ConnectorPageSourceProvider; import com.facebook.presto.spi.connector.ConnectorSplitManager; -import com.facebook.presto.spi.type.Type; -import com.facebook.presto.spi.type.TypeManager; -import com.fasterxml.jackson.databind.DeserializationContext; -import com.fasterxml.jackson.databind.deser.std.FromStringDeserializer; import com.google.inject.Binder; -import com.google.inject.Module; import com.google.inject.Scopes; +import com.google.inject.TypeLiteral; +import com.google.inject.multibindings.Multibinder; +import io.airlift.event.client.EventClient; -import static 
com.facebook.presto.spi.type.TypeSignature.parseTypeSignature; -import static com.google.common.base.Preconditions.checkArgument; +import static com.google.inject.multibindings.Multibinder.newSetBinder; import static io.airlift.configuration.ConfigBinder.configBinder; +import static io.airlift.json.JsonCodecBinder.jsonCodecBinder; -public class CarbondataModule implements Module { +import static org.weakref.jmx.ObjectNames.generatedNameOf; +import static org.weakref.jmx.guice.ExportBinder.newExporter; + +public class CarbondataModule extends HiveClientModule { private final String connectorId; - private final TypeManager typeManager; - public CarbondataModule(String connectorId, TypeManager typeManager) { + public CarbondataModule(String connectorId) { + super(connectorId); this.connectorId = requireNonNull(connectorId, "connector id is null"); - this.typeManager = requireNonNull(typeManager, "typeManager is null"); } @Override public void configure(Binder binder) { - binder.bind(TypeManager.class).toInstance(typeManager); + binder.bind(HiveConnectorId.class).toInstance(new HiveConnectorId(connectorId)); + binder.bind(TypeTranslator.class).toInstance(new HiveTypeTranslator()); + binder.bind(CoercionPolicy.class).to(HiveCoercionPolicy.class).in(Scopes.SINGLETON); - binder.bind(CarbondataConnectorId.class).toInstance(new CarbondataConnectorId(connectorId)); - binder.bind(CarbondataMetadata.class).in(Scopes.SINGLETON); - binder.bind(CarbonTableReader.class).in(Scopes.SINGLETON); + binder.bind(HdfsConfigurationUpdater.class).in(Scopes.SINGLETON); + binder.bind(HdfsConfiguration.class).to(HiveHdfsConfiguration.class).in(Scopes.SINGLETON); + binder.bind(HdfsEnvironment.class).in(Scopes.SINGLETON); + binder.bind(DirectoryLister.class).to(HadoopDirectoryLister.class).in(Scopes.SINGLETON); + configBinder(binder).bindConfig(HiveClientConfig.class); + + binder.bind(HiveSessionProperties.class).in(Scopes.SINGLETON); + binder.bind(HiveTableProperties.class).in(Scopes.SINGLETON); + + binder.bind(NamenodeStats.class).in(Scopes.SINGLETON); + newExporter(binder).export(NamenodeStats.class) + .as(generatedNameOf(NamenodeStats.class, connectorId)); + + Multibinder<HiveRecordCursorProvider> recordCursorProviderBinder = + newSetBinder(binder, HiveRecordCursorProvider.class); + recordCursorProviderBinder.addBinding().to(ParquetRecordCursorProvider.class) + .in(Scopes.SINGLETON); + recordCursorProviderBinder.addBinding().to(GenericHiveRecordCursorProvider.class) + .in(Scopes.SINGLETON); + + binder.bind(HiveWriterStats.class).in(Scopes.SINGLETON); + newExporter(binder).export(HiveWriterStats.class) + .as(generatedNameOf(HiveWriterStats.class, connectorId)); + + newSetBinder(binder, EventClient.class).addBinding().to(HiveEventClient.class) + .in(Scopes.SINGLETON); + binder.bind(HivePartitionManager.class).in(Scopes.SINGLETON); + binder.bind(LocationService.class).to(HiveLocationService.class).in(Scopes.SINGLETON); + binder.bind(TableParameterCodec.class).in(Scopes.SINGLETON); + binder.bind(HiveMetadataFactory.class).in(Scopes.SINGLETON); + binder.bind(new TypeLiteral<Supplier<TransactionalMetadata>>() { + }).to(HiveMetadataFactory.class).in(Scopes.SINGLETON); + binder.bind(HiveTransactionManager.class).in(Scopes.SINGLETON); binder.bind(ConnectorSplitManager.class).to(CarbondataSplitManager.class).in(Scopes.SINGLETON); + newExporter(binder).export(ConnectorSplitManager.class) + .as(generatedNameOf(HiveSplitManager.class, connectorId)); 
binder.bind(ConnectorPageSourceProvider.class).to(CarbondataPageSourceProvider.class) .in(Scopes.SINGLETON); - binder.bind(CarbondataHandleResolver.class).in(Scopes.SINGLETON); - configBinder(binder).bindConfig(CarbonTableConfig.class); - } + binder.bind(ConnectorPageSinkProvider.class).to(HivePageSinkProvider.class) + .in(Scopes.SINGLETON); + binder.bind(ConnectorNodePartitioningProvider.class).to(HiveNodePartitioningProvider.class) + .in(Scopes.SINGLETON); + + jsonCodecBinder(binder).bindJsonCodec(PartitionUpdate.class); - public static final class TypeDeserializer extends FromStringDeserializer<Type> { - private final TypeManager typeManager; + binder.bind(FileFormatDataSourceStats.class).in(Scopes.SINGLETON); + newExporter(binder).export(FileFormatDataSourceStats.class) + .as(generatedNameOf(FileFormatDataSourceStats.class, connectorId)); - @Inject public TypeDeserializer(TypeManager typeManager) { - super(Type.class); - this.typeManager = requireNonNull(typeManager, "typeManager is null"); - } + Multibinder<HivePageSourceFactory> pageSourceFactoryBinder = + newSetBinder(binder, HivePageSourceFactory.class); + pageSourceFactoryBinder.addBinding().to(OrcPageSourceFactory.class).in(Scopes.SINGLETON); + pageSourceFactoryBinder.addBinding().to(DwrfPageSourceFactory.class).in(Scopes.SINGLETON); + pageSourceFactoryBinder.addBinding().to(ParquetPageSourceFactory.class).in(Scopes.SINGLETON);
--- End diff --
Why are the ORC and Parquet PageSource factories bound here? Can a comment be added to explain this?
---
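One way the requested comment could read, sketched below on the bindings quoted in the diff. The rationale is inferred from the page source provider change earlier in this thread (splits without a Carbon queryId fall back to super.createPageSource), so it is an assumption rather than the author's confirmed intent:

```java
// Assumed rationale: CarbondataPageSourceProvider only handles splits that carry a Carbon
// queryId and delegates everything else to HivePageSourceProvider, so the standard Hive
// page source factories (ORC, DWRF, Parquet) still need to be bound for plain Hive tables.
Multibinder<HivePageSourceFactory> pageSourceFactoryBinder =
    newSetBinder(binder, HivePageSourceFactory.class);
pageSourceFactoryBinder.addBinding().to(OrcPageSourceFactory.class).in(Scopes.SINGLETON);
pageSourceFactoryBinder.addBinding().to(DwrfPageSourceFactory.class).in(Scopes.SINGLETON);
pageSourceFactoryBinder.addBinding().to(ParquetPageSourceFactory.class).in(Scopes.SINGLETON);
```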