Using Carbondata tables stored on S3 with EMR Presto

Charlie Horrell
Hi,

We're currently having issues accessing Carbondata tables stored on S3
using the EMR version of Presto. I've logged a JIRA ticket for this:
CARBONDATA-3234.

We've successfully queried Carbondata tables through Presto when they are
stored on HDFS rather than S3, and we've successfully queried the
Carbondata tables stored on S3 using Spark. However, we need to query the
Carbondata tables in S3 through Presto, which currently throws the
following stack trace:

2019-01-07T12:19:57.562Z WARN statement-response-4 com.facebook.presto.server.ThrowableMapper Request failed for /v1/statement/20190107_121957_00004_k6t5p/1
java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:74)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:66)
    at org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.<init>(HDFSCarbonFile.java:41)
    at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.<init>(S3CarbonFile.java:41)
    at org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:53)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:102)
    at org.apache.carbondata.presto.impl.CarbonTableReader.updateCarbonFile(CarbonTableReader.java:202)
    at org.apache.carbondata.presto.impl.CarbonTableReader.updateSchemaList(CarbonTableReader.java:216)
    at org.apache.carbondata.presto.impl.CarbonTableReader.getSchemaNames(CarbonTableReader.java:189)
    at org.apache.carbondata.presto.CarbondataMetadata.listSchemaNamesInternal(CarbondataMetadata.java:86)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:135)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadataInternal(CarbondataMetadata.java:240)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:232)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableMetadata(ClassLoaderSafeConnectorMetadata.java:145)
    at com.facebook.presto.metadata.MetadataManager.getTableMetadata(MetadataManager.java:388)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:850)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:258)
    at com.facebook.presto.sql.tree.Table.accept(Table.java:53)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:1772)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:954)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:258)
    at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:127)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:280)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:676)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:258)
    at com.facebook.presto.sql.tree.Query.accept(Query.java:94)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:244)
    at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:72)
    at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:64)
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:176)
    at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:707)
    at com.facebook.presto.execution.SqlQueryManager.createQueryInternal(SqlQueryManager.java:449)
    at com.facebook.presto.execution.SqlQueryManager.lambda$createQuery$3(SqlQueryManager.java:382)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Our hypothesis is that the error is caused by conflicting versions of the
Hadoop/AWS jars: those used by the EMR cluster and those bundled with the
Carbondata Presto integration plugin.
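
One way to test a version-conflict hypothesis like this is to check which
jar each of the two classes named in the error is actually loaded from. An
IllegalAccessError of this kind generally means the caller was compiled
against a library version in which the method was accessible, but a
different version is present at runtime. The following is only a minimal
sketch: the class name JarLocator is hypothetical (not part of Carbondata,
Hadoop, or Presto), and it assumes it is run with the same jars on the
classpath as the plugin directory. Presto loads plugins through its own
class loaders, so a standalone run only approximates what the server sees.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar (code source) a class
// is loaded from on the current classpath.
public class JarLocator {

    // Returns a one-line description of where the named class comes from.
    public static String locate(String className) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap/JDK (no code source)"
                    : src.getLocation().toString();
            return className + " <- " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": not loadable on this classpath (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the two classes named in the IllegalAccessError above.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.metrics2.lib.MutableCounterLong",
            "org.apache.hadoop.fs.s3a.S3AInstrumentation"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If the two classes resolve to jars from different Hadoop releases (for
example, a hadoop-aws jar from one version and a hadoop-common jar from
another), that would be consistent with the hypothesis above.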

Has anyone here managed to get EMR Presto to successfully query a
Carbondata table stored in S3?

Thanks
Charlie