[jira] [Updated] (CARBONDATA-3949) Select filter query fails from presto-cli on MV table

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Bhat updated CARBONDATA-3949:
------------------------------------
    Description:
From spark-sql, create the table, load data, and create the MV.

spark-sql> CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED as carbondata TBLPROPERTIES('local_dictionary_enable'='true','local_dictionary_threshold'='1000');
 Time taken: 0.753 seconds
 spark-sql> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
 OK
 OK
 Time taken: 1.992 seconds
 spark-sql> CREATE MATERIALIZED VIEW mv1 as select cust_id, cust_name, count(cust_id) from uniqdata group by cust_id, cust_name;
 OK
 Time taken: 4.336 seconds

 

From presto-cli, a select query with a filter on the table that has the MV fails.

presto:chetan> select * from uniqdata where CUST_ID IS NULL or BIGINT_COLUMN1 =1233720368578 or DECIMAL_COLUMN1 = 12345678901.1234000058 or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL ;
 Query 20200804_092703_00253_ed34h failed: Unable to get file status:

*Log-*
 2020-08-04T18:09:55.975+0800 INFO Query-20200804_100955_00300_ed34h-2642 stdout 2020-08-04 18:09:55 WARN AbstractDFSCarbonFile:458 - Exception occurred: File hdfs://hacluster/user/sparkhive/warehouse/chetan.db/uniqdata_string/Metadata does not exist.
 java.io.FileNotFoundException: File hdfs://hacluster/user/sparkhive/warehouse/chetan.db/uniqdata_string/Metadata does not exist.
 at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1058)
 at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
 at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1118)
 at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1115)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1125)
 at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270)
 at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.listFiles(AbstractDFSCarbonFile.java:456)
 at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.listFiles(AbstractDFSCarbonFile.java:559)
 at org.apache.carbondata.core.util.path.CarbonTablePath.getActualSchemaFilePath(CarbonTablePath.java:189)
 at org.apache.carbondata.core.util.path.CarbonTablePath.getSchemaFilePath(CarbonTablePath.java:168)
 at org.apache.carbondata.presto.impl.CarbonTableReader.updateSchemaTables(CarbonTableReader.java:147)
 at org.apache.carbondata.presto.impl.CarbonTableReader.getCarbonCache(CarbonTableReader.java:128)
 at org.apache.carbondata.presto.CarbondataSplitManager.getSplits(CarbondataSplitManager.java:145)
 at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorSplitManager.getSplits(ClassLoaderSafeConnectorSplitManager.java:50)
 at io.prestosql.split.SplitManager.getSplits(SplitManager.java:85)
 at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:189)
 at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitFilter(DistributedExecutionPlanner.java:257)
 at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitFilter(DistributedExecutionPlanner.java:149)
 at io.prestosql.sql.planner.plan.FilterNode.accept(FilterNode.java:72)
 at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:119)
 at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
 at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:96)
 at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:425)
 at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:321)
 at io.prestosql.$gen.Presto_316____20200804_042858_1.run(Unknown Source)
 at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:239)
 at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$4(LocalDispatchQuery.java:105)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
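
The trace above ends in a directory listing of a Metadata path that does not exist. The following is a minimal sketch of that failure mode and of an existence guard around it. It uses java.nio.file as a stand-in for the Hadoop FileSystem API, which is an assumption made to keep the example self-contained. The class and method names are hypothetical and this is not CarbonData code. The report does not establish whether returning an empty listing would be the right behaviour for the Presto connector.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only; not taken from CarbonData or Presto.
public class MetadataListingSketch {

    // Lists the entry names of a directory with no existence check.
    // Throws NoSuchFileException when the directory is missing, which is
    // analogous to the FileNotFoundException in the trace above.
    public static List<String> listUnguarded(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            List<String> names = new ArrayList<>();
            entries.forEach(p -> names.add(p.getFileName().toString()));
            Collections.sort(names);
            return names;
        }
    }

    // Same listing, but a missing directory yields an empty list
    // instead of an exception.
    public static List<String> listGuarded(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        return listUnguarded(dir);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the table path; its
        // Metadata subdirectory is deliberately never created.
        Path tableDir = Files.createTempDirectory("uniqdata_sketch");
        Path metadata = tableDir.resolve("Metadata");

        try {
            listUnguarded(metadata);
        } catch (NoSuchFileException e) {
            System.out.println("unguarded listing failed: " + e.getMessage());
        }
        System.out.println("guarded listing: " + listGuarded(metadata));
    }
}
```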

 

Expected: If Carbon indexes are not supported in PrestoSQL, the GitHub docs should state that they are not supported.

> Select filter query fails from presto-cli on MV table
> -----------------------------------------------------
>
>                 Key: CARBONDATA-3949
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3949
>             Project: CarbonData
>          Issue Type: Bug
>          Components: presto-integration
>    Affects Versions: 2.0.1
>         Environment: Spark 2.4.5. PrestoSQL 316
>            Reporter: Chetan Bhat
>            Priority: Major


