[jira] [Commented] (CARBONDATA-3472) Carbondata Integration with Presto

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889959#comment-16889959 ]

Dibya commented on CARBONDATA-3472:
-----------------------------------

Hi,

Please follow the steps below to reproduce this issue:

1. Install Spark.
2. Build CarbonData with Maven - https://github.com/apache/carbondata/tree/master/build
3. Install Presto 0.217 and integrate it with CarbonData by following the steps at https://github.com/apache/carbondata/blob/master/docs/presto-guide.md
4. Place the following properties in the carbondata.properties file:
connector.name=carbondata
enable.unsafe.in.query.processing=false
enable.unsafe.sort=false
enable.unsafe.columnpage=false
hive.metastore.uri=thrift://<thrift server IP>:<thrift server port>

(Presto requires the Hive metastore service to be running in order to connect to the metastore.)

5. Create a CarbonData table that includes dictionary columns, like the one below (through Spark):
carbon.sql("create table inventory ( inv_date_sk int, inv_item_sk int, inv_warehouse_sk int, inv_quantity_on_hand bigint) STORED AS carbondata TBLPROPERTIES ('NO_INVERTED_INDEX'='inv_date_sk , inv_item_sk , inv_warehouse_sk , inv_quantity_on_hand','INVERTED_INDEX'='inv_date_sk , inv_item_sk , inv_warehouse_sk , inv_quantity_on_hand','SORT_COLUMNS'='inv_date_sk , inv_item_sk , inv_warehouse_sk , inv_quantity_on_hand','DICTIONARY_INCLUDE'='inv_date_sk , inv_item_sk , inv_warehouse_sk , inv_quantity_on_hand', 'TABLE_BLOCKSIZE'='128')")
6. Query the table created above through the Presto CLI using the carbondata catalog - select * from inventory limit 20;
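
Before starting Presto, it may help to sanity-check the catalog file from step 4. The following is only a rough sketch in Python, not part of CarbonData or Presto: the function names parse_properties and check_catalog are made up for illustration, the list of required keys simply mirrors the properties listed in step 4, and the host and port in the sample are placeholders (9083 is the usual Hive metastore default, adjust for your setup).

```python
import re

# Assumption: these are just the keys listed in step 4, not an official schema.
REQUIRED_KEYS = (
    "connector.name",
    "enable.unsafe.in.query.processing",
    "enable.unsafe.sort",
    "enable.unsafe.columnpage",
    "hive.metastore.uri",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_catalog(props):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = ["missing property: " + k for k in REQUIRED_KEYS if k not in props]
    if props.get("connector.name", "carbondata") != "carbondata":
        problems.append("connector.name should be carbondata")
    uri = props.get("hive.metastore.uri", "")
    # Catches an unfilled placeholder such as thrift://<thrift server IP>:<thrift server port>
    if uri and not re.fullmatch(r"thrift://[^:/\s]+:\d+", uri):
        problems.append("hive.metastore.uri should look like thrift://host:port")
    return problems


# Placeholder values for illustration only.
sample = """connector.name=carbondata
enable.unsafe.in.query.processing=false
enable.unsafe.sort=false
enable.unsafe.columnpage=false
hive.metastore.uri=thrift://localhost:9083
"""
print(check_catalog(parse_properties(sample)))  # prints []
```

This only checks that the file is well formed; it does not verify that the metastore is actually reachable.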

The select query through Presto will fail with the error stated in the issue.
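
To confirm that a failure you see is the same one, compare the innermost "Caused by:" line of the stack trace with the one in the issue (java.io.IOException: Last dictionary chunk does not exist). A minimal Python sketch of that check follows; the helper names root_cause and matches_reported_issue are hypothetical, and the sample trace is abbreviated from the one quoted in the issue.

```python
# Message taken from the stack trace quoted in the issue.
REPORTED_MESSAGE = "Last dictionary chunk does not exist"


def root_cause(stack_trace):
    """Return the last 'Caused by:' entry of a Java stack trace.

    Falls back to the first non-empty line when the trace has no 'Caused by:' line.
    """
    lines = [ln.strip() for ln in stack_trace.splitlines() if ln.strip()]
    causes = [ln for ln in lines if ln.startswith("Caused by:")]
    if causes:
        return causes[-1][len("Caused by:"):].strip()
    return lines[0] if lines else ""


def matches_reported_issue(stack_trace):
    """True when the root cause carries the message reported in the issue."""
    return REPORTED_MESSAGE in root_cause(stack_trace)


# Abbreviated from the trace in the issue description.
trace = """java.lang.RuntimeException: Failed to create reader
 at org.apache.carbondata.presto.CarbondataPageSource.createReaderForColumnar(CarbondataPageSource.java:366)
Caused by: java.io.IOException: Last dictionary chunk does not exist
 at org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:115)
"""
print(root_cause(trace))
print(matches_reported_issue(trace))
```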

Please let me know if you are unable to follow the steps at any point.

Thanks


> Carbondata Integration with Presto
> ----------------------------------
>
>                 Key: CARBONDATA-3472
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3472
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query, presto-integration
>    Affects Versions: 1.6.0
>         Environment: centos 7
>            Reporter: Dibya
>            Priority: Major
>
> Hi,
> I came across the issue below while trying to query a table stored in CarbonData format through Presto:
> java.lang.RuntimeException: Failed to create reader
>  at org.apache.carbondata.presto.CarbondataPageSource.createReaderForColumnar(CarbondataPageSource.java:366)
>  at org.apache.carbondata.presto.CarbondataPageSource.initializeForColumnar(CarbondataPageSource.java:136)
>  at org.apache.carbondata.presto.CarbondataPageSource.initialize(CarbondataPageSource.java:130)
>  at org.apache.carbondata.presto.CarbondataPageSource.<init>(CarbondataPageSource.java:120)
>  at org.apache.carbondata.presto.CarbondataPageSourceProvider.createPageSource(CarbondataPageSourceProvider.java:88)
>  at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
>  at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
>  at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:221)
>  at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
>  at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
>  at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
>  at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
>  at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>  at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>  at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
>  at com.facebook.presto.$gen.Presto_0_217____20190711_064626_1.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Last dictionary chunk does not exist
>  at org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:115)
>  at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.readLastChunkFromDictionaryMetadataFile(AbstractDictionaryCache.java:93)
>  at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:198)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.getDictionary(ForwardDictionaryCache.java:212)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.get(ForwardDictionaryCache.java:80)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.get(ForwardDictionaryCache.java:45)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport$$anonfun$initialize$1.apply(CarbonDictionaryDecodeReadSupport.scala:65)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport$$anonfun$initialize$1.apply(CarbonDictionaryDecodeReadSupport.scala:53)
>  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport.initialize(CarbonDictionaryDecodeReadSupport.scala:53)
>  at org.apache.carbondata.presto.CarbondataPageSource.createReaderForColumnar(CarbondataPageSource.java:359)
>  ... 18 more
>  
> This issue is seen only when querying a table that had a dictionary created on one of its columns during table creation. The same queries run fine on tables that do not have dictionaries on any of their columns.
> Please look into it.
> Thanks
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)