[jira] [Commented] (CARBONDATA-3472) Carbondata Integration with Presto

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895939#comment-16895939 ]

Dibya commented on CARBONDATA-3472:
-----------------------------------

Hi [~Ajantha_Bhat]

I see that you ran the table creation and data load through hive2 beeline. Could you please run the table creation through spark-shell instead, launched as below:

spark-shell --master yarn --executor-cores 5 --num-executors 3 --executor-memory 45G --jars /home/apache-carbondata-1.5.3-bin-spark2.3.2-hadoop2.7.2.jar --conf spark.dynamicAllocation.enabled=false

Then run the following commands in spark-shell:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon=SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://namenode-ip/carbondata")
carbon.sql("create table if not exists web_sales ( ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int, ws_bill_customer_sk  int, ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk  int, ws_ship_cdemo_sk int, ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk  int, ws_web_site_sk  int, ws_ship_mode_sk int, ws_warehouse_sk int, ws_promo_sk int, ws_order_number int, ws_quantity int, ws_wholesale_cost double, ws_list_price   double, ws_sales_price  double, ws_ext_discount_amt  double, ws_ext_sales_price   double, ws_ext_wholesale_cost double, ws_ext_list_price double, ws_ext_tax double, ws_coupon_amt   double, ws_ext_ship_cost double, ws_net_paid double, ws_net_paid_inc_tax  double, ws_net_paid_inc_ship double, ws_net_paid_inc_ship_tax  double, ws_net_profit   double) STORED AS carbondata TBLPROPERTIES ('DICTIONARY_INCLUDE'='ws_sold_date_sk, ws_item_sk, ws_quantity', 'INVERTED_INDEX'='ws_sold_date_sk,ws_item_sk,ws_quantity', 'SORT_COLUMNS'='ws_quantity,ws_sold_date_sk, ws_item_sk', 'TABLE_BLOCKSIZE'='128')")
carbon.sql("LOAD DATA INPATH '/data/web_sales/' into table web_sales OPTIONS('Header'='false','DELIMITER'='|')")

After that, please check whether you are able to query the table through Presto.

Please use the attached file to load data into the web_sales table created in the steps above: [^web_sales.txt]
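
Before switching to Presto, it may help to confirm from the same spark-shell session that the load succeeded. The sketch below is only a suggested sanity check and not part of the reproduction steps. It assumes the `carbon` session and the web_sales table from the commands above, and that SHOW SEGMENTS is available in the CarbonData build in use:

```scala
// Run in the same spark-shell session, reusing the `carbon` session created above.

// List the segments written by LOAD DATA (assumes SHOW SEGMENTS is supported by this build).
carbon.sql("SHOW SEGMENTS FOR TABLE web_sales").show(false)

// The row count should be non-zero if the load picked up the attached file.
carbon.sql("select count(*) from web_sales").show()

// ws_sold_date_sk, ws_item_sk and ws_quantity are the DICTIONARY_INCLUDE columns,
// so selecting them should read the dictionary-encoded data on the Spark side.
carbon.sql("select ws_sold_date_sk, ws_item_sk, ws_quantity from web_sales limit 10").show()
```

If these queries return rows in spark-shell but the same table still fails in Presto, that would suggest the problem is on the Presto read path rather than in the load itself.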

> Carbondata Integration with Presto
> ----------------------------------
>
>                 Key: CARBONDATA-3472
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3472
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query, presto-integration
>    Affects Versions: 1.6.0
>         Environment: centos 7
>            Reporter: Dibya
>            Priority: Major
>
> Hi,
> I came across the below issue when I was trying to query a table stored in carbondata format through presto:
> java.lang.RuntimeException: Failed to create reader
>  at org.apache.carbondata.presto.CarbondataPageSource.createReaderForColumnar(CarbondataPageSource.java:366)
>  at org.apache.carbondata.presto.CarbondataPageSource.initializeForColumnar(CarbondataPageSource.java:136)
>  at org.apache.carbondata.presto.CarbondataPageSource.initialize(CarbondataPageSource.java:130)
>  at org.apache.carbondata.presto.CarbondataPageSource.<init>(CarbondataPageSource.java:120)
>  at org.apache.carbondata.presto.CarbondataPageSourceProvider.createPageSource(CarbondataPageSourceProvider.java:88)
>  at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
>  at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
>  at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:221)
>  at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
>  at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
>  at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
>  at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
>  at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>  at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>  at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
>  at com.facebook.presto.$gen.Presto_0_217____20190711_064626_1.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Last dictionary chunk does not exist
>  at org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:115)
>  at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.readLastChunkFromDictionaryMetadataFile(AbstractDictionaryCache.java:93)
>  at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:198)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.getDictionary(ForwardDictionaryCache.java:212)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.get(ForwardDictionaryCache.java:80)
>  at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.get(ForwardDictionaryCache.java:45)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport$$anonfun$initialize$1.apply(CarbonDictionaryDecodeReadSupport.scala:65)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport$$anonfun$initialize$1.apply(CarbonDictionaryDecodeReadSupport.scala:53)
>  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>  at org.apache.carbondata.presto.CarbonDictionaryDecodeReadSupport.initialize(CarbonDictionaryDecodeReadSupport.scala:53)
>  at org.apache.carbondata.presto.CarbondataPageSource.createReaderForColumnar(CarbondataPageSource.java:359)
>  ... 18 more
>  
> This issue is seen only when querying a table that has a dictionary defined on one of its columns at table creation. The same queries run fine on tables that do not have a dictionary on any of their columns.
> Please look into it.
> Thanks
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)