ajantha-bhat opened a new pull request #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675

### Why is this PR needed?
A select query fails with the call stack below when the warehouse directory is left at its default (not configured).
```
0: jdbc:hive2://localhost:10000> create table ab(age int) stored as carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.093 seconds)
0: jdbc:hive2://localhost:10000> select count from ab;
Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'ab' not found in database 'tpch'; (state=,code=0)
caused by
java.io.FileNotFoundException: File hdfs://localhost:54311/home/root1/tools/spark-2.3.4-bin-hadoop2.7/spark-warehouse/tpch.db/ab/Metadata does not exist.
```

### What changes were proposed in this PR?
When spark.sql.warehouse.dir is not configured, the default location under SPARK_HOME on the local file system is used. In a cluster, however, describe table shows the location with an HDFS prefix. The reason is that we remove the local file system scheme, so when the table path is read, the HDFS prefix is added in a cluster. If we keep the scheme instead, the issue does not occur.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No. The issue happens only in a cluster with HDFS or OBS.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services
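The scheme handling described in this PR can be illustrated with a small sketch. This is a simplified stand-in written for illustration only, not CarbonData's, Spark's, or Hadoop's actual code: the helper `qualify`, the `DEFAULT_FS` constant, and the `file:` form of the local table path are assumptions. Only the HDFS address and the directory layout are taken from the error message above. The sketch shows that a path whose scheme has been stripped gets resolved against the cluster's default file system, while a path that keeps its scheme is left alone.

```python
from urllib.parse import urlparse

# Assumed default file system of the cluster, taken from the error message.
DEFAULT_FS = "hdfs://localhost:54311"


def qualify(path: str, default_fs: str) -> str:
    """Hypothetical helper: keep a path that already has a scheme,
    otherwise prefix it with the default file system."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Assumed local table path under the default (unconfigured) warehouse dir.
local_table = (
    "file:/home/root1/tools/spark-2.3.4-bin-hadoop2.7/"
    "spark-warehouse/tpch.db/ab"
)

# Old behaviour as described in the PR: the local scheme is dropped first,
# so the later qualification step points at HDFS, where nothing was written.
stripped = urlparse(local_table).path
print(qualify(stripped, DEFAULT_FS))

# Proposed behaviour: the scheme is kept, so the path stays local.
print(qualify(local_table, DEFAULT_FS))
```

The first printed path is the table directory that the `FileNotFoundException` above refers to (before `/Metadata` is appended); the second is the unchanged local path.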
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601579151 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2518/
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601580895 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/812/
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601610898 retest this please
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601640428 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/816/
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601644675 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2523/
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601724117 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/817/
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601725797 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2524/
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601748528 @QiangCai please check
QiangCai commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602019750
It would be better to find the root cause: where do we append the "defaultFS" prefix to the store location, database location, or table path? In some places carbon appends the "defaultFS" prefix; we need to check whether spark does it as well.
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602488252
@QiangCai: I checked. When I run `create table t1(age int) stored as carbondata;` the table is stored as a hive carbon table.
![Screenshot from 2020-03-23 15-10-56](https://user-images.githubusercontent.com/5889404/77303393-e9f4d200-6d18-11ea-9da1-6435d718d3a7.png)
Here `Location` is **given by spark itself**, so spark is adding the hdfs prefix, because the path carbon provided did not have any scheme. If you look at `storage properties` in the image above, it shows carbon's table location without the prefix.
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602508634
![Screenshot from 2020-03-23 15-55-06](https://user-images.githubusercontent.com/5889404/77307132-ba48c880-6d1e-11ea-97be-7a8c452a2fc6.png)
FileMetaStore uses the location from the catalog table instead of the table path, so the hdfs scheme is added.
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602510902
Parquet adds a scheme while storing, hence there is no issue with parquet. After my changes, carbon behaves similarly to parquet.
![Screenshot from 2020-03-23 16-00-40](https://user-images.githubusercontent.com/5889404/77307608-96d24d80-6d1f-11ea-8c2f-120bc35f134d.png)
QiangCai commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-603616337
But if we set "spark.sql.warehouse.dir" to /user/hive/warehouse in a cluster environment, it should automatically use "defaultFS" as the prefix of the path, right?
ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607099805
Yes. If we set "spark.sql.warehouse.dir" to /home/root1/temp, the table is created with the hdfs scheme (because the cluster is hdfs).
![Screenshot from 2020-04-01 13-34-08](https://user-images.githubusercontent.com/5889404/78113600-9b80bb00-741d-11ea-8390-e3e3b9cb1cf8.png)
![Screenshot from 2020-04-01 13-38-13](https://user-images.githubusercontent.com/5889404/78113903-13e77c00-741e-11ea-9a87-79ded069563e.png)
ajantha-bhat edited a comment on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607099805
Yes. If we set "spark.sql.warehouse.dir" to /home/root1/temp, the table is created with the hdfs scheme (because the cluster is hdfs).
![Screenshot from 2020-04-01 13-34-08](https://user-images.githubusercontent.com/5889404/78113600-9b80bb00-741d-11ea-8390-e3e3b9cb1cf8.png)
![Screenshot from 2020-04-01 13-38-55](https://user-images.githubusercontent.com/5889404/78113987-2f528700-741e-11ea-9411-43feaab9631e.png)
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607348598 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/901/
CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607350487 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2610/
jackylk commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675#issuecomment-609395273 LGTM
asfgit closed pull request #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster
URL: https://github.com/apache/carbondata/pull/3675