[GitHub] [carbondata] ajantha-bhat opened a new pull request #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster


GitBox
URL: https://github.com/apache/carbondata/pull/3675
 
 
    ### Why is this PR needed?
   A select query fails with the call stack below when the warehouse directory is left at its default (not configured).
   
   ```
   0: jdbc:hive2://localhost:10000> create table ab(age int) stored as carbondata;
   ---------+
   Result
   ---------+
   ---------+
   No rows selected (0.093 seconds)
   0: jdbc:hive2://localhost:10000> select count from ab;
   Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'ab' not found in database 'tpch'; (state=,code=0)
   
   caused by
   java.io.FileNotFoundException: File hdfs://localhost:54311/home/root1/tools/spark-2.3.4-bin-hadoop2.7/spark-warehouse/tpch.db/ab/Metadata does not exist.
   ```
   
    ### What changes were proposed in this PR?
   When `spark.sql.warehouse.dir` is not configured, the default location on the local file system (under SPARK_HOME) is used. In a cluster, however, describe table shows the table location with an HDFS prefix.
   
   The reason is that we remove the local file system scheme from the table path, so when the path is read back in a cluster the HDFS prefix is added. If we keep the scheme instead, the issue does not occur.
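
A minimal sketch of the behaviour described above, using only `java.net.URI` rather than the Hadoop or CarbonData APIs. The `SchemeDemo` class and its `qualify` helper are hypothetical names introduced for illustration; the default file system value and the table path are taken from the error message above, and the `file:` form of the path is an assumption. The sketch shows that a path whose scheme has been stripped picks up the cluster's default file system, while a path that keeps its scheme is left alone.

```java
import java.net.URI;

class SchemeDemo {
    // Hypothetical helper: mimics how a scheme-less path is resolved
    // against the cluster's default file system. This is an illustration,
    // not the CarbonData or Hadoop implementation.
    static String qualify(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return path; // scheme already present: keep the path as-is
        }
        return defaultFs + path; // no scheme: the default file system is assumed
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://localhost:54311";
        String tablePath = "/home/root1/tools/spark-2.3.4-bin-hadoop2.7/spark-warehouse/tpch.db/ab";

        // Scheme stripped: the path ends up pointing at HDFS.
        System.out.println(qualify(tablePath, defaultFs));
        // Scheme kept (assumed "file:" form): the path still points at the local file system.
        System.out.println(qualify("file:" + tablePath, defaultFs));
    }
}
```

With the scheme retained, qualifying the path a second time changes nothing, which is why keeping it avoids the mismatch.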
   
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No. The issue occurs only in a cluster with HDFS or OBS.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601579151
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2518/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601580895
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/812/
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601610898
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601640428
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/816/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601644675
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2523/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601724117
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/817/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601725797
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2524/
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-601748528
 
 
   @QiangCai  please check


[GitHub] [carbondata] QiangCai commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602019750
 
 
   It would be better to find the root cause:
   where do we append the "defaultFS" prefix to the store location, database location, or table path?
   
   In some places carbon appends the "defaultFS" prefix;
   we need to check whether Spark does so as well.


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602488252
 
 
   @QiangCai : I checked.
   
   When I run `create table t1(age int) stored as carbondata;`, the table is stored as a hive carbon table.
   ![Screenshot from 2020-03-23 15-10-56](https://user-images.githubusercontent.com/5889404/77303393-e9f4d200-6d18-11ea-9da1-6435d718d3a7.png)
   
   Here `Location` is **given by Spark itself**: Spark adds the hdfs prefix because the path from carbon did not have any scheme.
   If you look at `storage properties` in the image above, it holds carbon's table location without the prefix.
   
   
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602508634
 
 
   ![Screenshot from 2020-03-23 15-55-06](https://user-images.githubusercontent.com/5889404/77307132-ba48c880-6d1e-11ea-97be-7a8c452a2fc6.png)
   
   FileMetaStore uses the location from the catalog table instead of the table path, so the hdfs scheme is added.
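
To make the mismatch concrete, here is a small sketch in plain Java (`java.net.URI` only). The `LocationCheck` class and its helpers are hypothetical, and the two paths are illustrative values modelled on the error message earlier in the thread, not copied from the screenshot. It only reports which scheme each location carries and whether they differ.

```java
import java.net.URI;

class LocationCheck {
    // Returns the URI scheme of a location, or "(none)" when the path has no scheme.
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    // True when the two locations do not carry the same scheme.
    static boolean schemesDiffer(String first, String second) {
        return !schemeOf(first).equals(schemeOf(second));
    }

    public static void main(String[] args) {
        // Illustrative values (assumptions): the catalog location as qualified by Spark,
        // and the scheme-less table path kept in the storage properties.
        String catalogLocation =
            "hdfs://localhost:54311/home/root1/tools/spark-2.3.4-bin-hadoop2.7/spark-warehouse/tpch.db/ab";
        String storagePath =
            "/home/root1/tools/spark-2.3.4-bin-hadoop2.7/spark-warehouse/tpch.db/ab";

        System.out.println("catalog location scheme: " + schemeOf(catalogLocation));
        System.out.println("storage path scheme:     " + schemeOf(storagePath));
        System.out.println("schemes differ:          " + schemesDiffer(catalogLocation, storagePath));
    }
}
```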


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-602510902
 
 
   Parquet adds a scheme while storing, hence there is no issue with parquet. After my changes, carbon behaves similarly to parquet.
   ![Screenshot from 2020-03-23 16-00-40](https://user-images.githubusercontent.com/5889404/77307608-96d24d80-6d1f-11ea-8c2f-120bc35f134d.png)
   


[GitHub] [carbondata] QiangCai commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-603616337
 
 
   But if we set "spark.sql.warehouse.dir" to /user/hive/warehouse,
   then in a cluster environment it should automatically use "defaultFS" as the prefix of the path, right?


[GitHub] [carbondata] ajantha-bhat commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607099805
 
 
   Yes. If we set "spark.sql.warehouse.dir" to /home/root1/temp,
   the table is created with the hdfs scheme (because the cluster uses hdfs).
   
   ![Screenshot from 2020-04-01 13-34-08](https://user-images.githubusercontent.com/5889404/78113600-9b80bb00-741d-11ea-8390-e3e3b9cb1cf8.png)
   
   ![Screenshot from 2020-04-01 13-38-13](https://user-images.githubusercontent.com/5889404/78113903-13e77c00-741e-11ea-9a87-79ded069563e.png)
   
   


[GitHub] [carbondata] ajantha-bhat edited a comment on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607099805
 
 
   Yes. If we set "spark.sql.warehouse.dir" to /home/root1/temp,
   the table is created with the hdfs scheme (because the cluster uses hdfs).
   
   ![Screenshot from 2020-04-01 13-34-08](https://user-images.githubusercontent.com/5889404/78113600-9b80bb00-741d-11ea-8390-e3e3b9cb1cf8.png)
   
   ![Screenshot from 2020-04-01 13-38-55](https://user-images.githubusercontent.com/5889404/78113987-2f528700-741e-11ea-9411-43feaab9631e.png)
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607348598
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/901/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-607350487
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2610/
   


[GitHub] [carbondata] jackylk commented on issue #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675#issuecomment-609395273
 
 
   LGTM


[GitHub] [carbondata] asfgit closed pull request #3675: [CARBONDATA-3744] Fix select query failure issue when warehouse directory is default (not configured) in cluster

URL: https://github.com/apache/carbondata/pull/3675
 
 
   
