[GitHub] [carbondata] kevinjmh opened a new pull request #3780: [HOTFIX] Fix carbon store path

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh opened a new pull request #3780: [HOTFIX] Fix carbon store path

GitBox

kevinjmh opened a new pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780


    ### Why is this PR needed?
    In the case of using carbon only setting `carbon.storelocation`, carbon will use local spark warehouse path instead of the configured value.
   
    ### What changes were proposed in this PR?
   `spark.sql.warehouse.dir` has its own default value in Spark, do not use `carbonStorePath` as default value, which will make `hiveStorePath.equals(carbonStorePath)` TRUE when user not set `spark.sql.warehouse.dir`
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3780: [HOTFIX] Fix carbon store path

GitBox

CarbonDataQA1 commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636278462


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3092/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3780: [HOTFIX] Fix carbon store path

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636278570


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1372/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3780: [HOTFIX] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636312576


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1375/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3780: [HOTFIX] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636312777


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3097/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on pull request #3780: [HOTFIX] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636435451


   Please create a JIRA for this


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636454529


   > Please create a JIRA for this
   
   OK


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938586



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       need check hivestorePath is null or not




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938586



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       need check hiveStorePath  is null or not




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938670



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       no need. Spark conf has default value




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432944937



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       ok, can you check this pr: https://github.com/apache/carbondata/pull/3675




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432950129



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       the problem of that PR need checking of the cluster setting. I haven't met the case local path is added hdfs prefix unconsciously.
   In my case, a vanilla apache version of spark on yarn, the default warehouse location is under the local working directory where the thrift server launch.
   
   If you check spark code, the default value of `spark.sql.warehouse.dir` barely comes from Java URI implementation.
   https://github.com/apache/spark/blob/6c792a79c10e7b01bd040ef14c848a2a2378e28c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala#L33-L37
   
   https://github.com/apache/spark/blob/47dc332258bec20c460f666de50d9a8c5c0fbc0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L1976
   
   ![image](https://user-images.githubusercontent.com/3809732/83354246-fab16c00-a389-11ea-96ca-48e28ebe799c.png)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       I think  he can try adding an empty carbon style OPTIONS in the create statement :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       I think  he can try adding an empty carbon style option TBLPROPERTIES in the create statement :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       I think  he can try adding an empty carbon style option TBLPROPERTIES in the create statement :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

QiangCai commented on pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636582665


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

kevinjmh commented on a change in pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780#discussion_r432950129



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -328,7 +328,7 @@ object CarbonEnv {
     if ((!EnvHelper.isLegacy(sparkSession)) &&
         (dbName.equals("default") || databaseLocation.endsWith(".db"))) {
       val carbonStorePath = CarbonProperties.getStorePath()
-      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath)
+      val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir")
       // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path
       // format
       if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) {

Review comment:
       In my case, a vanilla apache version of spark on yarn, the default warehouse location is under the local working directory where the thrift server launch.
   
   If you check spark code, the default value of `spark.sql.warehouse.dir` barely comes from Java URI implementation.
   
   https://github.com/apache/spark/blob/6c792a79c10e7b01bd040ef14c848a2a2378e28c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala#L33-L37
   
   https://github.com/apache/spark/blob/47dc332258bec20c460f666de50d9a8c5c0fbc0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L1976
   
   ![image](https://user-images.githubusercontent.com/3809732/83354246-fab16c00-a389-11ea-96ca-48e28ebe799c.png)
   
   
   the problem of PR #3675 is :
   1. carbon store path is not taken effect, default database warehouse URI (local fs, file://...) is used. <---  this pr fix this
   2. the scheme of default database warehouse URI is removed by `FileFactiry.getUpdatedFilePath` in `CarbonEnv.getDatabaseLocation`. And this is not an absolute URI string, so spark will make
    qualified path in `SessionCatalog#createTable`. <--  #3675 fix this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #3780: [CARBONDATA-3836] Fix carbon store path & avoid exception when creating new carbon table

GitBox
In reply to this post by GitBox

asfgit closed pull request #3780:
URL: https://github.com/apache/carbondata/pull/3780


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]