kevinjmh opened a new pull request #3780: URL: https://github.com/apache/carbondata/pull/3780 ### Why is this PR needed? In the case of using carbon only setting `carbon.storelocation`, carbon will use local spark warehouse path instead of the configured value. ### What changes were proposed in this PR? `spark.sql.warehouse.dir` has its own default value in Spark, do not use `carbonStorePath` as default value, which will make `hiveStorePath.equals(carbonStorePath)` TRUE when user not set `spark.sql.warehouse.dir` ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636278462 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3092/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636278570 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1372/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636312576 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1375/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636312777 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3097/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kunal642 commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636435451 Please create a JIRA for this ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636454529 > Please create a JIRA for this OK ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938586 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: need check hivestorePath is null or not ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938586 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: need check hiveStorePath is null or not ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432938670 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: no need. Spark conf has default value ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432944937 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: ok, can you check this pr: https://github.com/apache/carbondata/pull/3675 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432950129 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: the problem of that PR need checking of the cluster setting. I haven't met the case local path is added hdfs prefix unconsciously. In my case, a vanilla apache version of spark on yarn, the default warehouse location is under the local working directory where the thrift server launch. If you check spark code, the default value of `spark.sql.warehouse.dir` barely comes from Java URI implementation. https://github.com/apache/spark/blob/6c792a79c10e7b01bd040ef14c848a2a2378e28c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala#L33-L37 https://github.com/apache/spark/blob/47dc332258bec20c460f666de50d9a8c5c0fbc0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L1976 ![image](https://user-images.githubusercontent.com/3809732/83354246-fab16c00-a389-11ea-96ca-48e28ebe799c.png) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: I think he can try adding an empty carbon style OPTIONS in the create statement :) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: I think he can try adding an empty carbon style option TBLPROPERTIES in the create statement :) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432951870 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: I think he can try adding an empty carbon style option TBLPROPERTIES in the create statement :) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#issuecomment-636582665 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3780: URL: https://github.com/apache/carbondata/pull/3780#discussion_r432950129 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -328,7 +328,7 @@ object CarbonEnv { if ((!EnvHelper.isLegacy(sparkSession)) && (dbName.equals("default") || databaseLocation.endsWith(".db"))) { val carbonStorePath = CarbonProperties.getStorePath() - val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir", carbonStorePath) + val hiveStorePath = sparkSession.conf.get("spark.sql.warehouse.dir") // if carbon.store does not point to spark.sql.warehouse.dir then follow the old table path // format if (carbonStorePath != null && !hiveStorePath.equals(carbonStorePath)) { Review comment: In my case, a vanilla apache version of spark on yarn, the default warehouse location is under the local working directory where the thrift server launch. If you check spark code, the default value of `spark.sql.warehouse.dir` barely comes from Java URI implementation. https://github.com/apache/spark/blob/6c792a79c10e7b01bd040ef14c848a2a2378e28c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala#L33-L37 https://github.com/apache/spark/blob/47dc332258bec20c460f666de50d9a8c5c0fbc0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L1976 ![image](https://user-images.githubusercontent.com/3809732/83354246-fab16c00-a389-11ea-96ca-48e28ebe799c.png) the problem of PR #3675 is : 1. carbon store path is not taken effect, default database warehouse URI (local fs, file://...) is used. <--- this pr fix this 2. the scheme of default database warehouse URI is removed by `FileFactiry.getUpdatedFilePath` in `CarbonEnv.getDatabaseLocation`. And this is not an absolute URI string, so spark will make qualified path in `SessionCatalog#createTable`. <-- #3675 fix this ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
asfgit closed pull request #3780: URL: https://github.com/apache/carbondata/pull/3780 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |