GitHub user jatin9896 opened a pull request:
https://github.com/apache/carbondata/pull/1860 [CARBONDATA-2080] [S3-Implementation] Propagated hadoopConf from driver to executor for s3 implementation in cluster mode. Problem : hadoopconf was not getting propagated from driver to the executor that's why load was failing to the distributed environment. Solution: Setting the Hadoop conf in base class CarbonRDD How to verify this PR : Execute the load in the cluster mode It should be a success using location s3. Be sure to do all of the following checklists to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? No - How it is tested? Please attach test report. Testing is done Manually - Is it a performance related change? Please attach the performance test report. No - Any additional information to help reviewers in testing this change. No - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jatin9896/incubator-carbondata CARBONDATA-2080 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1860.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1860 ---- commit a310a9a5a6d230501bfd26d7c3605791638a1860 Author: Jatin <jatin.demla@...> Date: 2018-01-25T11:23:00Z Propagated hadoopConf from driver to executor for s3 implementation ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1896/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3129/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1860 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3106/ --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164649657 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -31,7 +37,7 @@ import org.apache.carbondata.core.util._ */ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, @transient private var deps: Seq[Dependency[_]]) extends RDD[T](sc, deps) { - + @transient val hadoopConf: Configuration = sc.hadoopConfiguration --- End diff -- Please add comment for this variable --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164649933 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -59,6 +74,33 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, map(f => CarbonProperties.getInstance().addProperty(f._1, f._2)) internalCompute(split, context) } + + private def getConf = { --- End diff -- provide return value type in function signature --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164650047 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -59,6 +74,33 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, map(f => CarbonProperties.getInstance().addProperty(f._1, f._2)) internalCompute(split, context) } + + private def getConf = { --- End diff -- Move this up to line 59, and provide comment for `confBytes` and `getConf` --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164651172 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -31,7 +37,7 @@ import org.apache.carbondata.core.util._ */ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, --- End diff -- Please change `sc:SparkContext` to `sparkSession:SparkSession`. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164651488 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -31,7 +37,7 @@ import org.apache.carbondata.core.util._ */ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, @transient private var deps: Seq[Dependency[_]]) extends RDD[T](sc, deps) { - + @transient val hadoopConf: Configuration = sc.hadoopConfiguration --- End diff -- This hadoopConf should be passed via constructor, so add a new parameter in CarbonRDD constructor: `@transient hadoopConf: Configuration` And modify all subclass of CarbonRDD to pass this parameter. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r164651950 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala --- @@ -347,31 +347,10 @@ class NewDataFrameLoaderRDD[K, V]( sc: SparkContext, result: DataLoadResult[K, V], carbonLoadModel: CarbonLoadModel, - prev: DataLoadCoalescedRDD[Row], - @transient hadoopConf: Configuration) extends CarbonRDD[(K, V)](prev) { --- End diff -- This hadoopConf should be moved to CarbonRDD. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1860 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3244/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3334/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2098/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3354/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2117/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3367/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1860 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3260/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2131/ --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1860#discussion_r165261635 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -59,6 +76,33 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, map(f => CarbonProperties.getInstance().addProperty(f._1, f._2)) internalCompute(split, context) } + + private def getConf: Configuration = { + val configuration = new Configuration(false) + val bai = new ByteArrayInputStream(CompressorFactory.getInstance().getCompressor + .unCompressByte(confBytes)) + val ois = new ObjectInputStream(bai) + configuration.readFields(ois) + ois.close() + configuration + } + + private def setS3Configurations(hadoopConf: Configuration): Unit = { --- End diff -- Can you use the same function in CarbonInputFormatUtil. It was added when I rebase yesterday --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2158/ --- |
Free forum by Nabble | Edit this page |