Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/531/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2715 @ravipesala please review this PR --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2715#discussion_r226274512

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
@@ -42,6 +44,112 @@ case class Rcd(booleanField: Boolean, shortField: Short, intField: Int, bigintFi
    dateField: String, charField: String, floatField: Float, stringDictField: String, stringSortField: String, stringLocalDictField: String, longStringField: String)
+/**
+ * This compressor actually will not compress or decompress anything.
+ * It is used for test case of specifying customized compressor.
+ */
+class CustomizeCompressor extends Compressor {
+  override def getName: String = "org.apache.carbondata.integration.spark.testsuite.dataload.CustomizeCompressor"
--- End diff --

We do not need to maintain the mapping between the short name and the class name ourselves. The user declares the short name inside the implementation class and keeps the jar on the classpath. CarbonData loads all classes that implement the compressor interface and reads the short name from each of them. So the user only has to give the short name in the table properties, and we store only the short name in thrift as well. This is just like the Spark datasource design; please go through it once. Providing a short name would be simpler for the user. --- |
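For illustration only, here is a minimal sketch of the kind of registry this proposal implies: an implementation exposes its own short name, and a lookup keyed by that short name resolves it back to an instance, so table properties and thrift need to store only the short name. The trait and registry below are hypothetical placeholders, not CarbonData's actual Compressor API.

import scala.collection.concurrent.TrieMap

// Hypothetical sketch only: a short-name registry for compressors, not CarbonData code.
trait ShortNamedCompressor {
  def shortName: String                          // e.g. "mycomp", declared by the implementation
  def compress(data: Array[Byte]): Array[Byte]
  def decompress(data: Array[Byte]): Array[Byte]
}

object CompressorRegistry {
  private val byShortName = TrieMap.empty[String, ShortNamedCompressor]

  // Called once per discovered implementation (however discovery is done).
  def register(c: ShortNamedCompressor): Unit =
    byShortName(c.shortName.toLowerCase) = c

  // Table properties / thrift would carry only the short name; readers resolve it here.
  def lookup(shortName: String): ShortNamedCompressor =
    byShortName.getOrElse(shortName.toLowerCase,
      throw new IllegalArgumentException(s"No compressor registered for '$shortName'"))
}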
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2715#discussion_r226913303

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
@@ -42,6 +44,112 @@ case class Rcd(booleanField: Boolean, shortField: Short, intField: Int, bigintFi
    dateField: String, charField: String, floatField: Float, stringDictField: String, stringSortField: String, stringLocalDictField: String, longStringField: String)
+/**
+ * This compressor actually will not compress or decompress anything.
+ * It is used for test case of specifying customized compressor.
+ */
+class CustomizeCompressor extends Compressor {
+  override def getName: String = "org.apache.carbondata.integration.spark.testsuite.dataload.CustomizeCompressor"
--- End diff --

> "Carbondata loads all classes which are implementing compressor class"

How can we figure this out? Spark's implementation still requires the user to specify the whole class name of their custom CompressionCodec. My problem is that we do not know the full class name of the codec when all we can get from the file meta is the short name. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2715#discussion_r227235560

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
@@ -42,6 +44,112 @@ case class Rcd(booleanField: Boolean, shortField: Short, intField: Int, bigintFi
    dateField: String, charField: String, floatField: Float, stringDictField: String, stringSortField: String, stringLocalDictField: String, longStringField: String)
+/**
+ * This compressor actually will not compress or decompress anything.
+ * It is used for test case of specifying customized compressor.
+ */
+class CustomizeCompressor extends Compressor {
+  override def getName: String = "org.apache.carbondata.integration.spark.testsuite.dataload.CustomizeCompressor"
--- End diff --

I am not sure what you are referring to, but I am referring to the Spark datasource (FileFormat) design. For example, while creating a carbon table we just mention `using carbon` and never mention the whole implementation class name. Spark also saves only the short name `carbon` in its metadata, not the whole class name. It can get the whole class name from the implementation classes of `FileFormat`. --- |
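As a reference point for this analogy, the following is a simplified sketch (written from memory, not Spark's exact code) of how a short name like `carbon` is resolved for datasources: implementations of Spark's `DataSourceRegister` are listed in META-INF/services files, discovered via `java.util.ServiceLoader`, and matched by their `shortName()`, which yields the full implementation class name.

import java.util.ServiceLoader
import scala.collection.JavaConverters._

import org.apache.spark.sql.sources.DataSourceRegister

// Simplified sketch of Spark's short-name lookup for datasources (not verbatim Spark code).
// ServiceLoader reads META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
// from every jar on the classpath, instantiates the listed providers, and the provider
// whose shortName() matches (e.g. "carbon") gives back the full class name.
object DataSourceShortNames {
  def lookupDataSourceClass(shortName: String): Option[String] = {
    val loader = Thread.currentThread().getContextClassLoader
    ServiceLoader.load(classOf[DataSourceRegister], loader).asScala
      .find(_.shortName().equalsIgnoreCase(shortName))
      .map(_.getClass.getName)
  }
}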
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2715#discussion_r227241517

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
@@ -42,6 +44,112 @@ case class Rcd(booleanField: Boolean, shortField: Short, intField: Int, bigintFi
    dateField: String, charField: String, floatField: Float, stringDictField: String, stringSortField: String, stringLocalDictField: String, longStringField: String)
+/**
+ * This compressor actually will not compress or decompress anything.
+ * It is used for test case of specifying customized compressor.
+ */
+class CustomizeCompressor extends Compressor {
+  override def getName: String = "org.apache.carbondata.integration.spark.testsuite.dataload.CustomizeCompressor"
--- End diff --

I'm referring to the implementation of `CompressionCodec` in Spark. Due to yesterday's GitHub problem, some content of my previous comment was missing. The corresponding code in Spark is here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L57-L82 For a custom CompressionCodec, line #75 uses the given name directly as the class name. I think that in your example the Spark file format can handle the short name 'carbon' because we are using CarbonSession, and in that session we have registered the short name 'carbon'. --- |
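For readers following the link above, the pattern in that Spark code is roughly the following (a condensed paraphrase, not a verbatim copy): only the built-in codecs get short names via a fixed map, and any other name is treated as a fully qualified class name to be instantiated reflectively, which is why a custom codec must be referred to by its full class name.

import java.util.Locale

// Condensed paraphrase of the short-name handling in Spark's CompressionCodec.scala
// (not a verbatim copy): built-in codecs are keyed by short name; anything else is
// assumed to already be a fully qualified class name.
object CodecNameResolver {
  private val shortCompressionCodecNames = Map(
    "lz4"    -> "org.apache.spark.io.LZ4CompressionCodec",
    "lzf"    -> "org.apache.spark.io.LZFCompressionCodec",
    "snappy" -> "org.apache.spark.io.SnappyCompressionCodec",
    "zstd"   -> "org.apache.spark.io.ZStdCompressionCodec")

  // A custom codec never appears in the map, so its short name cannot be resolved --
  // the caller has to supply the full class name instead.
  def resolveClassName(codecName: String): String =
    shortCompressionCodecNames.getOrElse(codecName.toLowerCase(Locale.ROOT), codecName)
}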
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2715#discussion_r228054678

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
@@ -42,6 +44,112 @@ case class Rcd(booleanField: Boolean, shortField: Short, intField: Int, bigintFi
    dateField: String, charField: String, floatField: Float, stringDictField: String, stringSortField: String, stringLocalDictField: String, longStringField: String)
+/**
+ * This compressor actually will not compress or decompress anything.
+ * It is used for test case of specifying customized compressor.
+ */
+class CustomizeCompressor extends Compressor {
+  override def getName: String = "org.apache.carbondata.integration.spark.testsuite.dataload.CustomizeCompressor"
--- End diff --

@ravipesala Hi, I've communicated with @KanakaKumar and finally understood the background of your proposal. For the carbon Spark datasource (FileFormat), we can write `using carbon` when creating a table instead of specifying the whole class name of SparkCarbonDataSource. So you want the compression codec to be implemented the same way -- using the short name instead of the whole class name. But there are underlying issues that prevent us from doing that -- my concern is that we cannot recover the real class name from the short name alone.

1. In the carbon spark-datasource module, we have a file named 'org.apache.spark.sql.sources.DataSourceRegister' with the content 'org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat' under 'resources/META-INF/services'. Spark discovers this file and loads the listed class, and from it gets both the short name and the whole class name of the carbon datasource. If our compression codec were implemented like this, we would have to provide extra provider-configuration files, and carbon would register the codecs after loading and reading them (a sketch follows this message).

2. There is another way: Java provides a way to find all implementations of a given interface. We could use this to find all compression codecs and get their short names as well as their whole class names. But this procedure may take too long, especially since Spark now has so many jars on the classpath.

In conclusion, I think it's OK to accept the current implementation, which uses the whole class name of the compression codec. Spark actually implements it this way too. --- |
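To make option 1 above concrete, here is a hypothetical sketch of the extra plumbing that comment describes; nothing like this ships in CarbonData today. It assumes the Compressor interface lives at org.apache.carbondata.core.datastore.compression.Compressor (as used by the CustomizeCompressor test in the diff) and that users would add a provider-configuration file named after that interface, listing their implementation classes, so codecs could be discovered once and indexed by the name they report.

import java.util.ServiceLoader
import scala.collection.JavaConverters._

import org.apache.carbondata.core.datastore.compression.Compressor

// Hypothetical sketch of option 1: discover Compressor implementations declared in
//   META-INF/services/org.apache.carbondata.core.datastore.compression.Compressor
// and index them by the name returned from getName(). This service file and registry
// do not exist in CarbonData today; they only illustrate the extra plumbing required.
object DiscoveredCompressors {
  lazy val byName: Map[String, Compressor] =
    ServiceLoader.load(classOf[Compressor], Thread.currentThread().getContextClassLoader)
      .asScala
      .map(c => c.getName.toLowerCase -> c)
      .toMap
}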
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2715 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1402/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1188/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2715 rebased with current master --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9454/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1410/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1196/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9460/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2715 LGTM --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2715 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1335/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2715 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9594/ --- |