GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/3026

[WIP] Added support to compile carbon CDH spark distribution

Please use the `spark-2.2-cdh` profile to compile for CDH. Example:

```
mvn -DskipTests -Pspark-2.2-cdh package
```

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

 - [ ] Any interfaces changed?

 - [ ] Any backward compatibility impacted?

 - [ ] Document update required?

 - [ ] Testing done
        Please provide details on
        - Whether new unit test cases have been added or why no new tests are required?
        - How it is tested? Please attach test report.
        - Is it a performance related change? Please attach the performance test report.
        - Any additional information to help reviewers in testing this change.

 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata cdh-support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3026

----
commit 349e901c8f5c8d658859089ab3acaf6377107150
Author: ravipesala <ravi.pesala@...>
Date:   2018-12-26T16:27:23Z

    Added support to compile carbon CDH spark distribution

----
--- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1947/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10200/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2156/ --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3026#discussion_r244065240

--- Diff: integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonDictionaryUtil.java ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql;
+
+import java.lang.reflect.Array;
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+
+import org.apache.carbondata.core.scan.result.vector.CarbonDictionary;
+
+import org.apache.spark.sql.execution.vectorized.ColumnVector;
+
+/**
+ * This class uses the java reflection to create parquet dictionary class as CDH distribution uses
+ * twitter parquet instead of apache parquet.
+ */
+public class CarbonDictionaryUtil {
--- End diff --

It is better to name it `ReflectionUtil`. Also, please add the InterfaceAudience annotation.
--- |
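For readers following the thread, the javadoc quoted above describes choosing between Apache Parquet (`org.apache.parquet.*`) and the older Twitter Parquet (`parquet.*`) at runtime via reflection. A minimal sketch of that idea follows. It is not the code from the PR: the class `ParquetPackageProbe` and the helpers `isPresent` and `qualifiedName` are hypothetical names chosen for illustration, and the prefixing rule is an assumption based on how the class names appear in the diff.

```java
public class ParquetPackageProbe {

  /** Returns true if the named class can be located, without initializing it. */
  static boolean isPresent(String className) {
    try {
      Class.forName(className, false, ParquetPackageProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Hypothetical helper (an assumption, not the PR's code): prefixes "org.apache."
   * when Apache Parquet is available, otherwise keeps the bare "parquet." name.
   */
  static String qualifiedName(String relativeName, boolean apacheParquet) {
    return apacheParquet ? "org.apache." + relativeName : relativeName;
  }

  public static void main(String[] args) {
    // Probe once, then resolve every Parquet class name through the same rule.
    boolean apache = isPresent("org.apache.parquet.column.Encoding");
    System.out.println(qualifiedName("parquet.io.api.Binary", apache));
  }
}
```

The printed name depends on which Parquet jar, if any, is on the classpath, so the sketch makes no claim about the output in a given environment.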
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1948/ --- |
Github user qiuchenjian commented on the issue:
https://github.com/apache/carbondata/pull/3026 Does carbon not support CDH when built with -Pspark-2.2? Does CDH change the Spark interface, so that carbon can't run successfully? --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10201/ --- |
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3026#discussion_r244090720

--- Diff: integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonDictionaryUtil.java ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql;
+
+import java.lang.reflect.Array;
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+
+import org.apache.carbondata.core.scan.result.vector.CarbonDictionary;
+
+import org.apache.spark.sql.execution.vectorized.ColumnVector;
+
+/**
+ * This class uses the java reflection to create parquet dictionary class as CDH distribution uses
+ * twitter parquet instead of apache parquet.
+ */
+public class CarbonDictionaryUtil {
--- End diff --

ok
--- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/3026 @qiuchenjian Please check the PR description for why carbon needs changes for Spark 2.2 CDH. --- |
Github user qiuchenjian commented on the issue:
https://github.com/apache/carbondata/pull/3026 @ravipesala Sorry, I didn't notice it. Now I understand the purpose. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2224/ --- |
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3026#discussion_r244137998

--- Diff: integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonDictionaryReflectionUtil.java ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql;
+
+import java.lang.reflect.Array;
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.scan.result.vector.CarbonDictionary;
+
+import org.apache.spark.sql.execution.vectorized.ColumnVector;
+
+/**
+ * This class uses the java reflection to create parquet dictionary class as CDH distribution uses
+ * twitter parquet instead of apache parquet.
+ */
+@InterfaceAudience.Internal
+public class CarbonDictionaryReflectionUtil {
+
+  private static final boolean isApacheParquet;
+
+  static {
+    boolean isApache = true;
+    try {
+      createClass("org.apache.parquet.column.Encoding");
+    } catch (Exception e) {
+      isApache = false;
+    }
+    isApacheParquet = isApache;
+  }
+
+  public static Object generateDictionary(CarbonDictionary dictionary) {
+    Class binary = createClass(getQualifiedName("parquet.io.api.Binary"));
+    Object binaries = Array.newInstance(binary, dictionary.getDictionarySize());
+    try {
+      for (int i = 0; i < dictionary.getDictionarySize(); i++) {
+        Object binaryValue = invokeStaticMethod(binary, "fromReusedByteArray",
+            new Object[] { dictionary.getDictionaryValue(i) }, new Class[] { byte[].class });
+        Array.set(binaries, i, binaryValue);
+      }
+      ;
+      Class bytesInputClass = createClass(getQualifiedName("parquet.bytes.BytesInput"));
+      Object bytesInput = invokeStaticMethod(bytesInputClass, "from", new Object[] { new byte[0] },
+          new Class[] { byte[].class });
+
+      Class dictPageClass = createClass(getQualifiedName("parquet.column.page.DictionaryPage"));
+      Class encodingClass = createClass(getQualifiedName("parquet.column.Encoding"));
+      Object plainEncoding = invokeStaticMethod(encodingClass, "valueOf", new Object[] { "PLAIN" },
+          new Class[] { String.class });
+
+      Object dictPageObj =
+          dictPageClass.getDeclaredConstructor(bytesInputClass, int.class, encodingClass)
+              .newInstance(bytesInput, 0, plainEncoding);
+      Class plainDict = createClass(getQualifiedName(
+          "parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary"));
+      Object plainDictionary =
+          plainDict.getDeclaredConstructor(dictPageClass).newInstance(dictPageObj);
+      Field field = plainDict.getDeclaredField("binaryDictionaryContent");
+      field.setAccessible(true);
+      field.set(plainDictionary, binaries);
+      return plainDictionary;
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private static Object invokeStaticMethod(Class className, String methodName, Object[] values,
+      Class[] classes) throws Exception {
+    Method method = className.getMethod(methodName, classes);
+    return method.invoke(null, values);
+  }
+
+  private static Class createClass(String className) {
+    try {
+      return Class.forName(className);
--- End diff --

```suggestion
return Class.forName(className, false, CarbonDictionaryReflectionUtil.class.getClassLoader());
```
I think this form is better, because it loads the class without initializing it.
--- |
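To illustrate the difference the suggestion relies on, here is a small self-contained sketch. The class `LazyLoadDemo`, its nested `Probe` class, and the `loadWithoutInit` helper are hypothetical names made up for this example; only `Class.forName(String)` and `Class.forName(String, boolean, ClassLoader)` are standard JDK calls. Passing `false` as the second argument loads the class but defers its static initializer, whereas the one-argument form initializes it.

```java
public class LazyLoadDemo {

  // Set by Probe's static initializer so we can observe when initialization happens.
  static boolean probeInitialized = false;

  // Hypothetical class whose only job is to record that it was initialized.
  static class Probe {
    static {
      probeInitialized = true;
    }
  }

  /** Loads the class without running its static initializer. */
  static Class<?> loadWithoutInit(String className) throws ClassNotFoundException {
    return Class.forName(className, false, LazyLoadDemo.class.getClassLoader());
  }

  public static void main(String[] args) throws Exception {
    // Build the binary name of the nested class from a string to avoid touching Probe directly.
    String name = LazyLoadDemo.class.getName() + "$Probe";

    loadWithoutInit(name);
    System.out.println("after forName(name, false, loader): " + probeInitialized); // false

    Class.forName(name);
    System.out.println("after forName(name): " + probeInitialized); // true
  }
}
```

For a presence check like the one in the static block of the diff, the non-initializing form is enough, since only the existence of the class matters there.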
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2050/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10302/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2139/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10393/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2345/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2365/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2151/ --- |