Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/686/ ---
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1632 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2248/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/694/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1923/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/704/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1933/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/721/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1949/ ---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1632#discussion_r156919460

--- Diff: core/src/main/java/org/apache/carbondata/core/util/NonDictionaryUtil.java ---
@@ -108,60 +105,21 @@ public static Object getMeasure(int index, Object[] row) {
     return measures[index];
   }

-  public static byte[] getByteArrayForNoDictionaryCols(Object[] row) {
-
-    return (byte[]) row[WriteStepRowUtil.NO_DICTIONARY_AND_COMPLEX];
+  /**
+   * Method to get the required non-dictionary & complex from 3-parted row
+   * @param index
+   * @param row
+   * @return
+   */
+  public static byte[] getNonDictOrComplex(int index, Object[] row) {
--- End diff --

Rename the method to getNoDictOrComplex

---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1632#discussion_r156954293

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -121,17 +121,18 @@ object DataLoadProcessBuilderOnSpark {
         CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
     }

+    val sortStepRowConverter: SortStepRowHandler = new SortStepRowHandler(sortParameters)
     import scala.reflect.classTag
+
+    // 3. sort
     val sortRDD = convertRDD
-      .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
-      .mapPartitionsWithIndex { case (index, rows) =>
-        DataLoadProcessorStepOnSpark.convertTo3Parts(rows, index, modelBroadcast,
-          sortStepRowCounter)
-      }
+      .map(r => DataLoadProcessorStepOnSpark.convertTo3Parts(r, TaskContext.getPartitionId(),
+        modelBroadcast, sortStepRowConverter, sortStepRowCounter))
+      .sortBy(r => r.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
--- End diff --

@xuchuanyin ... This PR is for compressing sort temp files, but this code modification is for data load using the global sort flow, which does not involve the creation of sort temp files. Can you please clarify?

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1632#discussion_r157109850

--- Diff: core/src/main/java/org/apache/carbondata/core/util/NonDictionaryUtil.java ---
@@ -108,60 +105,21 @@ public static Object getMeasure(int index, Object[] row) {
     return measures[index];
  }

-  public static byte[] getByteArrayForNoDictionaryCols(Object[] row) {
-
-    return (byte[]) row[WriteStepRowUtil.NO_DICTIONARY_AND_COMPLEX];
+  /**
+   * Method to get the required non-dictionary & complex from 3-parted row
+   * @param index
+   * @param row
+   * @return
+   */
+  public static byte[] getNonDictOrComplex(int index, Object[] row) {
--- End diff --

OK~

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1632#discussion_r157112148

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -121,17 +121,18 @@ object DataLoadProcessBuilderOnSpark {
         CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
     }

+    val sortStepRowConverter: SortStepRowHandler = new SortStepRowHandler(sortParameters)
     import scala.reflect.classTag
+
+    // 3. sort
     val sortRDD = convertRDD
-      .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
-      .mapPartitionsWithIndex { case (index, rows) =>
-        DataLoadProcessorStepOnSpark.convertTo3Parts(rows, index, modelBroadcast,
-          sortStepRowCounter)
-      }
+      .map(r => DataLoadProcessorStepOnSpark.convertTo3Parts(r, TaskContext.getPartitionId(),
+        modelBroadcast, sortStepRowConverter, sortStepRowCounter))
+      .sortBy(r => r.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
--- End diff --

This code change does not involve the sort temp files. I changed it because the interface and the internal load procedure have changed. After `convertRDD`, each row is still a raw row; in the sort phase, rows are converted to 3 parts; in the write phase, rows are encoded and written.

In the previous implementation, CarbonData sorted the raw rows and then converted each row to 3 parts in batches. In the current implementation, CarbonData first converts each row to 3 parts and then sorts the converted rows. While changing the raw-row-to-3-parts conversion, the interface (`DataLoadProcessorStepOnSpark.convertTo3Parts`) changed as well: previously it processed a batch of rows, now it processes one row at a time.

---
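The reordering described above can be sketched in plain Java. This is an illustration only, not CarbonData code: `Row` and `convert` are hypothetical stand-ins for a load row and the per-row 3-parts conversion. The point is that converting before sorting yields the same result as sorting before converting, as long as the conversion preserves the sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortConvertOrder {
    // Illustrative stand-in for a load row: a sort key plus a payload.
    static final class Row {
        final int key;
        final String payload;
        Row(int key, String payload) { this.key = key; this.payload = payload; }
    }

    // Stand-in for the per-row conversion step: transforms the payload
    // but leaves the sort key untouched.
    static Row convert(Row r) { return new Row(r.key, "3parts:" + r.payload); }

    public static void main(String[] args) {
        List<Row> raw = List.of(new Row(3, "c"), new Row(1, "a"), new Row(2, "b"));

        // Previous flow: sort the raw rows first, then convert each row.
        List<Row> sortThenConvert = new ArrayList<>(raw);
        sortThenConvert.sort(Comparator.comparingInt(r -> r.key));
        sortThenConvert.replaceAll(SortConvertOrder::convert);

        // Current flow: convert every row first, then sort the converted rows.
        List<Row> convertThenSort = new ArrayList<>(raw);
        convertThenSort.replaceAll(SortConvertOrder::convert);
        convertThenSort.sort(Comparator.comparingInt(r -> r.key));

        // Both flows produce the same sorted output because convert keeps the key.
        for (int i = 0; i < raw.size(); i++) {
            System.out.println(sortThenConvert.get(i).key + ":" + sortThenConvert.get(i).payload
                + " == " + convertThenSort.get(i).key + ":" + convertThenSort.get(i).payload);
        }
    }
}
```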
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632 @manishgupta88 review comments are resolved ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1984/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/754/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1632 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2304/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632 @manishgupta88 @jackylk Hi, what do you think about this PR? I raised a discussion about it and prefer another approach. Please refer to this: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Compression-for-sort-temp-files-in-Carbomdata-td31747.html or to this: https://issues.apache.org/jira/browse/CARBONDATA-1839 ---
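For context on the idea being discussed, compressing sort temp files essentially means writing the intermediate sorted records through a compressed stream and reading them back the same way. Here is a minimal, self-contained sketch using the JDK's built-in GZIP streams; this is a hypothetical illustration of the technique, not CarbonData's actual sort temp file writer (which is configurable and uses its own compressor abstraction).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SortTempCompressionSketch {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sorttemp", ".gz");
        tmp.deleteOnExit();

        // Write a few fake sorted records through a GZIP-compressed stream.
        try (DataOutputStream out = new DataOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(tmp))))) {
            for (int i = 0; i < 3; i++) {
                out.writeInt(i);          // record key
                out.writeUTF("row-" + i); // record payload
            }
        }

        // Read the records back through the matching decompressing stream.
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new BufferedInputStream(new FileInputStream(tmp))))) {
            for (int i = 0; i < 3; i++) {
                System.out.println(in.readInt() + ":" + in.readUTF());
            }
        }
    }
}
```

The trade-off under discussion in the linked thread is exactly the one this sketch exposes: smaller temp files on disk at the cost of extra CPU on every write and read during the merge.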