[GitHub] [carbondata] jackylk commented on a change in pull request #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…

GitBox

jackylk commented on a change in pull request #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…
URL: https://github.com/apache/carbondata/pull/3507#discussion_r356967946

##########
File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
##########
@@ -121,9 +121,11 @@ object DataLoadProcessBuilderOnSpark {
CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
}

+ val sortColumnIndex = configuration.getSortColumnRangeInfo.getSortColumnIndex
+
import scala.reflect.classTag
val sortRDD = convertRDD
- .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
+ .sortBy(_.getKey(sortColumnIndex), numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])

Review comment:
Add a comment at line 124 to describe this optimization. It is to reduce the data read for sorting, right?
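
For readers following the question, here is a minimal sketch of the idea in plain Scala collections, without Spark. It assumes that `getKey(sortColumnIndex)` returns only the values of the sort columns, which the diff suggests but does not confirm. Every name below (`SortKeySketch`, `Row`, `keyOf`, `KeyOrdering`, `sortRows`) is hypothetical and is not part of CarbonData.

```scala
// Hypothetical illustration only; none of these names exist in CarbonData.
object SortKeySketch {
  // A row is a flat array of column values, like the Array[AnyRef] in the diff.
  type Row = Array[AnyRef]

  // Project only the sort columns out of a row. This is the role that
  // getKey(sortColumnIndex) is assumed to play in the diff.
  def keyOf(row: Row, sortColumnIndex: Array[Int]): Row =
    sortColumnIndex.map(i => row(i))

  // Compare two keys column by column.
  // Assumes every key value implements Comparable and keys have equal length.
  object KeyOrdering extends Ordering[Row] {
    def compare(a: Row, b: Row): Int = {
      var i = 0
      while (i < a.length) {
        val c = a(i).asInstanceOf[Comparable[AnyRef]].compareTo(b(i))
        if (c != 0) return c
        i += 1
      }
      0
    }
  }

  // Sort whole rows, but let the comparison see only the projected key,
  // so non-sort columns never take part in the ordering.
  def sortRows(rows: Seq[Row], sortColumnIndex: Array[Int]): Seq[Row] =
    rows.sortBy(keyOf(_, sortColumnIndex))(KeyOrdering)
}
```

With rows of the form `(name, id, payload)` and `sortColumnIndex = Array(1)`, only `id` is compared. In the Spark version, the same projection would presumably also shrink the key that `sortBy` samples and compares, which is the kind of saving the review comment asks about.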

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services