[GitHub] [carbondata] jackylk commented on a change in pull request #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…

GitBox
jackylk commented on a change in pull request #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…
URL: https://github.com/apache/carbondata/pull/3507#discussion_r356967946
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##########
 @@ -121,9 +121,11 @@ object DataLoadProcessBuilderOnSpark {
         CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
     }
 
+    val sortColumnIndex = configuration.getSortColumnRangeInfo.getSortColumnIndex
+
     import scala.reflect.classTag
     val sortRDD = convertRDD
-      .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
+      .sortBy(_.getKey(sortColumnIndex), numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
 
 Review comment:
   Add a comment at line 124 to describe this optimization. It is to reduce the data read for sorting, right?
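
   To make the question concrete, here is a minimal sketch of the idea as I read it: the sort key is built from the sort columns only, so the comparator never touches the rest of the row. It uses plain Scala collections rather than Spark, and `Row`, `key`, `KeyOrdering` and `sortRows` are hypothetical stand-ins for illustration, not the CarbonData classes or the `getKey` / `RowOrdering` used in the diff.

```scala
// Hypothetical stand-in for a converted row; NOT the CarbonData row class.
final case class Row(data: Array[AnyRef]) {
  // Project only the columns at the given indices (illustrative analogue of getKey).
  def key(sortColumnIndex: Array[Int]): Array[AnyRef] = sortColumnIndex.map(data(_))
}

// Hypothetical ordering: compares key arrays field by field.
// Assumption: every key field is non-null and implements Comparable.
object KeyOrdering extends Ordering[Array[AnyRef]] {
  override def compare(a: Array[AnyRef], b: Array[AnyRef]): Int = {
    var i = 0
    while (i < math.min(a.length, b.length)) {
      val c = a(i).asInstanceOf[Comparable[AnyRef]].compareTo(b(i))
      if (c != 0) return c
      i += 1
    }
    Integer.compare(a.length, b.length)
  }
}

object SortByKeyDemo {
  // Sort rows by the projected key only; non-sort columns are carried along
  // with the row but are never read by the comparator.
  def sortRows(rows: Seq[Row], sortColumnIndex: Array[Int]): Seq[Row] =
    rows.sortBy(_.key(sortColumnIndex))(KeyOrdering)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(Array[AnyRef]("b", Integer.valueOf(2), "payload-1")),
      Row(Array[AnyRef]("a", Integer.valueOf(9), "payload-2")),
      Row(Array[AnyRef]("a", Integer.valueOf(1), "payload-3"))
    )
    // Columns 0 and 1 act as the sort columns; column 2 is ignored when comparing.
    sortRows(rows, Array(0, 1)).foreach(r => println(r.data.mkString(",")))
  }
}
```

   If that matches the intent of the change, a one-line comment above `sortColumnIndex` saying that only the sort columns are compared would be enough.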

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services