[GitHub] [carbondata] KanakaKumar commented on a change in pull request #3262: [CARBONDATA-3415] Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.


GitBox
KanakaKumar commented on a change in pull request #3262: [CARBONDATA-3415] Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.
URL: https://github.com/apache/carbondata/pull/3262#discussion_r291467196
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
 ##########
 @@ -145,7 +169,7 @@ class CarbonMergeFilesRDD(
       if (isHivePartitionedTable) {
         CarbonLoaderUtil
           .mergeIndexFilesInPartitionedSegment(carbonTable, split.segmentId,
-            segmentFileNameToSegmentIdMap.get(split.segmentId))
+            segmentFileNameToSegmentIdMap.get(split.segmentId), split.partitionPath)
 
 Review comment:
   Each partition-based task may write, and then overwrite, the same segment file, recording only its own partition path. Instead, could we collect the merge index file from each partition at the driver and then write the segment file once?
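
   A rough sketch of the suggestion above, in plain Scala so it runs without Spark. Every name here (`PartitionMergeResult`, `mergeIndexForPartition`, `buildSegmentFile`, the `.mergeindex` suffix) is hypothetical and is not taken from the CarbonData code base. In the real job, the per-partition step would run inside the RDD tasks and the results would be gathered at the driver, for example with `collect()`.

```scala
// Hypothetical result returned by one partition task: which partition it
// handled and which merge index file it produced.
final case class PartitionMergeResult(partitionPath: String, mergeIndexFile: String)

object SegmentFileSketch {

  // Stand-in for the executor-side work. It merges the index files of a single
  // partition and returns the result instead of writing the segment file.
  // The file name format is an assumption made up for this example.
  def mergeIndexForPartition(segmentId: String, partitionPath: String): PartitionMergeResult =
    PartitionMergeResult(partitionPath, s"${segmentId}_${partitionPath}.mergeindex")

  // Stand-in for the driver-side work. It combines the results of all tasks
  // into one partitionPath -> mergeIndexFile mapping, so the segment file can
  // be written once with every partition present.
  def buildSegmentFile(results: Seq[PartitionMergeResult]): Map[String, String] =
    results.map(r => r.partitionPath -> r.mergeIndexFile).toMap
}
```

   With this shape no task touches the segment file, so one partition cannot overwrite the entry written by another; the driver performs a single write after all tasks finish.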

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services