KanakaKumar commented on a change in pull request #3183: [CARBONDATA-3349] Show sort_columns for each segment
URL:
https://github.com/apache/carbondata/pull/3183#discussion_r281179465
##########
File path: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##########
@@ -137,14 +159,65 @@ object CarbonStore {
mergedTo,
load.getFileFormat.toString,
Strings.formatSize(dataSize.toFloat),
- Strings.formatSize(indexSize.toFloat))
+ Strings.formatSize(indexSize.toFloat),
+ isSorted,
+ sortColumns)
}
}.toSeq
} else {
Seq.empty
}
}
+ private def getSortColumnsOfSegment(
+ load: LoadMetadataDetails,
+ readCommitScope: ReadCommittedScope,
+ tableDataMap: TableDataMap,
+ hadoopConf: Configuration
+ ): (String, String) = {
+ // isSorted has 3 options: true, false, ""(for legacy store, before version 1.5.1)
+ var isSorted = ""
+ // when isSorted is true, need show sort_columns
+ var sortColumns = ""
+ if (load.getFileFormat == FileFormat.ROW_V1) {
+ isSorted = "false"
+ } else if (tableDataMap != null && load.getVisibility.equalsIgnoreCase("true")) {
+ val indexHeader = SegmentIndexFileStore
+ .getIndexHeaderOfSegment(load,
Review comment:
If a customer has a few thousand segments, reading the headers of all these segments will take a very long time. I think some customers use the show segments command and the set segments API for performance management, and those use cases may be impacted.
Some alternative options to consider:
1) Launch a job to read the headers and collect the data.
2) Enhance the segment status to hold the is-sorted flag and the sort column names.
3) Provide a parameter so that sort_columns is shown only when the user asks for it.
4) Could we use the CLI tool to get sort_column details from a segment instead of degrading "show segments"?
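To illustrate option 3, here is a minimal, self-contained sketch of the idea. It is not CarbonData code: `SegmentRow`, `ShowSegmentsSketch`, `readSortColumns`, and the `withSortColumns` flag are all hypothetical names, and the header read is replaced by a stand-in that only counts calls. The point is that the default path never touches the index headers.

```scala
// Hypothetical sketch: none of these names exist in CarbonData.
final case class SegmentRow(id: String, status: String, sortColumns: Option[String])

object ShowSegmentsSketch {
  // Counts how many (expensive) header reads were performed; for illustration only.
  var headerReads = 0

  // Stand-in for reading the index file header of one segment.
  def readSortColumns(segmentId: String): String = {
    headerReads += 1
    s"sort_cols_of_$segmentId"
  }

  // Reads headers only when the caller explicitly asks for sort columns,
  // so a plain "show segments" keeps its current cost.
  def showSegments(segmentIds: Seq[String], withSortColumns: Boolean): Seq[SegmentRow] =
    segmentIds.map { id =>
      val cols = if (withSortColumns) Some(readSortColumns(id)) else None
      SegmentRow(id, "Success", cols)
    }
}
```

With something like this, the existing callers would pass `withSortColumns = false` (or get it as the default) and see no extra I/O, while users who want the new columns opt in and accept the cost.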