GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2806

[CARBONDATA-2998] Refresh column schema for old store (before V3) for the SORT_COLUMNS option

**Problem:** In an old store (created before V3), the SORT_COLUMNS option is not set in ColumnSchema, but all dimension columns are nevertheless treated as sort columns. While refreshing the table, the schema is read back from thrift and, because the flag was never set, the columns end up marked as non-sort columns in ColumnSchema.

**Solution:** While refreshing the table, check for the SORT_COLUMNS property in the table properties; if nothing is set, take all the dimension columns as SORT_COLUMNS by default.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2998

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2806.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2806

----
commit 69609e3ed42e8e3a675d36479ffd7506f0d832ab
Author: dhatchayani <dhatcha.official@...>
Date: 2018-10-09T12:04:42Z

    [CARBONDATA-2998] Refresh column schema for old store(before V3) for SORT_COLUMNS option
----

---
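A minimal sketch of the refresh step described in the solution above, assuming CarbonData's core metadata classes; the import paths for TableInfo and CarbonCommonConstants are assumptions here, and the actual change is in the diff reviewed later in this thread:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.metadata.schema.table.TableInfo

// If SORT_COLUMNS is absent from the table properties (typical for a pre-V3 store),
// default every dimension column to a sort column before the refreshed schema is written.
def refreshSortColumnFlags(tableInfo: TableInfo): Unit = {
  val tableProps: mutable.Map[String, String] =
    tableInfo.getFactTable.getTableProperties.asScala
  if (!tableProps.contains(CarbonCommonConstants.SORT_COLUMNS)) {
    tableInfo.getFactTable.getListOfColumns.asScala
      .filter(_.isDimensionColumn)
      .foreach(_.setSortColumn(true))
  }
}
```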
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/751/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9017/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/949/ ---
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2806#discussion_r224029719

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala ---
@@ -118,6 +122,26 @@ case class RefreshCarbonTableCommand(
     Seq.empty
   }

+  /**
+   * Refresh the sort_column flag in the column schema in case of an old store. Before V3, the
+   * sort_column option is not set, but by default all dimension columns should be treated
+   * as sort columns if the SORT_COLUMNS property is not defined in tblproperties.
+   *
+   * @param tableInfo
+   */
+  def refreshColumnSchema(tableInfo: TableInfo): Unit = {
+    val tableProps: mutable.Map[String, String] = tableInfo.getFactTable.getTableProperties.asScala
+    val sortColumns = tableProps.get(CarbonCommonConstants.SORT_COLUMNS)
+    sortColumns match {
+      case Some(sortColumn) =>
+        // don't do anything
+      case None =>
+        // iterate over all the columns and mark all the dimensions as sort columns
+        tableInfo.getFactTable.getListOfColumns.asScala
+          .filter(column => column.isDimensionColumn).foreach(_.setSortColumn(true))
--- End diff --

No need to loop over listOfColumns twice; collect can be used instead of filter and foreach.

---
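The single-pass rewrite kunal642 suggests could look roughly like the following (the author's updated diff further down the thread uses collect in the same way); a plain foreach with an if-guard would equally avoid the second traversal without building a throw-away result collection:

```scala
// Inside refreshColumnSchema(tableInfo: TableInfo), replacing the filter/foreach above.
// One traversal of listOfColumns: the partial function both selects dimension
// columns and performs the side effect of flagging them as sort columns.
tableInfo.getFactTable.getListOfColumns.asScala.collect {
  case columnSchema if columnSchema.isDimensionColumn => columnSchema.setSortColumn(true)
}
```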
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/772/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/774/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/971/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9039/ ---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2806#discussion_r227251258

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala ---
@@ -118,6 +122,26 @@ case class RefreshCarbonTableCommand(
     Seq.empty
   }

+  /**
+   * Refresh the sort_column flag in the column schema in case of an old store. Before V3, the
+   * sort_column option is not set, but by default all dimension columns should be treated
+   * as sort columns if the SORT_COLUMNS property is not defined in tblproperties.
+   *
+   * @param tableInfo
+   */
+  def refreshColumnSchema(tableInfo: TableInfo): Unit = {
+    val tableProps: mutable.Map[String, String] = tableInfo.getFactTable.getTableProperties.asScala
+    val sortColumns = tableProps.get(CarbonCommonConstants.SORT_COLUMNS)
+    sortColumns match {
+      case Some(sortColumn) =>
+        // don't do anything
+      case None =>
+        // iterate over all the columns and mark all the dimensions as sort columns
+        tableInfo.getFactTable.getListOfColumns.asScala collect
+          ({case columnSchema if columnSchema.isDimensionColumn => columnSchema.setSortColumn(true)})
--- End diff --

Validate the scenario of a complex type as a sort column.

---
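One way to guard the complex-type scenario raised here, sketched under the assumption that complex (ARRAY/STRUCT) dimensions must not be flagged as sort columns; the DataTypes.isArrayType / isStructType checks are assumed helpers from carbondata-core and should be swapped for whatever complex-type check the codebase actually uses:

```scala
import scala.collection.JavaConverters._

import org.apache.carbondata.core.metadata.datatype.DataTypes
import org.apache.carbondata.core.metadata.schema.table.TableInfo
import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema

// Assumed check: a dimension is "complex" when its data type is ARRAY or STRUCT.
def isComplexDimension(columnSchema: ColumnSchema): Boolean = {
  val dataType = columnSchema.getDataType
  DataTypes.isArrayType(dataType) || DataTypes.isStructType(dataType)
}

// Default only simple (non-complex) dimensions to sort columns when refreshing an old store.
def markDefaultSortColumns(tableInfo: TableInfo): Unit = {
  tableInfo.getFactTable.getListOfColumns.asScala.foreach { columnSchema =>
    if (columnSchema.isDimensionColumn && !isComplexDimension(columnSchema)) {
      columnSchema.setSortColumn(true)
    }
  }
}
```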
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2806 This PR holds good as long as the default load behavior is to consider all dimension columns as sort columns when the SORT_COLUMNS property is not specified in the schema. If that default is ever changed so that no column is a sort column, this logic will need to be rechecked and possibly modified. As of now it looks safe to merge. ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2806 LGTM. Can be merged once the build has passed. ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/965/ ---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2806 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1174/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/973/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9234/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1186/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9240/ ---