GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2806

[CARBONDATA-2998] Refresh column schema for old store (before V3) for the SORT_COLUMNS option

**Problem:** In an old store (created before V3), the SORT_COLUMNS option is not set in ColumnSchema, but all dimension columns are nevertheless treated as sort columns. While refreshing the table, the schema is read back from thrift and, because the flag was never set, the columns end up marked as non-sort columns in ColumnSchema.

**Solution:** While refreshing the table, check for the SORT_COLUMNS property in the table properties; if nothing is set, take all the dimension columns as SORT_COLUMNS by default.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2998

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2806.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2806

----
commit 69609e3ed42e8e3a675d36479ffd7506f0d832ab
Author: dhatchayani <dhatcha.official@...>
Date: 2018-10-09T12:04:42Z

    [CARBONDATA-2998] Refresh column schema for old store(before V3) for SORT_COLUMNS option
----

---
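A minimal sketch of the refresh step described in the solution above, assuming CarbonData's core metadata classes; the import paths for TableInfo and CarbonCommonConstants are assumptions here, and the actual change is in the diff reviewed later in this thread:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.metadata.schema.table.TableInfo

// If SORT_COLUMNS is absent from the table properties (typical for a pre-V3 store),
// default every dimension column to a sort column before the refreshed schema is written.
def refreshSortColumnFlags(tableInfo: TableInfo): Unit = {
  val tableProps: mutable.Map[String, String] =
    tableInfo.getFactTable.getTableProperties.asScala
  if (!tableProps.contains(CarbonCommonConstants.SORT_COLUMNS)) {
    tableInfo.getFactTable.getListOfColumns.asScala
      .filter(_.isDimensionColumn)
      .foreach(_.setSortColumn(true))
  }
}
```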
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/751/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9017/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/949/ ---
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2806#discussion_r224029719

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala ---
@@ -118,6 +122,26 @@ case class RefreshCarbonTableCommand(
     Seq.empty
   }

+  /**
+   * Refresh the sort_column flag in the column schema in case of an old store. Before V3, the
+   * sort_column option is not set, but by default all dimension columns should be treated
+   * as sort columns if the SORT_COLUMNS property is not defined in tblproperties.
+   *
+   * @param tableInfo
+   */
+  def refreshColumnSchema(tableInfo: TableInfo): Unit = {
+    val tableProps: mutable.Map[String, String] = tableInfo.getFactTable.getTableProperties.asScala
+    val sortColumns = tableProps.get(CarbonCommonConstants.SORT_COLUMNS)
+    sortColumns match {
+      case Some(sortColumn) =>
+        // don't do anything
+      case None =>
+        // iterate over all the columns and mark all the dimensions as sort columns
+        tableInfo.getFactTable.getListOfColumns.asScala
+          .filter(column => column.isDimensionColumn).foreach(_.setSortColumn(true))
--- End diff --

No need to loop over listOfColumns twice; collect can be used instead of filter and foreach.

---
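The single-pass rewrite kunal642 suggests could look roughly like the following (the author's updated diff further down the thread uses collect in the same way); a plain foreach with an if-guard would equally avoid the second traversal without building a throw-away result collection:

```scala
// Inside refreshColumnSchema(tableInfo: TableInfo), replacing the filter/foreach above.
// One traversal of listOfColumns: the partial function both selects dimension
// columns and performs the side effect of flagging them as sort columns.
tableInfo.getFactTable.getListOfColumns.asScala.collect {
  case columnSchema if columnSchema.isDimensionColumn => columnSchema.setSortColumn(true)
}
```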
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/772/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/774/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/971/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9039/ ---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2806#discussion_r227251258

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala ---
@@ -118,6 +122,26 @@ case class RefreshCarbonTableCommand(
     Seq.empty
   }

+  /**
+   * Refresh the sort_column flag in the column schema in case of an old store. Before V3, the
+   * sort_column option is not set, but by default all dimension columns should be treated
+   * as sort columns if the SORT_COLUMNS property is not defined in tblproperties.
+   *
+   * @param tableInfo
+   */
+  def refreshColumnSchema(tableInfo: TableInfo): Unit = {
+    val tableProps: mutable.Map[String, String] = tableInfo.getFactTable.getTableProperties.asScala
+    val sortColumns = tableProps.get(CarbonCommonConstants.SORT_COLUMNS)
+    sortColumns match {
+      case Some(sortColumn) =>
+        // don't do anything
+      case None =>
+        // iterate over all the columns and mark all the dimensions as sort columns
+        tableInfo.getFactTable.getListOfColumns.asScala collect
+          ({case columnSchema if columnSchema.isDimensionColumn => columnSchema.setSortColumn(true)})
--- End diff --

Validate the scenario of a complex type as a sort column.

---
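One way to guard the complex-type scenario raised here, sketched under the assumption that complex (ARRAY/STRUCT) dimensions must not be flagged as sort columns; the DataTypes.isArrayType / isStructType checks are assumed helpers from carbondata-core and should be swapped for whatever complex-type check the codebase actually uses:

```scala
import scala.collection.JavaConverters._

import org.apache.carbondata.core.metadata.datatype.DataTypes
import org.apache.carbondata.core.metadata.schema.table.TableInfo
import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema

// Assumed check: a dimension is "complex" when its data type is ARRAY or STRUCT.
def isComplexDimension(columnSchema: ColumnSchema): Boolean = {
  val dataType = columnSchema.getDataType
  DataTypes.isArrayType(dataType) || DataTypes.isStructType(dataType)
}

// Default only simple (non-complex) dimensions to sort columns when refreshing an old store.
def markDefaultSortColumns(tableInfo: TableInfo): Unit = {
  tableInfo.getFactTable.getListOfColumns.asScala.foreach { columnSchema =>
    if (columnSchema.isDimensionColumn && !isComplexDimension(columnSchema)) {
      columnSchema.setSortColumn(true)
    }
  }
}
```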
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2806 This PR holds good as long as the default load behavior is to consider all dimension columns as sort columns when the SORT_COLUMNS property is not specified in the schema. If that default is ever changed so that no column is a sort column, this logic will need to be rechecked and possibly modified. As of now it looks safe to merge. ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2806 LGTM. Can be merged once the build has passed. ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/965/ ---
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2806 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1174/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/973/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9234/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1186/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9240/ ---