GitHub user chenerlu opened a pull request:
https://github.com/apache/carbondata/pull/1321

[CARBONDATA-1438] Unify the sort column and sort scope in create table command

Background: to improve ease of use, unify the sort column and sort scope in the create table command.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenerlu/incubator-carbondata pr-1438

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1321.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1321

----
commit c20caf973a70e6c6dc94d3cb26bf22404646e8eb
Author: chenerlu <[hidden email]>
Date: 2017-09-04T12:54:55Z

    Unify the sort column and sort scope in create table command
----
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136828909

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala ---
@@ -109,23 +118,22 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo
   }

   // ----------------------------------- Configuration Validity -----------------------------------
-  test("Don't support GLOBAL_SORT on partitioned table") {
+  ignore("Don't support GLOBAL_SORT on partitioned table") {
--- End diff --

Why is this testcase ignored?
---
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136829721

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
--- End diff --

This is needed only if using batch sort.
---
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136829772

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+          CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT))))
+
+    optionsFinal.put("global_sort_partitions", tableProperties.getOrDefault("global_sort_partitions",
--- End diff --

This is needed only if using global sort.
---
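The diff being discussed resolves each load option through a fallback chain: a table property set at CREATE TABLE time wins, then the session-level carbon property, then the system-level property, and finally a hard-coded default. A minimal sketch of that precedence, using plain maps as stand-ins for the table properties and `CarbonProperties` (all names here are illustrative, not the project's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the fallback chain in the diff above:
// table property > session carbon property > system carbon property > default.
public class OptionResolver {
    static final String LOAD_SORT_SCOPE_DEFAULT = "LOCAL_SORT";

    // Stand-ins for the table properties map and CarbonProperties lookups.
    private final Map<String, String> tableProperties;
    private final Map<String, String> carbonProperties;

    public OptionResolver(Map<String, String> tableProps, Map<String, String> carbonProps) {
        this.tableProperties = tableProps;
        this.carbonProperties = carbonProps;
    }

    // Mirrors tableProperties.getOrDefault(key,
    //   carbonProperty.getProperty(sessionKey,
    //     carbonProperty.getProperty(systemKey, default))) from the diff.
    public String resolveSortScope(String sessionKey, String systemKey) {
        String fromSystem = carbonProperties.getOrDefault(systemKey, LOAD_SORT_SCOPE_DEFAULT);
        String fromSession = carbonProperties.getOrDefault(sessionKey, fromSystem);
        return tableProperties.getOrDefault("sort_scope", fromSession);
    }
}
```

The nesting of `getOrDefault`/`getProperty` calls in the Scala diff encodes exactly this precedence order: the innermost lookup is consulted last.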
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136829860

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -556,21 +556,21 @@ case class LoadTable(
       carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT,
         CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT)))
-    optionsFinal.put("global_sort_partitions", options.getOrElse("global_sort_partitions",
-      carbonProperty
-        .getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS, null)))
+//    optionsFinal.put("global_sort_partitions", options.getOrElse("global_sort_partitions",
--- End diff --

Delete unused code instead of commenting it out.
---
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136830112

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala ---
@@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo
        | charField CHAR(5),
        | floatField FLOAT
        | )
-       | STORED BY 'org.apache.carbondata.format'
+       | STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')
      """.stripMargin)
    sql(
      s"""
        | LOAD DATA LOCAL INPATH '$path' INTO TABLE carbon_globalsort_difftypes
-       | OPTIONS('SORT_SCOPE'='GLOBAL_SORT',
+       | OPTIONS(
        | 'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField')
      """.stripMargin)
--- End diff --

Add a new testcase that uses SORT_SCOPE in the LOAD DATA statement; it should fail.
---
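The requested testcase boils down to rejecting a load option that has become table-level only. A hypothetical sketch of that validation (the class, set contents, and error message are illustrative, not CarbonData's actual code):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the requested check: once SORT_SCOPE is a table-level
// property, supplying it as a LOAD DATA option should be rejected outright.
public class LoadOptionValidator {
    // Options no longer accepted at load time under this proposal.
    private static final Set<String> TABLE_LEVEL_ONLY = Set.of("sort_scope");

    // Throws if the user supplies a table-level-only option in LOAD DATA.
    public static void validate(Map<String, String> loadOptions) {
        for (String key : loadOptions.keySet()) {
            if (TABLE_LEVEL_ONLY.contains(key.toLowerCase())) {
                throw new IllegalArgumentException(
                    "Option '" + key + "' is not supported in LOAD DATA; "
                    + "set it in TBLPROPERTIES at CREATE TABLE time");
            }
        }
    }
}
```

A corresponding test would assert that a LOAD with `'SORT_SCOPE'='GLOBAL_SORT'` in OPTIONS raises this error, while a LOAD without it succeeds.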
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/517/
---
GitHub user gvramana commented on the issue:

https://github.com/apache/carbondata/pull/1321

Unification for simplicity is OK. For example, the SORT_SCOPE provided in CREATE TABLE becomes the default SORT_SCOPE for that table. However, we have scenarios where SORT_SCOPE needs to be overridden in the load command for data-load speed or streaming cases, so that compaction later brings the scope back to the table-level default value.

Option 1: Remove the load-time SORT_SCOPE now, and bring it back later once it is fully supported.

Option 2: Keep the syntax in data load as an optional parameter, and for now validate it to allow only valid values. For example, a load-level BATCH_SORT is valid when the table level is BATCH_SORT or LOCAL_SORT, since current compaction can support this. In all other cases the load-level SORT_SCOPE should be the same as the table-level SORT_SCOPE.
---
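The validation rule in Option 2 can be sketched as a small compatibility check (hypothetical helper name, not part of the codebase): a load-level BATCH_SORT override is accepted when the table-level scope is BATCH_SORT or LOCAL_SORT, because compaction can restore the table default; any other load-level scope must match the table-level scope exactly.

```java
// Sketch of the compatibility rule described in Option 2 above
// (hypothetical class and method names, for illustration only).
public class SortScopeCompatibility {
    public static boolean isValidOverride(String tableScope, String loadScope) {
        // Same scope at both levels is always fine.
        if (loadScope.equalsIgnoreCase(tableScope)) {
            return true;
        }
        // BATCH_SORT override is allowed only when compaction can bring the
        // data back to the table default (BATCH_SORT or LOCAL_SORT tables).
        return loadScope.equalsIgnoreCase("BATCH_SORT")
            && (tableScope.equalsIgnoreCase("BATCH_SORT")
                || tableScope.equalsIgnoreCase("LOCAL_SORT"));
    }
}
```

Under this rule, for example, a GLOBAL_SORT table rejects a BATCH_SORT load, while a LOCAL_SORT table accepts one.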
GitHub user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1321

@gvramana I suggest we consider these two requirements separately:

1. Streaming ingestion into a carbon table.
2. Overriding the sort scope after the table is created.

For 1, I think we should probably have another table property in CREATE TABLE to indicate that the table is streaming-ingestable, and we should have a separate API instead of SQL to do the streaming ingest. In any case, I think it is better to design this after carbon 1.2, in the streaming-ingest branch.

For 2, I feel we should not allow this, at least for carbon 1.2, because it opens up a lot of complexity: not just loading but also compaction would need a lot of validation and configuration to work at different levels. We should allow this only once every combination scenario has been tested.

In one word, we should remove the SORT_SCOPE option from the LOAD DATA statement for carbondata 1.2. If it is required in the future, we can always add it back later.
---
GitHub user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137441815

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+          CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT))))
+
+    optionsFinal.put("global_sort_partitions", tableProperties.getOrDefault("global_sort_partitions",
--- End diff --

Same as batch sort size, I think.
---
GitHub user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137441759

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
--- End diff --

Yes, this is only needed for batch sort, but I think if users specify this parameter with global sort, it is better to ignore it.
---
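The "ignore it" behavior chenerlu describes can be sketched as a pruning step over the resolved options: scope-specific keys are kept only when the resolved sort scope actually uses them. The class name and the pruning step itself are hypothetical illustrations of the suggestion, not code from the PR:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drop scope-specific options that the resolved
// sort scope does not use, instead of failing the load.
public class ScopeSpecificOptions {
    public static Map<String, String> prune(Map<String, String> optionsFinal) {
        Map<String, String> pruned = new HashMap<>(optionsFinal);
        String scope = pruned.getOrDefault("sort_scope", "LOCAL_SORT");
        if (!scope.equalsIgnoreCase("BATCH_SORT")) {
            pruned.remove("batch_sort_size_inmb"); // only meaningful for batch sort
        }
        if (!scope.equalsIgnoreCase("GLOBAL_SORT")) {
            pruned.remove("global_sort_partitions"); // only meaningful for global sort
        }
        return pruned;
    }
}
```

This keeps a stray `batch_sort_size_inmb` on a GLOBAL_SORT table from having any effect, without turning it into a user-facing error.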
GitHub user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137442307

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala ---
@@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo
        | charField CHAR(5),
        | floatField FLOAT
        | )
-       | STORED BY 'org.apache.carbondata.format'
+       | STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')
      """.stripMargin)
    sql(
      s"""
        | LOAD DATA LOCAL INPATH '$path' INTO TABLE carbon_globalsort_difftypes
-       | OPTIONS('SORT_SCOPE'='GLOBAL_SORT',
+       | OPTIONS(
        | 'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField')
      """.stripMargin)
--- End diff --

OK.
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/664/
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/675/
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/676/
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/678/
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/682/
---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1321

SDV Build Fail, please check CI: http://144.76.159.231:8080/job/ApacheSDVTests/689/
---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1321

Build Success with Spark 2.1.0, please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/12/
---
GitHub user chenerlu commented on the issue:

https://github.com/apache/carbondata/pull/1321

Please retest it.
---