GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2692

[CARBONDATA-2879] Support sort scope for SDK

**Problem:** The SDK does not support batch sort or no sort, because there is no API support for sort scope.

**Solution:** Provide `sort_scope` in the existing load options, so the user can decide which sort scope to use. **Currently `batch_sort`, `local_sort`, and `no_sort` are supported.** Global sort is not applicable in the SDK scenario and is therefore not supported. The method header and documentation are also updated.

Be sure to do all of the following checklists to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? Yes, updated.
- [ ] Testing done. Added UT.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata sdk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2692.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2692

----

commit 16d3e68e4b315ea0b36bbb39b332c1a83c1027c1
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-09-04T09:25:47Z

    sort scope support for sdk

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2692

I think you can go through all the load options and explicitly decide whether or not to support each of them. Moreover, these options should also be considered for dataframe write; better to keep them consistent.

---
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2692

@xuchuanyin: I have already gone through all the load options and decided to support 9 of them. Sort scope was previously a create-table property, which the SDK did not support; now it does. For the SDK, create-table properties and load options can be combined, because each write() is considered one load (one writer, one load).

---
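The "one writer, one load" point above can be sketched with a minimal, self-contained example; the class and method names here are illustrative, not the actual CarbonData SDK API:

```java
import java.util.HashMap;
import java.util.Map;

public class SdkOptionMerge {
    // Because each SDK write() corresponds to exactly one load, table-level
    // properties and per-load options can be flattened into a single map.
    // A per-load option wins over a table property on conflict.
    static Map<String, String> mergeForSingleLoad(
            Map<String, String> tableProperties, Map<String, String> loadOptions) {
        Map<String, String> merged = new HashMap<>(tableProperties);
        merged.putAll(loadOptions);
        return merged;
    }
}
```

This is only a sketch of the idea; the real builder validates each option key before accepting it.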
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214892918

--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -59,7 +59,7 @@
     optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
     optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
     optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
-
+    optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
--- End diff --

Do not use string literals here; we already have constants for them, such as `CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT`.

---
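The review point above can be illustrated in isolation; `LOAD_SORT_SCOPE_DEFAULT` below is a local stand-in for the real `CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT`, and the value shown just mirrors the literal used in the diff:

```java
import java.util.Map;

public class SortScopeDefault {
    // Local stand-in; in CarbonData the value would come from
    // CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT rather than a string literal.
    static final String LOAD_SORT_SCOPE_DEFAULT = "local_sort";

    // Resolve the effective sort scope: a user-supplied option wins,
    // otherwise the named default applies.
    static String resolveSortScope(Map<String, String> options) {
        return options.getOrDefault("sort_scope", LOAD_SORT_SCOPE_DEFAULT);
    }
}
```

Using the named constant keeps the default defined in exactly one place, so a later change cannot leave a stale literal behind.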
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214892173

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
           !option.equalsIgnoreCase("complex_delimiter_level_1") &&
           !option.equalsIgnoreCase("complex_delimiter_level_2") &&
           !option.equalsIgnoreCase("quotechar") &&
-          !option.equalsIgnoreCase("escapechar")) {
-        throw new IllegalArgumentException("Unsupported options. "
-            + "Refer method header or documentation");
+          !option.equalsIgnoreCase("escapechar") &&
+          !option.equalsIgnoreCase("sort_scope")) {
+        throw new IllegalArgumentException("Unsupported option:" + option
+            + ". Refer method header or documentation");
       }
     }
+    // validate sort scope
+    String sortScope = options.get("sort_scope");
+    if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
--- End diff --

We already have `CarbonUtil.isValidSortOption`; you can use that to do the validation. Besides, you can additionally validate `global_sort` and give a special error message.

---
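The two suggestions above (reuse the existing validator, special-case `global_sort`) could look roughly like the following self-contained sketch. `CarbonUtil.isValidSortOption` is the real helper the reviewer refers to; the method below is only a local stand-in for it, restricted to the scopes the SDK supports, and all other names are illustrative:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SortScopeValidation {
    // Stand-in for CarbonUtil.isValidSortOption, limited to the sort scopes
    // the SDK writer supports; the real code would call that helper instead.
    static boolean isValidSortOption(String sortScope) {
        Set<String> valid = new HashSet<>(
            Arrays.asList("local_sort", "batch_sort", "no_sort"));
        return sortScope != null && valid.contains(sortScope.toLowerCase(Locale.ROOT));
    }

    static void validateSortScope(String sortScope) {
        if (sortScope == null) {
            return; // option not supplied: the default (local_sort) applies
        }
        if ("global_sort".equalsIgnoreCase(sortScope)) {
            // Dedicated message for the one scope the SDK deliberately rejects.
            throw new IllegalArgumentException(
                "global_sort is not supported in the SDK scenario");
        }
        if (!isValidSortOption(sortScope)) {
            throw new IllegalArgumentException("Invalid sort_scope: " + sortScope);
        }
    }
}
```

Reusing a shared validator avoids the long hand-written boolean chain shown in the diff and keeps the accepted values in one place.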
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214890206

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from sdkOutputTable"),
       Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
   }
+  test("test Sort Scope with SDK") {
--- End diff --

This file already contains too much code (about 2.6k lines). I think you can refactor it later.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8300/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/230/ ---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214945032

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from sdkOutputTable"),
       Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
   }
+  test("test Sort Scope with SDK") {
--- End diff --

Yes, this will be handled later in a separate PR. We can move the Avro test cases to another file and keep only the non-Avro ones in this file.

---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214951956

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
           !option.equalsIgnoreCase("complex_delimiter_level_1") &&
           !option.equalsIgnoreCase("complex_delimiter_level_2") &&
           !option.equalsIgnoreCase("quotechar") &&
-          !option.equalsIgnoreCase("escapechar")) {
-        throw new IllegalArgumentException("Unsupported options. "
-            + "Refer method header or documentation");
+          !option.equalsIgnoreCase("escapechar") &&
+          !option.equalsIgnoreCase("sort_scope")) {
+        throw new IllegalArgumentException("Unsupported option:" + option
+            + ". Refer method header or documentation");
       }
     }
+    // validate sort scope
+    String sortScope = options.get("sort_scope");
+    if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
--- End diff --

OK. I am using that now.

---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2692#discussion_r214952432

--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -59,7 +59,7 @@
     optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
     optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
     optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
-
+    optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
--- End diff --

OK.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8304/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/234/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8308/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/238/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/249/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8319/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8325/ ---

Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2692 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/255/ ---
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2692 retest this please ---