Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

Classic

List

49 messages Options

Options

123

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2692

[CARBONDATA-2879] support sort scope for sdk

**problem:**: SDK doesn't support batch sort and no sort as we don't have any API support for sort scope

**Solution**: provide sort_scope in existing load_options, so user can decide which sort_scope.
**currently supported batch_sort, local_sort and no_sort.**
global sort is not applicable in sdk scenario. Hence not supported.
Updated the method header and doc also

Be sure to do all of the following checklists to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed? NA

- [ ] Any backward compatibility impacted? NA

- [ ] Document update required? yes, updated.

- [ ] Testing done.
Added UT

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2692

----
commit 16d3e68e4b315ea0b36bbb39b332c1a83c1027c1
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-09-04T09:25:47Z

sort scope support for sdk

----

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2692

I think you can go through all the load options and explicitly support or not support them.
Moreover, these options should also be considered for dataframe write.
Better to keep them consistent.

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2692

@xuchuanyin : Already gone through all the load options and decided to support 9 options. This sort scope was a create table property. This we were not supporting. Now we are supporting.
For SDK create table property and load options can be combined as all write() is considered for 1 load (1 writer, 1 load).

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214892918

--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -59,7 +59,7 @@
optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
-
+ optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
--- End diff --

Do not use literal explicitly, we already have variables for them such as `CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT`

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214892173

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
!option.equalsIgnoreCase("complex_delimiter_level_1") &&
!option.equalsIgnoreCase("complex_delimiter_level_2") &&
!option.equalsIgnoreCase("quotechar") &&
- !option.equalsIgnoreCase("escapechar")) {
- throw new IllegalArgumentException("Unsupported options. "
- + "Refer method header or documentation");
+ !option.equalsIgnoreCase("escapechar") &&
+ !option.equalsIgnoreCase("sort_scope")) {
+ throw new IllegalArgumentException("Unsupported option:" + option
+ + ". Refer method header or documentation");
}
}

+ // validate sort scope
+ String sortScope = options.get("sort_scope");
+ if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
--- End diff --

We already have `CarbonUtil.isValidSortOption`, you can use that to do validation.
Besides you can validate the global_sort in addition and give special error message.

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214890206

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
}

+ test("test Sort Scope with SDK") {
--- End diff --

This file contains too much code... 2.6k lines of code!
I think you can refactor it later.

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8300/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/230/

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214945032

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
}

+ test("test Sort Scope with SDK") {
--- End diff --

yes, later will be handled in separate PR.

we can move AVRO test cases to another file and keep only non-avro in this

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214951956

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
!option.equalsIgnoreCase("complex_delimiter_level_1") &&
!option.equalsIgnoreCase("complex_delimiter_level_2") &&
!option.equalsIgnoreCase("quotechar") &&
- !option.equalsIgnoreCase("escapechar")) {
- throw new IllegalArgumentException("Unsupported options. "
- + "Refer method header or documentation");
+ !option.equalsIgnoreCase("escapechar") &&
+ !option.equalsIgnoreCase("sort_scope")) {
+ throw new IllegalArgumentException("Unsupported option:" + option
+ + ". Refer method header or documentation");
}
}

+ // validate sort scope
+ String sortScope = options.get("sort_scope");
+ if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
--- End diff --

OK. I use this now.

---

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2692#discussion_r214952432

--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -59,7 +59,7 @@
optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
-
+ optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
--- End diff --

ok

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8304/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/234/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8308/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/238/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/249/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8319/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8325/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2692

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/255/

---

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2692

retest this please

---

123