[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2692

    [CARBONDATA-2879] support sort scope for sdk

    **problem:**: SDK doesn't support batch sort and no sort as we don't have any API support for sort scope
   
    **Solution**: provide sort_scope in existing load_options, so user can decide which sort_scope.
    **currently supported batch_sort, local_sort and no_sort.**
    global sort is not applicable in sdk scenario. Hence not supported.
    Updated the method header and doc also
   
    Be sure to do all of the following checklists to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted? NA
     
     - [ ] Document update required? yes, updated.
   
     - [ ] Testing done.
    Added UT
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata sdk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2692
   
----
commit 16d3e68e4b315ea0b36bbb39b332c1a83c1027c1
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-09-04T09:25:47Z

    sort scope support for sdk

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    I think you can go through all the load options and explicitly support or not support them.
    Moreover, these options should also be considered for dataframe write.
    Better to keep them consistent.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    @xuchuanyin : Already gone through all the load options and decided to support 9 options. This sort scope was a create table property. This we were not supporting. Now we are supporting.
    For SDK create table property and load options can be combined as all write() is considered for 1 load (1 writer, 1 load).


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214892918
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
    @@ -59,7 +59,7 @@
         optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
         optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
         optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
    -
    +    optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
    --- End diff --
   
    Do not use literal explicitly, we already have variables for them such as `CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214892173
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
               !option.equalsIgnoreCase("complex_delimiter_level_1") &&
               !option.equalsIgnoreCase("complex_delimiter_level_2") &&
               !option.equalsIgnoreCase("quotechar") &&
    -          !option.equalsIgnoreCase("escapechar")) {
    -        throw new IllegalArgumentException("Unsupported options. "
    -            + "Refer method header or documentation");
    +          !option.equalsIgnoreCase("escapechar") &&
    +          !option.equalsIgnoreCase("sort_scope")) {
    +        throw new IllegalArgumentException("Unsupported option:" + option
    +            + ". Refer method header or documentation");
           }
         }
     
    +    // validate sort scope
    +    String sortScope = options.get("sort_scope");
    +    if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
    --- End diff --
   
    We already have `CarbonUtil.isValidSortOption`, you can use that to do validation.
    Besides you can validate the global_sort in addition and give special error message.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214890206
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
       }
     
    +  test("test Sort Scope with SDK") {
    --- End diff --
   
    This file contains too much code... 2.6k lines of code!
    I think you can refactor it later.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8300/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/230/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214945032
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -2359,6 +2359,46 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
       }
     
    +  test("test Sort Scope with SDK") {
    --- End diff --
   
    yes, later will be handled in separate PR.
   
    we can move AVRO test cases to another file and keep only non-avro in this


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214951956
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -265,12 +262,20 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
               !option.equalsIgnoreCase("complex_delimiter_level_1") &&
               !option.equalsIgnoreCase("complex_delimiter_level_2") &&
               !option.equalsIgnoreCase("quotechar") &&
    -          !option.equalsIgnoreCase("escapechar")) {
    -        throw new IllegalArgumentException("Unsupported options. "
    -            + "Refer method header or documentation");
    +          !option.equalsIgnoreCase("escapechar") &&
    +          !option.equalsIgnoreCase("sort_scope")) {
    +        throw new IllegalArgumentException("Unsupported option:" + option
    +            + ". Refer method header or documentation");
           }
         }
     
    +    // validate sort scope
    +    String sortScope = options.get("sort_scope");
    +    if ((sortScope != null) && (!(sortScope.equalsIgnoreCase("local_sort")) && !(sortScope
    --- End diff --
   
    OK. I use this now.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2692#discussion_r214952432
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
    @@ -59,7 +59,7 @@
         optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
         optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
         optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
    -
    +    optionsFinal.put("sort_scope", Maps.getOrDefault(options, "sort_scope", "local_sort"));
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8304/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/234/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8308/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] support sort scope for sdk

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/238/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/249/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8319/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8325/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/255/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2692: [CARBONDATA-2879] [CARBONDATA-2918] support sort sco...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2692
 
    retest this please


---
123