[GitHub] carbondata pull request #2652: [CARBONDATA-2879] Supported Sort Scope for SD...

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [CARBONDATA-2879] Supported Sort Scope for SD...

qiuchenjian-2
GitHub user praveenmeenakshi56 opened a pull request:

    https://github.com/apache/carbondata/pull/2652

    [CARBONDATA-2879] Supported Sort Scope for SDK

    ### What was the problem?
    For SDK, only Local sort was supported.
   
    ### What has been Changed?
    Hard coding of Local_Sort has been removed.
    Batch_Sort and No_Sort have been supported.
    In case of no options specified, default option changed to Local_ Sort
   
     - [ ] Any interfaces changed?
    NA
     - [ ] Any backward compatibility impacted?
    NA
     - [ ] Document update required?
    Yes, will be updated later.
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
    Testing done and UT's added.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/praveenmeenakshi56/carbondata sdk_sort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2652
   
----
commit 0553bf8f912511c8a806106b1d76fd7f17fee984
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date:   2018-08-23T10:28:35Z

    Supported Sort Scope for SDK

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6345/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6346/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6347/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7995/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6718/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213945033
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -259,14 +259,23 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
               !option.equalsIgnoreCase("complex_delimiter_level_1") &&
               !option.equalsIgnoreCase("complex_delimiter_level_2") &&
               !option.equalsIgnoreCase("quotechar") &&
    -          !option.equalsIgnoreCase("escapechar")) {
    +          !option.equalsIgnoreCase("escapechar") &&
    +          !option.equalsIgnoreCase("sort_scope")) {
             throw new IllegalArgumentException("Unsupported options. "
                 + "Refer method header or documentation");
           }
         }
     
         // convert it to treeMap as keys need to be case insensitive
         Map<String, String> optionsTreeMap = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    +    if (options.containsKey("sort_scope")) {
    +      String sortScope = options.get("sort_scope");
    +      if (!((sortScope.equalsIgnoreCase("local_sort")) ||
    --- End diff --
   
    you can use switch case to make it more readable


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213945210
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -259,14 +259,23 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
               !option.equalsIgnoreCase("complex_delimiter_level_1") &&
               !option.equalsIgnoreCase("complex_delimiter_level_2") &&
               !option.equalsIgnoreCase("quotechar") &&
    -          !option.equalsIgnoreCase("escapechar")) {
    +          !option.equalsIgnoreCase("escapechar") &&
    +          !option.equalsIgnoreCase("sort_scope")) {
             throw new IllegalArgumentException("Unsupported options. "
                 + "Refer method header or documentation");
           }
         }
     
         // convert it to treeMap as keys need to be case insensitive
         Map<String, String> optionsTreeMap = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    +    if (options.containsKey("sort_scope")) {
    +      String sortScope = options.get("sort_scope");
    +      if (!((sortScope.equalsIgnoreCase("local_sort")) ||
    +          (sortScope.equalsIgnoreCase("batch_sort"))
    +          || (sortScope.equalsIgnoreCase("no_sort")))) {
    +        throw new IllegalArgumentException("Invalid Sort Scope Option");
    --- End diff --
   
    add the sortScope in the message also


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213946688
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -246,8 +246,8 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         Objects.requireNonNull(options, "Load options should not be null");
         //validate the options.
         if (options.size() > 9) {
    -      throw new IllegalArgumentException("Supports only nine options now. "
    -          + "Refer method header or documentation");
    +      throw new IllegalArgumentException(
    +          "Supports only ten options now. Refer method header or documentation");
    --- End diff --
   
    I think validating in this way is not good. @KanakaKumar can you have a check?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213946805
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -2364,6 +2364,83 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
       }
     
    +  test("test Sort Scope with Local_Sort for SDK") {
    +
    +    cleanTestData()
    +    var options = Map("sort_scope" -> "local_sort").asJava
    +
    +    val fields: Array[Field] = new Array[Field](4)
    +    fields(0) = new Field("stringField", DataTypes.STRING)
    +    fields(1) = new Field("intField", DataTypes.INT)
    +
    +    val builder: CarbonWriterBuilder = CarbonWriter.builder
    +      .outputPath(writerPath).isTransactionalTable(false).withLoadOptions(options)
    --- End diff --
   
    write in multiple lines


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213947270
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -2364,6 +2364,83 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         checkAnswer(sql("select * from sdkOutputTable"), Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")))))
       }
     
    +  test("test Sort Scope with Local_Sort for SDK") {
    +
    +    cleanTestData()
    +    var options = Map("sort_scope" -> "local_sort").asJava
    +
    +    val fields: Array[Field] = new Array[Field](4)
    +    fields(0) = new Field("stringField", DataTypes.STRING)
    +    fields(1) = new Field("intField", DataTypes.INT)
    +
    +    val builder: CarbonWriterBuilder = CarbonWriter.builder
    +      .outputPath(writerPath).isTransactionalTable(false).withLoadOptions(options)
    +
    +    val writer: CarbonWriter = builder.buildWriterForCSVInput(new Schema(fields))
    +    writer.write(Array("carbon","1"));
    +    writer.write(Array("hydrogen","10"));
    +    writer.write(Array("boron","4"));
    +    writer.write(Array("zirconium","5"));
    +    writer.write(Array("iron","8"));
    +    writer.write(Array("manganese","4"));
    +    writer.write(Array("gold","6"));
    +    writer.write(Array("silver","3"));
    +    writer.write(Array("copper","9"));
    +    writer.write(Array("aluminium","9"));
    +    writer.close()
    +
    +    assert(new File(writerPath).exists())
    +
    +    sql("DROP TABLE IF EXISTS sdkTable")
    +    sql(
    +      s"""CREATE EXTERNAL TABLE sdkTable STORED BY 'carbondata' LOCATION
    +         |'$writerPath' """.stripMargin)
    +
    +    checkAnswer(sql("select * from sdkTable"), Seq(
    +      Row("aluminium",9),Row("boron",4),Row("carbon",1),Row("copper",9),Row("gold",6),Row("hydrogen",10),Row("iron",8),Row("manganese",4),Row("silver",3),Row("zirconium",5)
    +    ))
    +    sql("DROP TABLE sdkTable")
    +    cleanTestData()
    +  }
    +
    +  test("test Sort Scope with No_Sort for SDK") {
    --- End diff --
   
    Since sort_scope is a option for loading, can you add a test case to load a batch using local_sort and another batch data in no_sort, and read to validate the result?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r213970097
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -246,8 +246,8 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         Objects.requireNonNull(options, "Load options should not be null");
         //validate the options.
         if (options.size() > 9) {
    -      throw new IllegalArgumentException("Supports only nine options now. "
    -          + "Refer method header or documentation");
    +      throw new IllegalArgumentException(
    +          "Supports only ten options now. Refer method header or documentation");
    --- End diff --
   
    Yes this number based validation needs to be removed. Only validation of given names already present in next for loop. If the original intention is to validate duplicate option entries, can have different logic and validation message.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r214844517
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -246,8 +246,8 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         Objects.requireNonNull(options, "Load options should not be null");
         //validate the options.
         if (options.size() > 9) {
    --- End diff --
   
    This also need to increase to 10?
    Better we remove this


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2652#discussion_r214845435
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -259,14 +259,23 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
               !option.equalsIgnoreCase("complex_delimiter_level_1") &&
               !option.equalsIgnoreCase("complex_delimiter_level_2") &&
               !option.equalsIgnoreCase("quotechar") &&
    -          !option.equalsIgnoreCase("escapechar")) {
    +          !option.equalsIgnoreCase("escapechar") &&
    --- End diff --
   
    Also document and method header needs update


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2652
 
    Handled in #2692, please clcose this PR


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2652: [WIP] Supported Sort Scope for SDK

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 closed the pull request at:

    https://github.com/apache/carbondata/pull/2652


---