[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

classic Classic list List threaded Threaded
65 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
GitHub user manishnalla1994 opened a pull request:

    https://github.com/apache/carbondata/pull/2980

    [CARBONDATA-3017] Map DDL Support

    Support Create DDL for Map type.
    This PR is dependant on PR#2979 for the change of delimiters.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata MapDDL5Dec

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2980
   
----
commit 322b52a64c840317a6905664e8de16327e3635e0
Author: Manish Nalla <manishnalla1994@...>
Date:   2018-10-16T09:48:08Z

    MapDDLSupport

commit 3d119888a80e7d8f9cab59e477984b56af1309f6
Author: manishnalla1994 <manish.nalla1994@...>
Date:   2018-12-07T08:18:31Z

    Added Testcases and Local Dict Support

commit 5fe06801360fc04bab9c1239ea8d007f37bc69d4
Author: manishnalla1994 <manish.nalla1994@...>
Date:   2018-12-07T13:28:54Z

    Test Files for Map

commit 4cc8ba13b234a13b9a3cef541e37f492153e7d1b
Author: manishnalla1994 <manish.nalla1994@...>
Date:   2018-12-07T14:44:12Z

    Changed TestCases and Supported 2 new delimiters

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1657/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9917/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1869/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r239991055
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ---
    @@ -34,8 +37,12 @@
       private int numberOfColumns;
     
       public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) {
    -    String[] complexDelimiters =
    +    String[] tempComplexDelimiters =
             (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS);
    +    Queue<String> complexDelimiters = new LinkedList<>();
    +    for (int i = 0; i < 4; i++) {
    --- End diff --
   
    “i < 4”  the 4 is not clear, it's better to exchage is by constant


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240147303
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala ---
    @@ -0,0 +1,452 @@
    +/*
    +
    +    Licensed to the Apache Software Foundation (ASF) under one or more
    +    contributor license agreements. See the NOTICE file distributed with
    +    this work for additional information regarding copyright ownership.
    +    The ASF licenses this file to You under the Apache License, Version 2.0
    +    (the "License"); you may not use this file except in compliance with
    +    the License. You may obtain a copy of the License at
    +    *
    +    http://www.apache.org/licenses/LICENSE-2.0
    +    *
    +    Unless required by applicable law or agreed to in writing, software
    +    distributed under the License is distributed on an "AS IS" BASIS,
    +    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +    See the License for the specific language governing permissions and
    +    limitations under the License.
    +    */
    +package org.apache.carbondata.spark.testsuite.createTable.TestCreateDDLForComplexMapType
    +
    +import java.io.File
    +import java.util
    +
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.spark.sql.{AnalysisException, Row}
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
    +
    +class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll {
    +  private val conf: Configuration = new Configuration(false)
    +
    +  val rootPath = new File(this.getClass.getResource("/").getPath
    +                          + "../../../..").getCanonicalPath
    +
    +  val path = s"$rootPath/examples/spark2/src/main/resources/maptest2.csv"
    +
    +  private def checkForLocalDictionary(dimensionRawColumnChunks: util
    +  .List[DimensionRawColumnChunk]): Boolean = {
    +    var isLocalDictionaryGenerated = false
    +    import scala.collection.JavaConversions._
    +    for (dimensionRawColumnChunk <- dimensionRawColumnChunks) {
    +      if (dimensionRawColumnChunk.getDataChunkV3
    +        .isSetLocal_dictionary) {
    +        isLocalDictionaryGenerated = true
    +      }
    --- End diff --
   
    You can directly use scala filter operation to check for local dictionary


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240151744
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
    @@ -119,6 +119,10 @@
             "complex_delimiter_level_2",
             Maps.getOrDefault(options, "complex_delimiter_level_2", ":"));
     
    +    optionsFinal.put(
    +        "complex_delimiter_level_3",
    +        Maps.getOrDefault(options, "complex_delimiter_level_3", "\\\\003"));
    +
    --- End diff --
   
    Remove hardcoding


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240150644
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -188,11 +188,13 @@ case class CarbonLoadDataCommand(
         val carbonLoadModel = new CarbonLoadModel()
         val tableProperties = table.getTableInfo.getFactTable.getTableProperties
         val optionsFinal = LoadOption.fillOptionWithDefaultValue(options.asJava)
    +    // These two delimiters are non configurable and hardcoded for map type
    +    optionsFinal.put("complex_delimiter_level_3", "\003")
    +    optionsFinal.put("complex_delimiter_level_4", "\004")
    --- End diff --
   
    Remove hardcoded values. Create one `ComplexDelimeterEnum` and use the values from there


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240152242
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/CarbonParserFactory.java ---
    @@ -51,23 +54,37 @@ public static GenericParser createParser(CarbonColumn carbonColumn, String[] com
        *                           delimiters
        * @return GenericParser
        */
    -  private static GenericParser createParser(CarbonColumn carbonColumn, String[] complexDelimiters,
    +  private static GenericParser createParser(CarbonColumn carbonColumn,
    +      Queue<String> complexDelimiters,
           String nullFormat, int depth) {
    +    if (depth > 2) {
    +      return null;
    --- End diff --
   
    I think we should throw exception with proper error message if depth is more than 2


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240151175
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java ---
    @@ -222,8 +222,8 @@ public static CarbonDataLoadConfiguration createConfiguration(CarbonLoadModel lo
         configuration.setSegmentId(loadModel.getSegmentId());
         configuration.setTaskNo(loadModel.getTaskNo());
         configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS,
    -        new String[] { loadModel.getComplexDelimiterLevel1(),
    -            loadModel.getComplexDelimiterLevel2() });
    +        new String[] { loadModel.getComplexDelimiterLevel1(), loadModel.getComplexDelimiterLevel2(),
    +            loadModel.getComplexDelimiterLevel3(), loadModel.getComplexDelimiterLevel4() });
    --- End diff --
   
    Instead of adding these many delimeters methods in loadmodel, create a list or array of complex delimeters in loadmodel and then use it here


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240145684
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java ---
    @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio
                 SKIP_EMPTY_LINE,
                 carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
     
    -    String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":");
    +    String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "\\\\003");
         String[] split = complexDelim.split(",");
         model.setComplexDelimiterLevel1(split[0]);
         if (split.length > 1) {
           model.setComplexDelimiterLevel2(split[1]);
         }
    +    if (split.length > 2) {
    +      model.setComplexDelimiterLevel3(split[2]);
    +    }
    --- End diff --
   
    Modify the above code as below
    `if (split.length > 2) {
          model.setComplexDelimiterLevel2(split[1]);
          model.setComplexDelimiterLevel3(split[2]);
        } else if (split.length > 1) {
          model.setComplexDelimiterLevel2(split[1]);
        }`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240126484
 
    --- Diff: examples/spark2/src/main/resources/maptest2.csv ---
    @@ -0,0 +1,2 @@
    +1\002Nalla\0012\002Singh\0011\002Gupta\0014\002Kumar
    +10\002Nallaa\00120\002Sissngh\001100\002Gusspta\00140\002Kumar
    --- End diff --
   
    Move the above newly added files to spark-common-test


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240153125
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ---
    @@ -34,8 +37,12 @@
       private int numberOfColumns;
     
       public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) {
    -    String[] complexDelimiters =
    +    String[] tempComplexDelimiters =
             (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS);
    +    Queue<String> complexDelimiters = new LinkedList<>();
    +    for (int i = 0; i < 4; i++) {
    --- End diff --
   
    Avoid using hardcoded values like `4` in the loops while coding....instead iterate over tempComplexDelimiters length


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240126217
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java ---
    @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio
                 SKIP_EMPTY_LINE,
                 carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
     
    -    String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":");
    +    String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "\\\\003");
    --- End diff --
   
    Move all the default delimters to constants and use it everywhere


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240151368
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java ---
    @@ -631,7 +651,7 @@ public void setFactTimeStamp(long factTimeStamp) {
       }
     
       public String[] getDelimiters() {
    -    return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 };
    +    return new String[] { complexDelimiterLevel1, complexDelimiterLevel2, complexDelimiterLevel3 };
    --- End diff --
   
    Modify this class and related changes in other classes as per the above comment


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2980#discussion_r240591596
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ---
    @@ -34,8 +37,12 @@
       private int numberOfColumns;
     
       public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) {
    -    String[] complexDelimiters =
    +    String[] tempComplexDelimiters =
             (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS);
    +    Queue<String> complexDelimiters = new LinkedList<>();
    +    for (int i = 0; i < 4; i++) {
    --- End diff --
   
    Done.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1705/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9965/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1915/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2980
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1916/



---
1234