[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/2976

    [CARBONDATA-2755][Complex DataType Enhancements] Compaction Complex Types

    (1) Enables compaction for tables with complex data types.
    (2) Major and minor compaction will now run over complex data types.
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
            UT Added
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2755

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2976
   
----
commit e691a1dda5e912828493c83b1c26f9d450fa5a5b
Author: sounakr <sounakr@...>
Date:   2018-07-17T05:05:32Z

    [CARBONDATA-2755][Complex DataType Enhancements] Compaction Complex Types. Enabling Compaction of Complex DataTypes.
    Major Minor compaction will run over complex dataTypes.

----


---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1643/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9903/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1854/



---

[GitHub] carbondata issue #2976: [WIP][CARBONDATA-2755][Complex DataType Enhancements...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    retest this please


---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1676/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1888/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9936/



---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240209202
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionUtil.java ---
    @@ -337,6 +342,25 @@ public static void addColumnCardinalityToMap(Map<String, Integer> columnCardinal
             .toPrimitive(updatedCardinalityList.toArray(new Integer[updatedCardinalityList.size()]));
       }
     
    +  private static void fillColumnSchemaListForComplexDims(List<CarbonDimension> carbonDimensionsList,
    --- End diff --
   
    Can you add a comment explaining what this method does?


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240210882
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -371,9 +374,25 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
             .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
         carbonFactDataHandlerModel.setColCardinality(formattedCardinality);
         //TO-DO Need to handle complex types here .
    --- End diff --
   
    Remove it


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240212283
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         return carbonFactDataHandlerModel;
       }
     
    +  /**
    +   * This routine takes the Complex Dimension and convert into generic DataType.
    +   * @param complexDimensions
    +   * @param dimensionCount
    +   * @param isNullFormat
    +   *@param isEmptyBadRecords @return
    +   */
    +  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
    +      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
    +      boolean isEmptyBadRecords) {
    +    Map<Integer, GenericDataType> complexIndexMap =
    +        new HashMap<Integer, GenericDataType>(complexDimensions.size());
    +
    +    for (CarbonDimension carbonDimension : complexDimensions) {
    +
    +      if (carbonDimension.isComplex()) {
    +        GenericDataType g;
    --- End diff --
   
    Please give this variable a proper, descriptive name.


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240212600
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         return carbonFactDataHandlerModel;
       }
     
    +  /**
    +   * This routine takes the Complex Dimension and convert into generic DataType.
    +   * @param complexDimensions
    +   * @param dimensionCount
    +   * @param isNullFormat
    +   *@param isEmptyBadRecords @return
    +   */
    +  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
    +      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
    +      boolean isEmptyBadRecords) {
    +    Map<Integer, GenericDataType> complexIndexMap =
    +        new HashMap<Integer, GenericDataType>(complexDimensions.size());
    +
    +    for (CarbonDimension carbonDimension : complexDimensions) {
    +
    +      if (carbonDimension.isComplex()) {
    +        GenericDataType g;
    +        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
    --- End diff --
   
    Please check whether an existing utility can be used to get the complex type.
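
The comment does not name the utility it has in mind. As a minimal sketch of the general idea, assuming a helper that answers "is this an array or a struct?" from the type itself rather than from a case-insensitive comparison of its name: the `TypeKind` enum, the `isArrayType`/`isStructType` helpers, and `wrapperFor` below are hypothetical stand-ins written for illustration and are not CarbonData classes.

```java
// Hypothetical stand-ins for illustration only; these are not CarbonData classes.
final class ComplexTypeCheckSketch {

  /** Assumed, simplified view of a column's type. */
  enum TypeKind { ARRAY, STRUCT, MAP, PRIMITIVE }

  /** Type checks live in one place instead of repeated string comparisons. */
  static boolean isArrayType(TypeKind kind) {
    return kind == TypeKind.ARRAY;
  }

  static boolean isStructType(TypeKind kind) {
    return kind == TypeKind.STRUCT;
  }

  /** Picks the name of the wrapper to build for a top-level complex column. */
  static String wrapperFor(TypeKind kind) {
    if (isArrayType(kind)) {
      return "ArrayDataType";
    }
    if (isStructType(kind)) {
      return "StructDataType";
    }
    // Mirrors the quoted diff: anything else is rejected at the top level.
    throw new IllegalArgumentException("Unsupported top-level type: " + kind);
  }

  public static void main(String[] args) {
    System.out.println(wrapperFor(TypeKind.ARRAY));
    System.out.println(wrapperFor(TypeKind.STRUCT));
  }
}
```

Centralising the check means a new complex type has to be handled in one helper rather than in every `equalsIgnoreCase` call site.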


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240213617
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         return carbonFactDataHandlerModel;
       }
     
    +  /**
    +   * This routine takes the Complex Dimension and convert into generic DataType.
    +   * @param complexDimensions
    +   * @param dimensionCount
    +   * @param isNullFormat
    +   *@param isEmptyBadRecords @return
    +   */
    +  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
    +      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
    +      boolean isEmptyBadRecords) {
    +    Map<Integer, GenericDataType> complexIndexMap =
    +        new HashMap<Integer, GenericDataType>(complexDimensions.size());
    +
    +    for (CarbonDimension carbonDimension : complexDimensions) {
    +
    +      if (carbonDimension.isComplex()) {
    +        GenericDataType g;
    +        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +          g = new ArrayDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
    +        } else if (carbonDimension.getColumnSchema().getDataType().getName()
    +            .equalsIgnoreCase("STRUCT")) {
    +          g = new StructDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
    +        } else {
    +          // Add Primitive type.
    +          throw new RuntimeException("Primitive Type should not be coming in first loop");
    +        }
    +        if (carbonDimension.getNumberOfChild() > 0) {
    +          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), g, isNullFormat,
    +              isEmptyBadRecords);
    +        }
    +        g.setOutputArrayIndex(0);
    +        complexIndexMap.put(dimensionCount++, g);
    +      }
    +
    +    }
    +    return complexIndexMap;
    +  }
    +
    +  private static void addChildrenForComplex(List<CarbonDimension> listOfChildDimensions,
    +      GenericDataType genericDataType, String isNullFormat, boolean isEmptyBadRecord) {
    +    for (CarbonDimension carbonDimension : listOfChildDimensions) {
    +      if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +        GenericDataType arrayGeneric = new ArrayDataType(carbonDimension.getColName(),
    +            carbonDimension.getColName()
    +                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
    +            carbonDimension.getColumnId());
    +        if (carbonDimension.getNumberOfChild() > 0) {
    +          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), arrayGeneric,
    +              isNullFormat, isEmptyBadRecord);
    +        }
    +        genericDataType.addChildren(arrayGeneric);
    +      } else if (carbonDimension.getColumnSchema().getDataType().getName()
    +          .equalsIgnoreCase("STRUCT")) {
    +        GenericDataType structGeneric = new StructDataType(carbonDimension.getColName(),
    +            carbonDimension.getColName()
    +                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
    --- End diff --
   
    Please extract this expression to the top of the loop and reuse it.
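
The expression in question, `getColName().substring(0, getColName().lastIndexOf("."))`, appears once per branch in the quoted diff. A minimal sketch of the suggested extraction, using plain strings in place of `CarbonDimension`: the class name and both methods are hypothetical. One deliberate difference from the quoted code is an assumption of this sketch: a name without a dot yields an empty parent name here, whereas the original `substring` call would throw in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; plain strings replace CarbonDimension.
final class ParentNameSketch {

  /** Returns everything before the last '.', or "" when the name has no dot. */
  static String parentName(String colName) {
    int lastDot = colName.lastIndexOf('.');
    return lastDot < 0 ? "" : colName.substring(0, lastDot);
  }

  /** Computes the parent name once per child and reuses it in every branch. */
  static List<String> describeChildren(List<String> childColumnNames) {
    List<String> described = new ArrayList<>();
    for (String colName : childColumnNames) {
      String parent = parentName(colName); // extracted to the top of the loop body
      described.add(colName + " -> parent " + parent);
    }
    return described;
  }

  public static void main(String[] args) {
    List<String> children = Arrays.asList("person.address", "person.address.city");
    for (String line : describeChildren(children)) {
      System.out.println(line);
    }
  }
}
```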


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240214729
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         return carbonFactDataHandlerModel;
       }
     
    +  /**
    +   * This routine takes the Complex Dimension and convert into generic DataType.
    +   * @param complexDimensions
    +   * @param dimensionCount
    +   * @param isNullFormat
    +   *@param isEmptyBadRecords @return
    +   */
    +  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
    +      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
    +      boolean isEmptyBadRecords) {
    +    Map<Integer, GenericDataType> complexIndexMap =
    +        new HashMap<Integer, GenericDataType>(complexDimensions.size());
    +
    +    for (CarbonDimension carbonDimension : complexDimensions) {
    +
    +      if (carbonDimension.isComplex()) {
    +        GenericDataType g;
    +        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +          g = new ArrayDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
    +        } else if (carbonDimension.getColumnSchema().getDataType().getName()
    +            .equalsIgnoreCase("STRUCT")) {
    +          g = new StructDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
    +        } else {
    +          // Add Primitive type.
    +          throw new RuntimeException("Primitive Type should not be coming in first loop");
    +        }
    +        if (carbonDimension.getNumberOfChild() > 0) {
    +          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), g, isNullFormat,
    +              isEmptyBadRecords);
    +        }
    +        g.setOutputArrayIndex(0);
    +        complexIndexMap.put(dimensionCount++, g);
    +      }
    +
    +    }
    +    return complexIndexMap;
    +  }
    +
    +  private static void addChildrenForComplex(List<CarbonDimension> listOfChildDimensions,
    +      GenericDataType genericDataType, String isNullFormat, boolean isEmptyBadRecord) {
    +    for (CarbonDimension carbonDimension : listOfChildDimensions) {
    +      if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +        GenericDataType arrayGeneric = new ArrayDataType(carbonDimension.getColName(),
    +            carbonDimension.getColName()
    +                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
    +            carbonDimension.getColumnId());
    +        if (carbonDimension.getNumberOfChild() > 0) {
    +          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), arrayGeneric,
    +              isNullFormat, isEmptyBadRecord);
    +        }
    +        genericDataType.addChildren(arrayGeneric);
    +      } else if (carbonDimension.getColumnSchema().getDataType().getName()
    +          .equalsIgnoreCase("STRUCT")) {
    +        GenericDataType structGeneric = new StructDataType(carbonDimension.getColName(),
    +            carbonDimension.getColName()
    +                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
    +            carbonDimension.getColumnId());
    +        if (carbonDimension.getNumberOfChild() > 0) {
    +          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), structGeneric,
    +              isNullFormat, isEmptyBadRecord);
    +        }
    +        genericDataType.addChildren(structGeneric);
    +      } else {
    +        // Primitive Data Type
    +        genericDataType.addChildren(
    +            new PrimitiveDataType(carbonDimension.getColumnSchema().getColumnName(),
    +                carbonDimension.getDataType(), carbonDimension.getColName()
    +                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
    +                carbonDimension.getColumnId(),
    +                carbonDimension.getColumnSchema().hasEncoding(Encoding.DICTIONARY), isNullFormat,
    +                isEmptyBadRecord));
    --- End diff --
   
    Please check and remove `isEmptyBadRecord` from it if not used


---

[GitHub] carbondata pull request #2976: [CARBONDATA-2755][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2976#discussion_r240215007
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -371,9 +374,25 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
             .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
         carbonFactDataHandlerModel.setColCardinality(formattedCardinality);
         //TO-DO Need to handle complex types here .
    -    Map<Integer, GenericDataType> complexIndexMap =
    -        new HashMap<Integer, GenericDataType>(segmentProperties.getComplexDimensions().size());
    -    carbonFactDataHandlerModel.setComplexIndexMap(complexIndexMap);
    +
    +    int simpleDimensionCount = -1;
    +    if (segmentProperties.getDimensions().size() == 0) {
    +      simpleDimensionCount = 0;
    +    } else {
    +      simpleDimensionCount = segmentProperties.getDimensions().size() - segmentProperties
    +          .getNumberOfNoDictionaryDimension() - segmentProperties.getComplexDimensions().size();
    +    }
    --- End diff --
   
    Please move this code down into `convertComplexDimensionToGenericDataType`.
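
A minimal sketch of what moving the computation into the conversion step could look like, with plain integers and strings in place of `SegmentProperties` and `GenericDataType`. The arithmetic follows the quoted diff; the class and method names are hypothetical, and using the simple dimension count as the starting key of the complex index map is an assumption drawn from the `complexIndexMap.put(dimensionCount++, g)` line in the diff, not something the thread confirms.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not CarbonData's implementation.
final class SimpleDimensionCountSketch {

  /** Same arithmetic as the quoted diff: total minus no-dictionary minus complex. */
  static int simpleDimensionCount(int totalDimensions, int noDictionaryDimensions,
      int complexDimensions) {
    if (totalDimensions == 0) {
      return 0;
    }
    return totalDimensions - noDictionaryDimensions - complexDimensions;
  }

  /**
   * Derives the starting index internally, so callers no longer compute it.
   * Assumption: complex columns are keyed from the simple dimension count upward.
   */
  static Map<Integer, String> indexComplexColumns(List<String> complexColumnNames,
      int totalDimensions, int noDictionaryDimensions) {
    int nextIndex = simpleDimensionCount(totalDimensions, noDictionaryDimensions,
        complexColumnNames.size());
    Map<Integer, String> complexIndexMap = new LinkedHashMap<>();
    for (String name : complexColumnNames) {
      complexIndexMap.put(nextIndex++, name);
    }
    return complexIndexMap;
  }

  public static void main(String[] args) {
    System.out.println(indexComplexColumns(Arrays.asList("address", "phones"), 5, 1));
  }
}
```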


---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    This PR supports compaction only for STRUCT and ARRAY. Please raise a separate JIRA and PR to support the MAP type as well.


---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1685/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9945/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1897/



---

[GitHub] carbondata issue #2976: [CARBONDATA-2755][Complex DataType Enhancements] Com...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2976
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1693/



---