GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2976

[CARBONDATA-2755][Complex DataType Enhancements] Compaction Complex Types

(1) Enabling Compaction with Complex DataTypes.
(2) Major Minor compaction will run over complex dataTypes.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
      UT Added
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2755

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2976

----
commit e691a1dda5e912828493c83b1c26f9d450fa5a5b
Author: sounakr <sounakr@...>
Date:   2018-07-17T05:05:32Z

    [CARBONDATA-2755][Complex DataType Enhancements] Compaction Complex Types.

    Enabling Compaction of Complex DataTypes. Major Minor compaction will run over complex dataTypes.

----
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1643/
---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9903/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1854/
---
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2976
retest this please
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1676/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1888/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9936/
---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240209202

--- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionUtil.java ---
@@ -337,6 +342,25 @@ public static void addColumnCardinalityToMap(Map<String, Integer> columnCardinal
         .toPrimitive(updatedCardinalityList.toArray(new Integer[updatedCardinalityList.size()]));
   }
 
+  private static void fillColumnSchemaListForComplexDims(List<CarbonDimension> carbonDimensionsList,
--- End diff --

Can you add a comment explaining what happens in this method?
---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240210882

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -371,9 +374,25 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
     carbonFactDataHandlerModel.setColCardinality(formattedCardinality);
     //TO-DO Need to handle complex types here .
--- End diff --

Remove it
---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240212283

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
     return carbonFactDataHandlerModel;
   }
 
+  /**
+   * This routine takes the Complex Dimension and convert into generic DataType.
+   * @param complexDimensions
+   * @param dimensionCount
+   * @param isNullFormat
+   *@param isEmptyBadRecords @return
+   */
+  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
+      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
+      boolean isEmptyBadRecords) {
+    Map<Integer, GenericDataType> complexIndexMap =
+        new HashMap<Integer, GenericDataType>(complexDimensions.size());
+
+    for (CarbonDimension carbonDimension : complexDimensions) {
+
+      if (carbonDimension.isComplex()) {
+        GenericDataType g;
--- End diff --

Please give this variable a proper name.
---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240212600

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
     return carbonFactDataHandlerModel;
   }
 
+  /**
+   * This routine takes the Complex Dimension and convert into generic DataType.
+   * @param complexDimensions
+   * @param dimensionCount
+   * @param isNullFormat
+   *@param isEmptyBadRecords @return
+   */
+  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
+      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
+      boolean isEmptyBadRecords) {
+    Map<Integer, GenericDataType> complexIndexMap =
+        new HashMap<Integer, GenericDataType>(complexDimensions.size());
+
+    for (CarbonDimension carbonDimension : complexDimensions) {
+
+      if (carbonDimension.isComplex()) {
+        GenericDataType g;
+        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
--- End diff --

Please check the utility to get the complex type
---
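The quoted diff decides between array and struct by comparing the data type name as a string in several places. One way to centralize that decision, in the spirit of the comment above, is a single helper that classifies the type once. The sketch below is self-contained and illustrative only: `ComplexKinds`, `Kind`, and `of` are hypothetical names invented here, not CarbonData APIs, and the project may already ship an equivalent utility that should be preferred.

```java
import java.util.Locale;

// Hypothetical stand-in for a "which complex type is this?" utility.
// None of these names come from CarbonData.
final class ComplexKinds {

  enum Kind { ARRAY, STRUCT, OTHER }

  private ComplexKinds() { }

  // Maps a data type name to a Kind, ignoring case. Anything that is not
  // ARRAY or STRUCT falls into OTHER in this sketch.
  static Kind of(String dataTypeName) {
    switch (dataTypeName.toUpperCase(Locale.ROOT)) {
      case "ARRAY":
        return Kind.ARRAY;
      case "STRUCT":
        return Kind.STRUCT;
      default:
        return Kind.OTHER;
    }
  }

  public static void main(String[] args) {
    System.out.println(of("array"));
    System.out.println(of("Struct"));
    System.out.println(of("int"));
  }
}
```

With a helper of this shape, callers can switch on the returned `Kind` instead of repeating `equalsIgnoreCase` chains.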
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240213617

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
     return carbonFactDataHandlerModel;
   }
 
+  /**
+   * This routine takes the Complex Dimension and convert into generic DataType.
+   * @param complexDimensions
+   * @param dimensionCount
+   * @param isNullFormat
+   *@param isEmptyBadRecords @return
+   */
+  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
+      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
+      boolean isEmptyBadRecords) {
+    Map<Integer, GenericDataType> complexIndexMap =
+        new HashMap<Integer, GenericDataType>(complexDimensions.size());
+
+    for (CarbonDimension carbonDimension : complexDimensions) {
+
+      if (carbonDimension.isComplex()) {
+        GenericDataType g;
+        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
+          g = new ArrayDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
+        } else if (carbonDimension.getColumnSchema().getDataType().getName()
+            .equalsIgnoreCase("STRUCT")) {
+          g = new StructDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
+        } else {
+          // Add Primitive type.
+          throw new RuntimeException("Primitive Type should not be coming in first loop");
+        }
+        if (carbonDimension.getNumberOfChild() > 0) {
+          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), g, isNullFormat,
+              isEmptyBadRecords);
+        }
+        g.setOutputArrayIndex(0);
+        complexIndexMap.put(dimensionCount++, g);
+      }
+
+    }
+    return complexIndexMap;
+  }
+
+  private static void addChildrenForComplex(List<CarbonDimension> listOfChildDimensions,
+      GenericDataType genericDataType, String isNullFormat, boolean isEmptyBadRecord) {
+    for (CarbonDimension carbonDimension : listOfChildDimensions) {
+      if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
+        GenericDataType arrayGeneric = new ArrayDataType(carbonDimension.getColName(),
+            carbonDimension.getColName()
+                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
+            carbonDimension.getColumnId());
+        if (carbonDimension.getNumberOfChild() > 0) {
+          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), arrayGeneric,
+              isNullFormat, isEmptyBadRecord);
+        }
+        genericDataType.addChildren(arrayGeneric);
+      } else if (carbonDimension.getColumnSchema().getDataType().getName()
+          .equalsIgnoreCase("STRUCT")) {
+        GenericDataType structGeneric = new StructDataType(carbonDimension.getColName(),
+            carbonDimension.getColName()
+                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
--- End diff --

Please extract this expression to the top and reuse it.
---
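The expression flagged above, `getColName().substring(0, getColName().lastIndexOf("."))`, appears in each branch of `addChildrenForComplex` in the quoted diff. A minimal sketch of pulling it into one place is shown below. The class and method names (`ColumnNames`, `parentNameOf`) are hypothetical and chosen only for illustration. Note one deliberate difference from the quoted code: for a name without a dot, this sketch returns an empty string, whereas the inline `substring(0, -1)` would throw.

```java
// Hypothetical helper; the names here are assumptions for illustration,
// not part of the pull request.
final class ColumnNames {

  private ColumnNames() { }

  // Returns the part of a dotted column name before the last '.',
  // or an empty string when the name contains no '.'.
  static String parentNameOf(String colName) {
    int lastDot = colName.lastIndexOf('.');
    return lastDot < 0 ? "" : colName.substring(0, lastDot);
  }

  public static void main(String[] args) {
    System.out.println(parentNameOf("person.address.city"));
    System.out.println(parentNameOf("person"));
  }
}
```

Inside the loop, the result could be computed once per child dimension and passed to whichever data type constructor the branch needs.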
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240214729

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -407,6 +426,81 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
     return carbonFactDataHandlerModel;
   }
 
+  /**
+   * This routine takes the Complex Dimension and convert into generic DataType.
+   * @param complexDimensions
+   * @param dimensionCount
+   * @param isNullFormat
+   *@param isEmptyBadRecords @return
+   */
+  private static Map<Integer, GenericDataType> convertComplexDimensionToGenericDataType(
+      List<CarbonDimension> complexDimensions, int dimensionCount, String isNullFormat,
+      boolean isEmptyBadRecords) {
+    Map<Integer, GenericDataType> complexIndexMap =
+        new HashMap<Integer, GenericDataType>(complexDimensions.size());
+
+    for (CarbonDimension carbonDimension : complexDimensions) {
+
+      if (carbonDimension.isComplex()) {
+        GenericDataType g;
+        if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
+          g = new ArrayDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
+        } else if (carbonDimension.getColumnSchema().getDataType().getName()
+            .equalsIgnoreCase("STRUCT")) {
+          g = new StructDataType(carbonDimension.getColName(), "", carbonDimension.getColumnId());
+        } else {
+          // Add Primitive type.
+          throw new RuntimeException("Primitive Type should not be coming in first loop");
+        }
+        if (carbonDimension.getNumberOfChild() > 0) {
+          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), g, isNullFormat,
+              isEmptyBadRecords);
+        }
+        g.setOutputArrayIndex(0);
+        complexIndexMap.put(dimensionCount++, g);
+      }
+
+    }
+    return complexIndexMap;
+  }
+
+  private static void addChildrenForComplex(List<CarbonDimension> listOfChildDimensions,
+      GenericDataType genericDataType, String isNullFormat, boolean isEmptyBadRecord) {
+    for (CarbonDimension carbonDimension : listOfChildDimensions) {
+      if (carbonDimension.getColumnSchema().getDataType().getName().equalsIgnoreCase("ARRAY")) {
+        GenericDataType arrayGeneric = new ArrayDataType(carbonDimension.getColName(),
+            carbonDimension.getColName()
+                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
+            carbonDimension.getColumnId());
+        if (carbonDimension.getNumberOfChild() > 0) {
+          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), arrayGeneric,
+              isNullFormat, isEmptyBadRecord);
+        }
+        genericDataType.addChildren(arrayGeneric);
+      } else if (carbonDimension.getColumnSchema().getDataType().getName()
+          .equalsIgnoreCase("STRUCT")) {
+        GenericDataType structGeneric = new StructDataType(carbonDimension.getColName(),
+            carbonDimension.getColName()
+                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
+            carbonDimension.getColumnId());
+        if (carbonDimension.getNumberOfChild() > 0) {
+          addChildrenForComplex(carbonDimension.getListOfChildDimensions(), structGeneric,
+              isNullFormat, isEmptyBadRecord);
+        }
+        genericDataType.addChildren(structGeneric);
+      } else {
+        // Primitive Data Type
+        genericDataType.addChildren(
+            new PrimitiveDataType(carbonDimension.getColumnSchema().getColumnName(),
+                carbonDimension.getDataType(), carbonDimension.getColName()
+                .substring(0, carbonDimension.getColName().lastIndexOf(".")),
+                carbonDimension.getColumnId(),
+                carbonDimension.getColumnSchema().hasEncoding(Encoding.DICTIONARY), isNullFormat,
+                isEmptyBadRecord));
--- End diff --

Please check and remove `isEmptyBadRecord` from it if not used
---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2976#discussion_r240215007

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -371,9 +374,25 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
         .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
     carbonFactDataHandlerModel.setColCardinality(formattedCardinality);
     //TO-DO Need to handle complex types here .
-    Map<Integer, GenericDataType> complexIndexMap =
-        new HashMap<Integer, GenericDataType>(segmentProperties.getComplexDimensions().size());
-    carbonFactDataHandlerModel.setComplexIndexMap(complexIndexMap);
+
+    int simpleDimensionCount = -1;
+    if (segmentProperties.getDimensions().size() == 0) {
+      simpleDimensionCount = 0;
+    } else {
+      simpleDimensionCount = segmentProperties.getDimensions().size() - segmentProperties
+          .getNumberOfNoDictionaryDimension() - segmentProperties.getComplexDimensions().size();
+    }
--- End diff --

Please move this code down into `convertComplexDimensionToGenericDataType`.
---
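The arithmetic in the quoted hunk is: total dimensions minus no-dictionary dimensions minus complex dimensions, with zero as the result when there are no dimensions at all. A small sketch of that calculation as a standalone function follows. The names `DimensionCounts` and `simpleDimensionCount`, and the plain `int` parameters, are assumptions made for illustration; the pull request reads these values from `segmentProperties`.

```java
// Hypothetical helper mirroring the arithmetic in the quoted diff.
// Parameter names are illustrative; the PR obtains these values from
// segmentProperties rather than passing them as ints.
final class DimensionCounts {

  private DimensionCounts() { }

  // Counts the dimensions that are neither no-dictionary nor complex.
  static int simpleDimensionCount(int totalDimensions, int noDictionaryDimensions,
      int complexDimensions) {
    if (totalDimensions == 0) {
      return 0;
    }
    return totalDimensions - noDictionaryDimensions - complexDimensions;
  }

  public static void main(String[] args) {
    // 5 dimensions in total, 1 no-dictionary, 2 complex.
    System.out.println(simpleDimensionCount(5, 1, 2));
  }
}
```

Isolating the calculation this way also makes it easy to move next to the code that consumes it, which is what the comment above asks for.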
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2976
This PR supports compaction only for STRUCT and ARRAY. Please raise another JIRA and PR to support the MAP type as well.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1685/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9945/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1897/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2976
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1693/
---