GitHub user manishnalla1994 opened a pull request:
https://github.com/apache/carbondata/pull/3022 [CARBONDATA-3196] Fixed Compaction for Complex types with Dictionary Include Problem: Compaction Failing for Complex datatypes with Dictionary Include as KeyGenenrator was not being set in model for Dictionary Include Complex Columns and dictionary include complex columns were not handled for finding cardinality. Solution: Handled both these issues by setting KeyGenerator and storing cardinality of Complex dictionary include columns. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata ComplexCompactionIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3022.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3022 ---- commit 874bc3b894c08b76f770722de543615edb45df19 Author: manishnalla1994 <manish.nalla1994@...> Date: 2018-12-24T12:07:36Z Fixed Compaction for Complex types with Dictionary Include ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1929/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1930/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10183/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2140/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1952/ --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244087077 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { + if (!(numberOfSortColumns > 0)) { + for (int i = 0; i < complexCardinality.length; i++) { + if (complexCardinality[i] != 0) { + complexCardinality[i] = Integer.MAX_VALUE; + } + } + } + List<Integer> dimsLenList = new ArrayList<Integer>(); + for (int eachDimLen : complexCardinality) { + if (eachDimLen != 0) dimsLenList.add(eachDimLen); + } + int[] dimLens = new int[dimsLenList.size()]; + for (int i = 0; i < dimsLenList.size(); i++) { + dimLens[i] = dimsLenList.get(i); + } + return dimLens; + } + + public static KeyGenerator[] createKeyGeneratorForComplexDimension(int numberOfSortColumns, --- End diff -- Add a method comment to explain the logic and method usage --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244086909 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { + if (!(numberOfSortColumns > 0)) { + for (int i = 0; i < complexCardinality.length; i++) { + if (complexCardinality[i] != 0) { + complexCardinality[i] = Integer.MAX_VALUE; + } + } + } + List<Integer> dimsLenList = new ArrayList<Integer>(); + for (int eachDimLen : complexCardinality) { + if (eachDimLen != 0) dimsLenList.add(eachDimLen); + } + int[] dimLens = new int[dimsLenList.size()]; --- End diff -- Conversion from arrayList to array can be done directly --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244086878 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { + if (!(numberOfSortColumns > 0)) { --- End diff -- 1. Rewrite the condition `if (!(numberOfSortColumns > 0))` as `if (numberOfSortColumns == 0)` 2. The functionality of this method is not clear. Add a comment to explain the logic explanation and use of this method --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244085418 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala --- @@ -1068,4 +1068,40 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll { sql("Drop table if exists adaptive") } + test("Test major compaction for struct of array type") { + sql("DROP TABLE IF EXISTS carbon") --- End diff -- 1. Include dropping of table in afterAll also 2. Give the test case description as below `Test major compaction with dictionary include for struct of array type` --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244086110 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java --- @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName), carbonTable.getMeasureByTableName(tableName)); carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema); - // get the cardinality for all all the columns including no dictionary columns - int[] formattedCardinality = CarbonUtil - .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema); + // get the cardinality for all all the columns including no + // dictionary columns and complex columns + int[] dimAndComplexColumnCardinality = + new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties + .getComplexDimColumnCardinality().length]; + for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) { + dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i]; --- End diff -- Check for a resturcture drop column case when few loads are done and then dictionary column is dropped and compaction is triggered. In that case segmentProperties will contain cardinality of dropped column also check finally what is the schema and cardinality written during compaction --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10205/ --- |
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3022#discussion_r244118775 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java --- @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName), carbonTable.getMeasureByTableName(tableName)); carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema); - // get the cardinality for all all the columns including no dictionary columns - int[] formattedCardinality = CarbonUtil - .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema); + // get the cardinality for all all the columns including no + // dictionary columns and complex columns + int[] dimAndComplexColumnCardinality = + new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties + .getComplexDimColumnCardinality().length]; + for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) { + dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i]; --- End diff -- The restructure case is not handled for complex types compaction. I have raised the JIRA issue and I will handle it. Please find the JIRA link : 'https://issues.apache.org/jira/browse/CARBONDATA-3203' . --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2227/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2257/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2051/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10303/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2062/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2266/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3022 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10315/ --- |
Free forum by Nabble | Edit this page |