[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
GitHub user manishnalla1994 opened a pull request:

    https://github.com/apache/carbondata/pull/3022

    [CARBONDATA-3196] Fixed Compaction for Complex types with Dictionary Include

    Problem: Compaction Failing for Complex datatypes with Dictionary Include as KeyGenenrator was not being set in model for Dictionary Include Complex Columns and dictionary include complex columns were not handled for finding cardinality.
   
    Solution: Handled both these issues by setting KeyGenerator and storing cardinality of Complex dictionary include columns.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata ComplexCompactionIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3022.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3022
   
----
commit 874bc3b894c08b76f770722de543615edb45df19
Author: manishnalla1994 <manish.nalla1994@...>
Date:   2018-12-24T12:07:36Z

    Fixed Compaction for Complex types with Dictionary Include

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1929/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1930/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10183/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2140/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1952/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244087077
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio
         return iterators;
       }
     
    +  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) {
    +    if (!(numberOfSortColumns > 0)) {
    +      for (int i = 0; i < complexCardinality.length; i++) {
    +        if (complexCardinality[i] != 0) {
    +          complexCardinality[i] = Integer.MAX_VALUE;
    +        }
    +      }
    +    }
    +    List<Integer> dimsLenList = new ArrayList<Integer>();
    +    for (int eachDimLen : complexCardinality) {
    +      if (eachDimLen != 0) dimsLenList.add(eachDimLen);
    +    }
    +    int[] dimLens = new int[dimsLenList.size()];
    +    for (int i = 0; i < dimsLenList.size(); i++) {
    +      dimLens[i] = dimsLenList.get(i);
    +    }
    +    return dimLens;
    +  }
    +
    +  public static KeyGenerator[] createKeyGeneratorForComplexDimension(int numberOfSortColumns,
    --- End diff --
   
    Add a method comment to explain the logic and method usage


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244086909
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio
         return iterators;
       }
     
    +  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) {
    +    if (!(numberOfSortColumns > 0)) {
    +      for (int i = 0; i < complexCardinality.length; i++) {
    +        if (complexCardinality[i] != 0) {
    +          complexCardinality[i] = Integer.MAX_VALUE;
    +        }
    +      }
    +    }
    +    List<Integer> dimsLenList = new ArrayList<Integer>();
    +    for (int eachDimLen : complexCardinality) {
    +      if (eachDimLen != 0) dimsLenList.add(eachDimLen);
    +    }
    +    int[] dimLens = new int[dimsLenList.size()];
    --- End diff --
   
    Conversion from arrayList to array can be done directly


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244086878
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio
         return iterators;
       }
     
    +  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) {
    +    if (!(numberOfSortColumns > 0)) {
    --- End diff --
   
    1. Rewrite the condition `if (!(numberOfSortColumns > 0))` as `if (numberOfSortColumns == 0)`
    2. The functionality of this method is not clear. Add a comment to explain the logic explanation and use of this method


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244085418
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala ---
    @@ -1068,4 +1068,40 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
         sql("Drop table if exists adaptive")
       }
     
    +  test("Test major compaction for struct of array type") {
    +    sql("DROP TABLE IF EXISTS carbon")
    --- End diff --
   
    1. Include dropping of table in afterAll also
    2. Give the test case description as below
    `Test major compaction with dictionary include for struct of array type`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244086110
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
             .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName),
                 carbonTable.getMeasureByTableName(tableName));
         carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema);
    -    // get the cardinality for all all the columns including no dictionary columns
    -    int[] formattedCardinality = CarbonUtil
    -        .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
    +    // get the cardinality for all all the columns including no
    +    // dictionary columns and complex columns
    +    int[] dimAndComplexColumnCardinality =
    +        new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties
    +            .getComplexDimColumnCardinality().length];
    +    for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) {
    +      dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i];
    --- End diff --
   
    Check for a resturcture drop column case when few loads are done and then dictionary column is dropped and compaction is triggered. In that case segmentProperties will contain cardinality of dropped column also check finally what is the schema and cardinality written during compaction


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10205/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3022#discussion_r244118775
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa
             .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName),
                 carbonTable.getMeasureByTableName(tableName));
         carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema);
    -    // get the cardinality for all all the columns including no dictionary columns
    -    int[] formattedCardinality = CarbonUtil
    -        .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema);
    +    // get the cardinality for all all the columns including no
    +    // dictionary columns and complex columns
    +    int[] dimAndComplexColumnCardinality =
    +        new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties
    +            .getComplexDimColumnCardinality().length];
    +    for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) {
    +      dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i];
    --- End diff --
   
    The restructure case is not handled for complex types compaction. I have raised the JIRA issue and I will handle it. Please find the JIRA link : 'https://issues.apache.org/jira/browse/CARBONDATA-3203' .


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] Fixed Compaction for Complex types...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2227/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2257/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2051/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10303/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2062/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2266/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3022
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10315/



---
12