[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2559

    [CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if projection column have null primitive column and struct

    Problem:
    In case if the actual value of the primitive data type is null, by PR#2489, we are moving all the null values to the end of the collected row without considering the data type.
   
    Solution:
    Place null in the end of output iff the null value is of complex primitive column.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted?NA
     
     - [ ] Document update required?NA
   
     - [ ] Testing done
         updated UT
   
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2559
   
----
commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-07-25T13:51:02Z

    [CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if projection column have null primitive column and struct

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    retest sdv please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6008/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    @kumarvishal09 : PR is ready please review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    @ravipesala : please review this


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975710
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    --- End diff --
   
    This check is not good in terms of performance,  every time generating a string from map and check contains is heavy operation. please simplify it. And also I don't think we require to check it for every row.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975748
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    --- End diff --
   
    Does it really need to create this array for each row? I feel it is one time array fill is enough


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975814
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    +        for (ProjectionDimension dimension : queryDimensions) {
    +          if (null != dimension.getDimension().getComplexParentDimension()) {
    +            isComplexChildColumn[dimension.getOrdinal()] = true;
    +          }
             }
    -        while (count < row.length) row[count++] = null;
    +        shiftNullForStruct(row, isComplexChildColumn);
           }
           listBasedResult.add(row);
           rowCounter++;
         }
         return listBasedResult;
       }
     
    +  /**
    +   * shift the complex column null to the end
    +   *
    +   * @param row
    +   * @param isComplexChildColumn
    +   */
    +  private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
    +    int count = 0;
    +    int dataTypeCount = 0;
    +    // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    +    // projection list,then object array will contain a,null,d as result, because for a.b,
    +    // a will be filled and for a.c null will be placed.
    +    // Instead place null in the end of object array and send a,d,null as result.
    +    for (int j = 0; j < row.length; j++) {
    +      if (null == row[j] && !isComplexChildColumn[j]) {
    +        // if it is a primitive column, don't shift the null to the end.
    +        row[count++] = null;
    +        isComplexChildColumn[dataTypeCount++] = false;
    +      } else if (null != row[j]) {
    +        row[count++] = row[j];
    +        isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
    +      }
    +    }
    +    // fill the skipped content
    +    while (count < row.length) row[count++] = null;
    +    while (dataTypeCount < isComplexChildColumn.length) {
    +      isComplexChildColumn[dataTypeCount++] = true;
    --- End diff --
   
    Why this filling of array is required here? are you using this array in caller.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019255
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    +        for (ProjectionDimension dimension : queryDimensions) {
    +          if (null != dimension.getDimension().getComplexParentDimension()) {
    +            isComplexChildColumn[dimension.getOrdinal()] = true;
    +          }
             }
    -        while (count < row.length) row[count++] = null;
    +        shiftNullForStruct(row, isComplexChildColumn);
           }
           listBasedResult.add(row);
           rowCounter++;
         }
         return listBasedResult;
       }
     
    +  /**
    +   * shift the complex column null to the end
    +   *
    +   * @param row
    +   * @param isComplexChildColumn
    +   */
    +  private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
    +    int count = 0;
    +    int dataTypeCount = 0;
    +    // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    +    // projection list,then object array will contain a,null,d as result, because for a.b,
    +    // a will be filled and for a.c null will be placed.
    +    // Instead place null in the end of object array and send a,d,null as result.
    +    for (int j = 0; j < row.length; j++) {
    +      if (null == row[j] && !isComplexChildColumn[j]) {
    +        // if it is a primitive column, don't shift the null to the end.
    +        row[count++] = null;
    +        isComplexChildColumn[dataTypeCount++] = false;
    +      } else if (null != row[j]) {
    +        row[count++] = row[j];
    +        isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
    +      }
    +    }
    +    // fill the skipped content
    +    while (count < row.length) row[count++] = null;
    +    while (dataTypeCount < isComplexChildColumn.length) {
    +      isComplexChildColumn[dataTypeCount++] = true;
    --- End diff --
   
    No need. removed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019282
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    --- End diff --
   
    one time is enough


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019351
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    --- End diff --
   
    yes, avoided string conversion.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7597/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6059/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7604/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    retest this please



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    retest sdv please



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7642/



---
12