GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2559 [CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if projection column have null primitive column and struct Problem: In case if the actual value of the primitive data type is null, by PR#2489, we are moving all the null values to the end of the collected row without considering the data type. Solution: Place null in the end of output iff the null value is of complex primitive column. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted?NA - [ ] Document update required?NA - [ ] Testing done updated UT - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2559.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2559 ---- commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0 Author: ajantha-bhat <ajanthabhat@...> Date: 2018-07-25T13:51:02Z [CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if projection column have null primitive column and struct ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2559 retest sdv please --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2559 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6008/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2559 @kumarvishal09 : PR is ready please review --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2559 @ravipesala : please review this --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r205975710 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { --- End diff -- This check is not good in terms of performance, every time generating a string from map and check contains is heavy operation. please simplify it. And also I don't think we require to check it for every row. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r205975748 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { - // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the - // projection list,then object array will contain a,null,d as result, because for a.b, - // a will be filled and for a.c null will be placed. - // Instead place null in the end of object array and send a,d,null as result. - int count = 0; - for (int j = 0; j < row.length; j++) { - if (row[j] != null) row[count++] = row[j]; + boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length]; --- End diff -- Does it really need to create this array for each row? I feel it is one time array fill is enough --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r205975814 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { - // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the - // projection list,then object array will contain a,null,d as result, because for a.b, - // a will be filled and for a.c null will be placed. - // Instead place null in the end of object array and send a,d,null as result. - int count = 0; - for (int j = 0; j < row.length; j++) { - if (row[j] != null) row[count++] = row[j]; + boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length]; + for (ProjectionDimension dimension : queryDimensions) { + if (null != dimension.getDimension().getComplexParentDimension()) { + isComplexChildColumn[dimension.getOrdinal()] = true; + } } - while (count < row.length) row[count++] = null; + shiftNullForStruct(row, isComplexChildColumn); } listBasedResult.add(row); rowCounter++; } return listBasedResult; } + /** + * shift the complex column null to the end + * + * @param row + * @param isComplexChildColumn + */ + private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) { + int count = 0; + int dataTypeCount = 0; + // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the + // projection list,then object array will contain a,null,d as result, because for a.b, + // a will be filled and for a.c null will be placed. + // Instead place null in the end of object array and send a,d,null as result. + for (int j = 0; j < row.length; j++) { + if (null == row[j] && !isComplexChildColumn[j]) { + // if it is a primitive column, don't shift the null to the end. + row[count++] = null; + isComplexChildColumn[dataTypeCount++] = false; + } else if (null != row[j]) { + row[count++] = row[j]; + isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j]; + } + } + // fill the skipped content + while (count < row.length) row[count++] = null; + while (dataTypeCount < isComplexChildColumn.length) { + isComplexChildColumn[dataTypeCount++] = true; --- End diff -- Why this filling of array is required here? are you using this array in caller. --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r206019255 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { - // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the - // projection list,then object array will contain a,null,d as result, because for a.b, - // a will be filled and for a.c null will be placed. - // Instead place null in the end of object array and send a,d,null as result. - int count = 0; - for (int j = 0; j < row.length; j++) { - if (row[j] != null) row[count++] = row[j]; + boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length]; + for (ProjectionDimension dimension : queryDimensions) { + if (null != dimension.getDimension().getComplexParentDimension()) { + isComplexChildColumn[dimension.getOrdinal()] = true; + } } - while (count < row.length) row[count++] = null; + shiftNullForStruct(row, isComplexChildColumn); } listBasedResult.add(row); rowCounter++; } return listBasedResult; } + /** + * shift the complex column null to the end + * + * @param row + * @param isComplexChildColumn + */ + private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) { + int count = 0; + int dataTypeCount = 0; + // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the + // projection list,then object array will contain a,null,d as result, because for a.b, + // a will be filled and for a.c null will be placed. + // Instead place null in the end of object array and send a,d,null as result. + for (int j = 0; j < row.length; j++) { + if (null == row[j] && !isComplexChildColumn[j]) { + // if it is a primitive column, don't shift the null to the end. + row[count++] = null; + isComplexChildColumn[dataTypeCount++] = false; + } else if (null != row[j]) { + row[count++] = row[j]; + isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j]; + } + } + // fill the skipped content + while (count < row.length) row[count++] = null; + while (dataTypeCount < isComplexChildColumn.length) { + isComplexChildColumn[dataTypeCount++] = true; --- End diff -- No need. removed --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r206019282 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { - // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the - // projection list,then object array will contain a,null,d as result, because for a.b, - // a will be filled and for a.c null will be placed. - // Instead place null in the end of object array and send a,d,null as result. - int count = 0; - for (int j = 0; j < row.length; j++) { - if (row[j] != null) row[count++] = row[j]; + boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length]; --- End diff -- one time is enough --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2559#discussion_r206019351 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { } fillMeasureData(scannedResult, row); if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) { --- End diff -- yes, avoided string conversion. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7597/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2559 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6059/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2559 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7604/ --- |
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:
https://github.com/apache/carbondata/pull/2559 retest this please --- |
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:
https://github.com/apache/carbondata/pull/2559 retest sdv please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7642/ --- |
Free forum by Nabble | Edit this page |