Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Classic

List

28 messages Options

Options

12

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2559

[CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if projection column have null primitive column and struct

Problem:
In case if the actual value of the primitive data type is null, by PR#2489, we are moving all the null values to the end of the collected row without considering the data type.

Solution:
Place null in the end of output iff the null value is of complex primitive column.

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed? NA

- [ ] Any backward compatibility impacted?NA

- [ ] Document update required?NA

- [ ] Testing done
updated UT

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2559

----
commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-07-25T13:51:02Z

[CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if projection column have null primitive column and struct

----

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2559

retest sdv please

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2559

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6008/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2559

@kumarvishal09 : PR is ready please review

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2559

@ravipesala : please review this

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r205975710

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
--- End diff --

This check is not good in terms of performance, every time generating a string from map and check contains is heavy operation. please simplify it. And also I don't think we require to check it for every row.

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r205975748

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
- // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
- // projection list,then object array will contain a,null,d as result, because for a.b,
- // a will be filled and for a.c null will be placed.
- // Instead place null in the end of object array and send a,d,null as result.
- int count = 0;
- for (int j = 0; j < row.length; j++) {
- if (row[j] != null) row[count++] = row[j];
+ boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
--- End diff --

Does it really need to create this array for each row? I feel it is one time array fill is enough

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r205975814

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
- // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
- // projection list,then object array will contain a,null,d as result, because for a.b,
- // a will be filled and for a.c null will be placed.
- // Instead place null in the end of object array and send a,d,null as result.
- int count = 0;
- for (int j = 0; j < row.length; j++) {
- if (row[j] != null) row[count++] = row[j];
+ boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
+ for (ProjectionDimension dimension : queryDimensions) {
+ if (null != dimension.getDimension().getComplexParentDimension()) {
+ isComplexChildColumn[dimension.getOrdinal()] = true;
+ }
}
- while (count < row.length) row[count++] = null;
+ shiftNullForStruct(row, isComplexChildColumn);
}
listBasedResult.add(row);
rowCounter++;
}
return listBasedResult;
}

+ /**
+ * shift the complex column null to the end
+ *
+ * @param row
+ * @param isComplexChildColumn
+ */
+ private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
+ int count = 0;
+ int dataTypeCount = 0;
+ // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
+ // projection list,then object array will contain a,null,d as result, because for a.b,
+ // a will be filled and for a.c null will be placed.
+ // Instead place null in the end of object array and send a,d,null as result.
+ for (int j = 0; j < row.length; j++) {
+ if (null == row[j] && !isComplexChildColumn[j]) {
+ // if it is a primitive column, don't shift the null to the end.
+ row[count++] = null;
+ isComplexChildColumn[dataTypeCount++] = false;
+ } else if (null != row[j]) {
+ row[count++] = row[j];
+ isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
+ }
+ }
+ // fill the skipped content
+ while (count < row.length) row[count++] = null;
+ while (dataTypeCount < isComplexChildColumn.length) {
+ isComplexChildColumn[dataTypeCount++] = true;
--- End diff --

Why this filling of array is required here? are you using this array in caller.

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r206019255

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
- // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
- // projection list,then object array will contain a,null,d as result, because for a.b,
- // a will be filled and for a.c null will be placed.
- // Instead place null in the end of object array and send a,d,null as result.
- int count = 0;
- for (int j = 0; j < row.length; j++) {
- if (row[j] != null) row[count++] = row[j];
+ boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
+ for (ProjectionDimension dimension : queryDimensions) {
+ if (null != dimension.getDimension().getComplexParentDimension()) {
+ isComplexChildColumn[dimension.getOrdinal()] = true;
+ }
}
- while (count < row.length) row[count++] = null;
+ shiftNullForStruct(row, isComplexChildColumn);
}
listBasedResult.add(row);
rowCounter++;
}
return listBasedResult;
}

+ /**
+ * shift the complex column null to the end
+ *
+ * @param row
+ * @param isComplexChildColumn
+ */
+ private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
+ int count = 0;
+ int dataTypeCount = 0;
+ // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
+ // projection list,then object array will contain a,null,d as result, because for a.b,
+ // a will be filled and for a.c null will be placed.
+ // Instead place null in the end of object array and send a,d,null as result.
+ for (int j = 0; j < row.length; j++) {
+ if (null == row[j] && !isComplexChildColumn[j]) {
+ // if it is a primitive column, don't shift the null to the end.
+ row[count++] = null;
+ isComplexChildColumn[dataTypeCount++] = false;
+ } else if (null != row[j]) {
+ row[count++] = row[j];
+ isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
+ }
+ }
+ // fill the skipped content
+ while (count < row.length) row[count++] = null;
+ while (dataTypeCount < isComplexChildColumn.length) {
+ isComplexChildColumn[dataTypeCount++] = true;
--- End diff --

No need. removed

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r206019282

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
- // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
- // projection list,then object array will contain a,null,d as result, because for a.b,
- // a will be filled and for a.c null will be placed.
- // Instead place null in the end of object array and send a,d,null as result.
- int count = 0;
- for (int j = 0; j < row.length; j++) {
- if (row[j] != null) row[count++] = row[j];
+ boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
--- End diff --

one time is enough

---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2559#discussion_r206019351

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
}
fillMeasureData(scannedResult, row);
if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
--- End diff --

yes, avoided string conversion.

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7597/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2559

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6059/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2559

retest this please

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7604/

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2559

retest this please

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2559

retest sdv please

---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7642/

---

12