ShreelekhyaG opened a new pull request #4142: URL: https://github.com/apache/carbondata/pull/4142 ### Why is this PR needed? When we perform compaction after adding a complex column via ALTER TABLE, the query fails with an `ArrayIndexOutOfBounds` exception. ### What changes were proposed in this PR? While restructuring the row, added changes to fill complex keys with default values according to the latest schema. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]
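The fix described above can be sketched as follows: walk the dimensions of the latest schema, copy the stored key bytes where the column existed in the old segment, and substitute a default value where the column was newly added. This is a hedged illustration only — `fillWithLatestSchema` and its parameters are hypothetical stand-ins for the PR's restructure logic, not the CarbonData API.

```java
public class RestructureFillSketch {
    // Illustrative sketch: merge existing per-dimension key bytes with defaults
    // for newly added columns, following the latest schema's dimension order.
    static byte[][] fillWithLatestSchema(boolean[] dimensionExists,
                                         byte[][] existingKeys,
                                         byte[][] defaults) {
        byte[][] result = new byte[dimensionExists.length][];
        int existingIndex = 0;
        for (int i = 0; i < dimensionExists.length; i++) {
            // copy the stored value if the dimension existed, else its default
            result[i] = dimensionExists[i] ? existingKeys[existingIndex++] : defaults[i];
        }
        return result;
    }
}
```

Without a step like this, the old segment's shorter key array is indexed with the new schema's positions, which is consistent with the `ArrayIndexOutOfBounds` failure the PR reports.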
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-848885515 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3698/
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-848887406 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5443/
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-849796687 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5444/
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-849797332 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3699/
akashrn5 commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r643095996 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -111,36 +123,72 @@ private void fillDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { } /** - * This method will fill the no dictionary byte array with newly added no dictionary columns + * This method will fill the no dictionary and complex byte array with newly added columns * * @param rows * @return */ - private void fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { + private void fillNoDictionaryAndComplexKeyArrayBatchWithLatestSchema(List<Object[]> rows) { for (Object[] row : rows) { ByteArrayWrapper byteArrayWrapper = (ByteArrayWrapper) row[0]; byte[][] noDictKeyArray = byteArrayWrapper.getNoDictionaryKeys(); + byte[][] complexTypesKeyArray = byteArrayWrapper.getComplexTypesKeys(); ProjectionDimension[] actualQueryDimensions = executionInfo.getActualQueryDimensions(); byte[][] noDictionaryKeyArrayWithNewlyAddedColumns = new byte[noDictKeyArray.length + dimensionInfo.getNewNoDictionaryColumnCount()][]; + byte[][] complexTypeKeyArrayWithNewlyAddedColumns = + new byte[complexTypesKeyArray.length + dimensionInfo.getNewComplexColumnCount()][]; int existingColumnValueIndex = 0; int newKeyArrayIndex = 0; + int existingComplexColumnValueIndex = 0; + int newComplexKeyArrayIndex = 0; for (int i = 0; i < dimensionInfo.getDimensionExists().length; i++) { if (actualQueryDimensions[i].getDimension().getDataType() != DataTypes.DATE && !actualQueryDimensions[i].getDimension().hasEncoding(Encoding.IMPLICIT)) { + DataType currDataType = actualQueryDimensions[i].getDimension().getDataType(); // if dimension exists then add the byte array value else add the default value if (dimensionInfo.getDimensionExists()[i]) { - noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = - 
noDictKeyArray[existingColumnValueIndex++]; + if (currDataType.isComplexType()) { + complexTypeKeyArrayWithNewlyAddedColumns[newComplexKeyArrayIndex++] = + complexTypesKeyArray[existingComplexColumnValueIndex++]; + } else { + noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = + noDictKeyArray[existingColumnValueIndex++]; + } } else { byte[] newColumnDefaultValue = null; Object defaultValue = dimensionInfo.getDefaultValues()[i]; if (null != defaultValue) { newColumnDefaultValue = (byte[]) defaultValue; - } else if (actualQueryDimensions[i].getDimension().getDataType() == DataTypes.STRING) { + } else if (currDataType == DataTypes.STRING) { newColumnDefaultValue = DataTypeUtil.getDataTypeConverter().convertFromByteToUTF8Bytes( CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY); + } else if (currDataType.isComplexType()) { + // Iterate over child dimensions and add its default value. + List<CarbonDimension> children = + actualQueryDimensions[i].getDimension().getListOfChildDimensions(); + ByteArrayOutputStream byteStream = new ByteArrayOutputStream(); + DataOutputStream dataOutputStream = new DataOutputStream(byteStream); + try { + if (DataTypes.isArrayType(currDataType)) { + dataOutputStream.writeInt(1); + } else if (DataTypes.isStructType(currDataType)) { + dataOutputStream.writeShort(children.size()); + } + for (int j = 0; j < children.size(); j++) { + // update default null values based on datatype + updateNullValue(dataOutputStream, children.get(j).getDataType()); + } + dataOutputStream.close(); + } catch (IOException e) { + LOGGER.error(e.getMessage(), e); + e.printStackTrace(); Review comment: please remove the printStackTrace ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -111,36 +123,72 @@ private void fillDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { } /** - * This method will fill the no dictionary byte array with newly added no 
dictionary columns + * This method will fill the no dictionary and complex byte array with newly added columns * * @param rows * @return */ - private void fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { + private void fillNoDictionaryAndComplexKeyArrayBatchWithLatestSchema(List<Object[]> rows) { for (Object[] row : rows) { ByteArrayWrapper byteArrayWrapper = (ByteArrayWrapper) row[0]; byte[][] noDictKeyArray = byteArrayWrapper.getNoDictionaryKeys(); + byte[][] complexTypesKeyArray = byteArrayWrapper.getComplexTypesKeys(); ProjectionDimension[] actualQueryDimensions = executionInfo.getActualQueryDimensions(); byte[][] noDictionaryKeyArrayWithNewlyAddedColumns = new byte[noDictKeyArray.length + dimensionInfo.getNewNoDictionaryColumnCount()][]; + byte[][] complexTypeKeyArrayWithNewlyAddedColumns = + new byte[complexTypesKeyArray.length + dimensionInfo.getNewComplexColumnCount()][]; int existingColumnValueIndex = 0; int newKeyArrayIndex = 0; + int existingComplexColumnValueIndex = 0; + int newComplexKeyArrayIndex = 0; for (int i = 0; i < dimensionInfo.getDimensionExists().length; i++) { if (actualQueryDimensions[i].getDimension().getDataType() != DataTypes.DATE && !actualQueryDimensions[i].getDimension().hasEncoding(Encoding.IMPLICIT)) { + DataType currDataType = actualQueryDimensions[i].getDimension().getDataType(); // if dimension exists then add the byte array value else add the default value if (dimensionInfo.getDimensionExists()[i]) { - noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = - noDictKeyArray[existingColumnValueIndex++]; + if (currDataType.isComplexType()) { + complexTypeKeyArrayWithNewlyAddedColumns[newComplexKeyArrayIndex++] = + complexTypesKeyArray[existingComplexColumnValueIndex++]; + } else { + noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = + noDictKeyArray[existingColumnValueIndex++]; + } } else { byte[] newColumnDefaultValue = null; Object defaultValue = dimensionInfo.getDefaultValues()[i]; if 
(null != defaultValue) { newColumnDefaultValue = (byte[]) defaultValue; - } else if (actualQueryDimensions[i].getDimension().getDataType() == DataTypes.STRING) { + } else if (currDataType == DataTypes.STRING) { newColumnDefaultValue = DataTypeUtil.getDataTypeConverter().convertFromByteToUTF8Bytes( CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY); + } else if (currDataType.isComplexType()) { + // Iterate over child dimensions and add its default value. + List<CarbonDimension> children = + actualQueryDimensions[i].getDimension().getListOfChildDimensions(); + ByteArrayOutputStream byteStream = new ByteArrayOutputStream(); + DataOutputStream dataOutputStream = new DataOutputStream(byteStream); + try { + if (DataTypes.isArrayType(currDataType)) { + dataOutputStream.writeInt(1); + } else if (DataTypes.isStructType(currDataType)) { + dataOutputStream.writeShort(children.size()); + } + for (int j = 0; j < children.size(); j++) { + // update default null values based on datatype + updateNullValue(dataOutputStream, children.get(j).getDataType()); + } + dataOutputStream.close(); + } catch (IOException e) { + LOGGER.error(e.getMessage(), e); + e.printStackTrace(); + } + newColumnDefaultValue = byteStream.toByteArray(); Review comment: please close all the streams after use, either use try with resource or close in finally block ########## File path: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ########## @@ -140,6 +141,9 @@ if (queryDimension.getDimension().getDataType() == DataTypes.DATE) { dimensionInfo.setDictionaryColumnAdded(true); newDictionaryColumnCount++; + } else if (queryDimension.getDimension().getDataType().isComplexType()) { + dimensionInfo.setComplexColumnAdded(true); + newComplexDictColumnCount++; Review comment: can rename to no dict column count ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -149,6 +197,27 @@ private void 
fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) } } byteArrayWrapper.setNoDictionaryKeys(noDictionaryKeyArrayWithNewlyAddedColumns); + byteArrayWrapper.setComplexTypesKeys(complexTypeKeyArrayWithNewlyAddedColumns); + } + } + + private void updateNullValue(DataOutputStream dataOutputStream, DataType dimensionDataType) + throws IOException { + if (dimensionDataType == DataTypes.STRING Review comment: please refactor this code and put in some core module util and use in both core and processing module -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
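The stream-handling suggestion in the review above ("use try with resource or close in finally block") can be illustrated with a small sketch. The value layout (an int element count of 1 for an array default, a short child count for a struct default, then one entry per child) follows the quoted diff; the per-child null encoding shown here (a zero-length short) is a simplified assumption, and the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DefaultComplexValueSketch {
    // Builds a default (all-null) byte value for a newly added complex column,
    // closing the DataOutputStream via try-with-resources as the reviewer asked.
    static byte[] buildDefault(boolean isArray, int childCount) throws IOException {
        ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(byteStream)) {
            if (isArray) {
                out.writeInt(1);            // array default: a single (null) element
            } else {
                out.writeShort(childCount); // struct default: one entry per child
            }
            for (int j = 0; j < childCount; j++) {
                out.writeShort(0);          // placeholder null value per child (assumption)
            }
        } // stream is closed here even if a write throws; no finally block needed
        return byteStream.toByteArray();
    }
}
```

try-with-resources avoids both the unclosed-stream path (if an exception is thrown before `close()`) and the `printStackTrace` pattern flagged in the review.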
akashrn5 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-852128222 @ShreelekhyaG please check `IndexRebuildRDD` for `RowIdRestructureBasedRawResultCollector` and see if it handles the scenario
akashrn5 commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r643204703 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala ########## @@ -214,6 +214,19 @@ class TestAlterTableAddColumns extends QueryTest with BeforeAndAfterAll { sql("DROP TABLE IF EXISTS alter_com") } + test("Test alter add complex type and compaction") { + sql("DROP TABLE IF EXISTS alter_com") + sql("create table alter_com (a int, b string, arr1 array<string>) stored as carbondata") + sql("insert into alter_com select 1,'a',array('hi')") + sql("insert into alter_com select 2,'b',array('hello','world')") + sql("ALTER TABLE alter_com ADD COLUMNS(struct1 STRUCT<a:int, b:string>)") + sql("insert into alter_com select 3,'c',array('hi'),named_struct('s1',4,'s2','d')") + sql("insert into alter_com select 4,'d',array('hi'),named_struct('s1',4,'s2','d')") + sql("alter table alter_com compact 'minor'") + checkAnswer(sql("""Select count(*) from alter_com"""), Seq(Row(4))) Review comment: please add a check for data also, and add test case with custom default values
akashrn5 commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r643205012 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala ########## @@ -214,6 +214,19 @@ class TestAlterTableAddColumns extends QueryTest with BeforeAndAfterAll { sql("DROP TABLE IF EXISTS alter_com") } + test("Test alter add complex type and compaction") { + sql("DROP TABLE IF EXISTS alter_com") + sql("create table alter_com (a int, b string, arr1 array<string>) stored as carbondata") + sql("insert into alter_com select 1,'a',array('hi')") + sql("insert into alter_com select 2,'b',array('hello','world')") + sql("ALTER TABLE alter_com ADD COLUMNS(struct1 STRUCT<a:int, b:string>)") + sql("insert into alter_com select 3,'c',array('hi'),named_struct('s1',4,'s2','d')") + sql("insert into alter_com select 4,'d',array('hi'),named_struct('s1',4,'s2','d')") + sql("alter table alter_com compact 'minor'") + checkAnswer(sql("""Select count(*) from alter_com"""), Seq(Row(4))) + sql("DROP TABLE IF EXISTS alter_com") + } + Review comment: please add a test case with SI as code changes are with SI
akashrn5 commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r643205770 ########## File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java ########## @@ -293,26 +300,32 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera // dictionary preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++); } else { - if (isComplexColumn) { - // get the flattened data of complex column - byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex); - ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex); - GenericQueryType genericQueryType = - complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]); - int complexDataLength = byteArrayInput.getShort(2); - // In case, if array is empty - if (complexDataLength == 0) { - complexDataLength = complexDataLength + 1; - } - // get flattened array data - Object[] complexFlattenedData = new Object[complexDataLength]; - Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput); - for (int index = 0; index < complexDataLength; index++) { - complexFlattenedData[index] = - getData(data, index, dims.getColumnSchema().getDataType()); + if (isComplexColumn || isComplexColumnAdded) { + if (complexColumnParentBlockIndexes.length > 0) { + // get the flattened data of complex column + byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex); + ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex); + GenericQueryType genericQueryType = + complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]); + int complexDataLength = byteArrayInput.getShort(2); + // In case, if array is empty + if (complexDataLength == 0) { + complexDataLength = complexDataLength + 1; + } + // get flattened array data + Object[] complexFlattenedData = new Object[complexDataLength]; + Object[] data = 
genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput); + for (int index = 0; index < complexDataLength; index++) { + complexFlattenedData[index] = + getData(data, index, dims.getColumnSchema().getDataType()); + } + // store the dimesnion column index and the complex column flattened data to a map + complexDataMap.put(i, complexFlattenedData); + } else { + // when complex column is added after restructuring and + // column not present in parent block, let the SI row value be empty. Review comment: the comment is not meaningful, please restructure the comment and make it simple and understandable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
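The length handling in the quoted SI code above — reading the flattened element count from byte offset 2 and treating an empty array as one null slot — can be sketched in isolation. The helper name and the minimal buffer layout here are illustrative assumptions derived from the diff, not the actual CarbonData classes.

```java
import java.nio.ByteBuffer;

public class ComplexKeyLengthSketch {
    // Reads the element count of a flattened complex value: the short at byte
    // offset 2 holds the length, and an empty array still yields one (null) slot,
    // mirroring the `complexDataLength == 0` branch in the quoted code.
    static int elementCount(byte[] complexKeyBytes) {
        ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyBytes);
        int complexDataLength = byteArrayInput.getShort(2);
        return complexDataLength == 0 ? 1 : complexDataLength;
    }
}
```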
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-853147040 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3733/
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-853147570 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5477/
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500344 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ########## @@ -140,6 +141,9 @@ if (queryDimension.getDimension().getDataType() == DataTypes.DATE) { dimensionInfo.setDictionaryColumnAdded(true); newDictionaryColumnCount++; + } else if (queryDimension.getDimension().getDataType().isComplexType()) { + dimensionInfo.setComplexColumnAdded(true); + newComplexDictColumnCount++; Review comment: Done ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -111,36 +123,72 @@ private void fillDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { } /** - * This method will fill the no dictionary byte array with newly added no dictionary columns + * This method will fill the no dictionary and complex byte array with newly added columns * * @param rows * @return */ - private void fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { + private void fillNoDictionaryAndComplexKeyArrayBatchWithLatestSchema(List<Object[]> rows) { for (Object[] row : rows) { ByteArrayWrapper byteArrayWrapper = (ByteArrayWrapper) row[0]; byte[][] noDictKeyArray = byteArrayWrapper.getNoDictionaryKeys(); + byte[][] complexTypesKeyArray = byteArrayWrapper.getComplexTypesKeys(); ProjectionDimension[] actualQueryDimensions = executionInfo.getActualQueryDimensions(); byte[][] noDictionaryKeyArrayWithNewlyAddedColumns = new byte[noDictKeyArray.length + dimensionInfo.getNewNoDictionaryColumnCount()][]; + byte[][] complexTypeKeyArrayWithNewlyAddedColumns = + new byte[complexTypesKeyArray.length + dimensionInfo.getNewComplexColumnCount()][]; int existingColumnValueIndex = 0; int newKeyArrayIndex = 0; + int existingComplexColumnValueIndex = 0; + int newComplexKeyArrayIndex = 0; for (int i = 0; i < 
dimensionInfo.getDimensionExists().length; i++) { if (actualQueryDimensions[i].getDimension().getDataType() != DataTypes.DATE && !actualQueryDimensions[i].getDimension().hasEncoding(Encoding.IMPLICIT)) { + DataType currDataType = actualQueryDimensions[i].getDimension().getDataType(); // if dimension exists then add the byte array value else add the default value if (dimensionInfo.getDimensionExists()[i]) { - noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = - noDictKeyArray[existingColumnValueIndex++]; + if (currDataType.isComplexType()) { + complexTypeKeyArrayWithNewlyAddedColumns[newComplexKeyArrayIndex++] = + complexTypesKeyArray[existingComplexColumnValueIndex++]; + } else { + noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = + noDictKeyArray[existingColumnValueIndex++]; + } } else { byte[] newColumnDefaultValue = null; Object defaultValue = dimensionInfo.getDefaultValues()[i]; if (null != defaultValue) { newColumnDefaultValue = (byte[]) defaultValue; - } else if (actualQueryDimensions[i].getDimension().getDataType() == DataTypes.STRING) { + } else if (currDataType == DataTypes.STRING) { newColumnDefaultValue = DataTypeUtil.getDataTypeConverter().convertFromByteToUTF8Bytes( CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY); + } else if (currDataType.isComplexType()) { + // Iterate over child dimensions and add its default value. 
+ List<CarbonDimension> children = + actualQueryDimensions[i].getDimension().getListOfChildDimensions(); + ByteArrayOutputStream byteStream = new ByteArrayOutputStream(); + DataOutputStream dataOutputStream = new DataOutputStream(byteStream); + try { + if (DataTypes.isArrayType(currDataType)) { + dataOutputStream.writeInt(1); + } else if (DataTypes.isStructType(currDataType)) { + dataOutputStream.writeShort(children.size()); + } + for (int j = 0; j < children.size(); j++) { + // update default null values based on datatype + updateNullValue(dataOutputStream, children.get(j).getDataType()); + } + dataOutputStream.close(); + } catch (IOException e) { + LOGGER.error(e.getMessage(), e); + e.printStackTrace(); Review comment: ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500437 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -111,36 +123,72 @@ private void fillDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { } /** - * This method will fill the no dictionary byte array with newly added no dictionary columns + * This method will fill the no dictionary and complex byte array with newly added columns * * @param rows * @return */ - private void fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) { + private void fillNoDictionaryAndComplexKeyArrayBatchWithLatestSchema(List<Object[]> rows) { for (Object[] row : rows) { ByteArrayWrapper byteArrayWrapper = (ByteArrayWrapper) row[0]; byte[][] noDictKeyArray = byteArrayWrapper.getNoDictionaryKeys(); + byte[][] complexTypesKeyArray = byteArrayWrapper.getComplexTypesKeys(); ProjectionDimension[] actualQueryDimensions = executionInfo.getActualQueryDimensions(); byte[][] noDictionaryKeyArrayWithNewlyAddedColumns = new byte[noDictKeyArray.length + dimensionInfo.getNewNoDictionaryColumnCount()][]; + byte[][] complexTypeKeyArrayWithNewlyAddedColumns = + new byte[complexTypesKeyArray.length + dimensionInfo.getNewComplexColumnCount()][]; int existingColumnValueIndex = 0; int newKeyArrayIndex = 0; + int existingComplexColumnValueIndex = 0; + int newComplexKeyArrayIndex = 0; for (int i = 0; i < dimensionInfo.getDimensionExists().length; i++) { if (actualQueryDimensions[i].getDimension().getDataType() != DataTypes.DATE && !actualQueryDimensions[i].getDimension().hasEncoding(Encoding.IMPLICIT)) { + DataType currDataType = actualQueryDimensions[i].getDimension().getDataType(); // if dimension exists then add the byte array value else add the default value if (dimensionInfo.getDimensionExists()[i]) { - noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] 
= - noDictKeyArray[existingColumnValueIndex++]; + if (currDataType.isComplexType()) { + complexTypeKeyArrayWithNewlyAddedColumns[newComplexKeyArrayIndex++] = + complexTypesKeyArray[existingComplexColumnValueIndex++]; + } else { + noDictionaryKeyArrayWithNewlyAddedColumns[newKeyArrayIndex++] = + noDictKeyArray[existingColumnValueIndex++]; + } } else { byte[] newColumnDefaultValue = null; Object defaultValue = dimensionInfo.getDefaultValues()[i]; if (null != defaultValue) { newColumnDefaultValue = (byte[]) defaultValue; - } else if (actualQueryDimensions[i].getDimension().getDataType() == DataTypes.STRING) { + } else if (currDataType == DataTypes.STRING) { newColumnDefaultValue = DataTypeUtil.getDataTypeConverter().convertFromByteToUTF8Bytes( CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY); + } else if (currDataType.isComplexType()) { + // Iterate over child dimensions and add its default value. + List<CarbonDimension> children = + actualQueryDimensions[i].getDimension().getListOfChildDimensions(); + ByteArrayOutputStream byteStream = new ByteArrayOutputStream(); + DataOutputStream dataOutputStream = new DataOutputStream(byteStream); + try { + if (DataTypes.isArrayType(currDataType)) { + dataOutputStream.writeInt(1); + } else if (DataTypes.isStructType(currDataType)) { + dataOutputStream.writeShort(children.size()); + } + for (int j = 0; j < children.size(); j++) { + // update default null values based on datatype + updateNullValue(dataOutputStream, children.get(j).getDataType()); + } + dataOutputStream.close(); + } catch (IOException e) { + LOGGER.error(e.getMessage(), e); + e.printStackTrace(); + } + newColumnDefaultValue = byteStream.toByteArray(); Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500520 ########## File path: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedRawResultCollector.java ########## @@ -149,6 +197,27 @@ private void fillNoDictionaryKeyArrayBatchWithLatestSchema(List<Object[]> rows) } } byteArrayWrapper.setNoDictionaryKeys(noDictionaryKeyArrayWithNewlyAddedColumns); + byteArrayWrapper.setComplexTypesKeys(complexTypeKeyArrayWithNewlyAddedColumns); + } + } + + private void updateNullValue(DataOutputStream dataOutputStream, DataType dimensionDataType) + throws IOException { + if (dimensionDataType == DataTypes.STRING Review comment: done
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500616 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala ########## @@ -214,6 +214,19 @@ class TestAlterTableAddColumns extends QueryTest with BeforeAndAfterAll { sql("DROP TABLE IF EXISTS alter_com") } + test("Test alter add complex type and compaction") { + sql("DROP TABLE IF EXISTS alter_com") + sql("create table alter_com (a int, b string, arr1 array<string>) stored as carbondata") + sql("insert into alter_com select 1,'a',array('hi')") + sql("insert into alter_com select 2,'b',array('hello','world')") + sql("ALTER TABLE alter_com ADD COLUMNS(struct1 STRUCT<a:int, b:string>)") + sql("insert into alter_com select 3,'c',array('hi'),named_struct('s1',4,'s2','d')") + sql("insert into alter_com select 4,'d',array('hi'),named_struct('s1',4,'s2','d')") + sql("alter table alter_com compact 'minor'") + checkAnswer(sql("""Select count(*) from alter_com"""), Seq(Row(4))) Review comment: done
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500710 ########## File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java ########## @@ -293,26 +300,32 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera // dictionary preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++); } else { - if (isComplexColumn) { - // get the flattened data of complex column - byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex); - ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex); - GenericQueryType genericQueryType = - complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]); - int complexDataLength = byteArrayInput.getShort(2); - // In case, if array is empty - if (complexDataLength == 0) { - complexDataLength = complexDataLength + 1; - } - // get flattened array data - Object[] complexFlattenedData = new Object[complexDataLength]; - Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput); - for (int index = 0; index < complexDataLength; index++) { - complexFlattenedData[index] = - getData(data, index, dims.getColumnSchema().getDataType()); + if (isComplexColumn || isComplexColumnAdded) { + if (complexColumnParentBlockIndexes.length > 0) { + // get the flattened data of complex column + byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex); + ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex); + GenericQueryType genericQueryType = + complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]); + int complexDataLength = byteArrayInput.getShort(2); + // In case, if array is empty + if (complexDataLength == 0) { + complexDataLength = complexDataLength + 1; + } + // get flattened array data + Object[] complexFlattenedData = new Object[complexDataLength]; + Object[] data = 
genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput); + for (int index = 0; index < complexDataLength; index++) { + complexFlattenedData[index] = + getData(data, index, dims.getColumnSchema().getDataType()); + } + // store the dimesnion column index and the complex column flattened data to a map + complexDataMap.put(i, complexFlattenedData); + } else { + // when complex column is added after restructuring and + // column not present in parent block, let the SI row value be empty. Review comment: ok, edited comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ShreelekhyaG commented on a change in pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#discussion_r644500983 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala ########## @@ -214,6 +214,19 @@ class TestAlterTableAddColumns extends QueryTest with BeforeAndAfterAll { sql("DROP TABLE IF EXISTS alter_com") } + test("Test alter add complex type and compaction") { + sql("DROP TABLE IF EXISTS alter_com") + sql("create table alter_com (a int, b string, arr1 array<string>) stored as carbondata") + sql("insert into alter_com select 1,'a',array('hi')") + sql("insert into alter_com select 2,'b',array('hello','world')") + sql("ALTER TABLE alter_com ADD COLUMNS(struct1 STRUCT<a:int, b:string>)") + sql("insert into alter_com select 3,'c',array('hi'),named_struct('s1',4,'s2','d')") + sql("insert into alter_com select 4,'d',array('hi'),named_struct('s1',4,'s2','d')") + sql("alter table alter_com compact 'minor'") + checkAnswer(sql("""Select count(*) from alter_com"""), Seq(Row(4))) + sql("DROP TABLE IF EXISTS alter_com") + } + Review comment: A test case with such a scenario already exists: `Test alter add array column before creating SI`. It was not failing in CI before because the added complex column details were present in `noDictionaryKeys` (they are supposed to be in `complexKeys`), and the row was prepared from that empty byte array. Now that I have made changes to set `complexKeys` properly after restructuring, the test case fails because it cannot identify the column as a complex type and tries to prepare the row from the `noDictionaryKeys` array, which is empty. The changes for SI now made in `prepareRowObjectForSorting` fix the issue.
CarbonDataQA2 commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-853846024 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5488/
ShreelekhyaG commented on pull request #4142: URL: https://github.com/apache/carbondata/pull/4142#issuecomment-853891907 > @ShreelekhyaG please check `IndexRebuildRDD` for `RowIdRestructureBasedRawResultCollector` and see if it handles the scenario. @akashrn5 `RowIdRestructureBasedRawResultCollector` would be used when a bloom index is created on a restructured column. Currently, BloomFilter doesn't support complex datatypes, and we already have a test case for the scenario where a bloom index is created on a restructured primitive column.