[GitHub] [carbondata] ajantha-bhat opened a new pull request #3615: [WIP] Send insert stage, compaction to new insert into flow

GitBox

[GitHub] [carbondata] ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589498137

@QiangCai, @kunal642, @jackylk: PR is ready. Please review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589501574

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/378/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589519466

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2080/

GitBox

[GitHub] [carbondata] ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590150487

retest this please

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590153543

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/414/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590163981

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2115/

GitBox

[GitHub] [carbondata] jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383369387

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala
##########
@@ -176,8 +176,21 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
convertedStaticPartition)
scanResultRdd = sparkSession.sessionState.executePlan(newLogicalPlan).toRdd
if (logicalPartitionRelation != null) {
- logicalPartitionRelation =
- getReArrangedSchemaLogicalRelation(reArrangedIndex, logicalPartitionRelation)
+ if (selectedColumnSchema.length != logicalPartitionRelation.output.length) {
+ throw new RuntimeException(" schema length doesn't match partition length")
+ }
+ var isAlreadyReArranged = true
+ var index = 0
+ for (col: ColumnSchema <- selectedColumnSchema) {

Review comment:
Please use a lambda function instead of the `for` loop. Perhaps findFirst?
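
As a rough illustration of the suggestion above, the index-tracking loop could be collapsed into a single stream predicate. This is only a sketch in Java: the quoted hunk is truncated, so the comparison performed inside the loop is an assumption (column names matched position by position), and `ReArrangeCheck` and `isAlreadyReArranged` are hypothetical names rather than CarbonData code. In the Scala file under review, `forall` or `find` on the collections would play the same role.

```java
import java.util.List;
import java.util.stream.IntStream;

final class ReArrangeCheck {
    // Hypothetical helper: returns true when every selected column already sits
    // at the same position in the relation output (names compared ignoring case).
    static boolean isAlreadyReArranged(List<String> selectedColumns,
                                       List<String> outputColumns) {
        if (selectedColumns.size() != outputColumns.size()) {
            throw new IllegalArgumentException(
                "schema length doesn't match partition length");
        }
        // allMatch stops at the first mismatch, like an early exit from the loop.
        return IntStream.range(0, selectedColumns.size())
            .allMatch(i -> selectedColumns.get(i)
                .equalsIgnoreCase(outputColumns.get(i)));
    }
}
```

Using `allMatch` removes the mutable `isAlreadyReArranged` flag and the manual `index` counter from the original hunk.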

GitBox

[GitHub] [carbondata] jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383369776

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
##########
@@ -728,7 +728,11 @@ object CommonLoadUtils {
}
val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
for (index <- timeStampIndex) {
- internalRow.setLong(index, internalRow.getLong(index) / 1000)
+ if (internalRow.getLong(index) == 0) {
+ internalRow.setNullAt(index)
+ } else {
+ internalRow.setLong(index, internalRow.getLong(index) / 1000)

Review comment:
What does 1000 stand for? It is a magic number.

GitBox

[GitHub] [carbondata] jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383371226

##########
File path: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java
##########
@@ -99,9 +99,12 @@
noDictDimensionPages = new ColumnPage[model.getNoDictionaryCount()];
int tmpNumDictDimIdx = 0;
int tmpNumNoDictDimIdx = 0;
- for (int i = 0; i < dictDimensionPages.length + noDictDimensionPages.length; i++) {
+ for (int i = 0; i < tableSpec.getNumDimensions(); i++) {
TableSpec.DimensionSpec spec = tableSpec.getDimensionSpec(i);
- ColumnType columnType = tableSpec.getDimensionSpec(i).getColumnType();
+ if (spec.getSchemaDataType().isComplexType()) {
+ // partition columns are placed at the end. so, might present after complex columns

Review comment:
Do you mean to skip all complex columns and go to the last dimension?

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383377457

##########
File path: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java
##########
@@ -99,9 +99,12 @@
noDictDimensionPages = new ColumnPage[model.getNoDictionaryCount()];
int tmpNumDictDimIdx = 0;
int tmpNumNoDictDimIdx = 0;
- for (int i = 0; i < dictDimensionPages.length + noDictDimensionPages.length; i++) {
+ for (int i = 0; i < tableSpec.getNumDimensions(); i++) {
TableSpec.DimensionSpec spec = tableSpec.getDimensionSpec(i);
- ColumnType columnType = tableSpec.getDimensionSpec(i).getColumnType();
+ if (spec.getSchemaDataType().isComplexType()) {
+ // partition columns are placed at the end. so, might present after complex columns

Review comment:
Yes, it was skipping them initially as well. I will make it easier to understand.
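
To make the skipping behaviour discussed here concrete, the following Java sketch iterates over every dimension and skips complex ones instead of stopping, so that a partition column placed after the complex columns is still visited. `Dim` and `primitiveDimensions` are hypothetical stand-ins invented for illustration; they are not CarbonData's `TableSpec.DimensionSpec` API.

```java
import java.util.ArrayList;
import java.util.List;

final class DimensionScan {
    // Hypothetical stand-in for a dimension spec; not the real CarbonData class.
    static final class Dim {
        final String name;
        final boolean complex;

        Dim(String name, boolean complex) {
            this.name = name;
            this.complex = complex;
        }
    }

    // Collects the names of all non-complex dimensions, in their original order.
    static List<String> primitiveDimensions(List<Dim> dims) {
        List<String> result = new ArrayList<>();
        for (Dim dim : dims) {
            if (dim.complex) {
                // Skip the complex column but keep scanning: a partition column
                // may be placed after it.
                continue;
            }
            result.add(dim.name);
        }
        return result;
    }
}
```

With dimensions `id`, `arr` (complex) and `country` (a trailing partition column), the sketch returns `id` and `country`. A loop bounded only by the count of primitive dimension pages would stop before reaching `country`.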

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383378326

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
##########
@@ -728,7 +728,11 @@ object CommonLoadUtils {
}
val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
for (index <- timeStampIndex) {
- internalRow.setLong(index, internalRow.getLong(index) / 1000)
+ if (internalRow.getLong(index) == 0) {
+ internalRow.setNullAt(index)
+ } else {
+ internalRow.setLong(index, internalRow.getLong(index) / 1000)

Review comment:
It is the local timestamp granularity. Let me define it as a constant.
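
One way the magic number could be named is sketched below in Java. It assumes the division converts Spark's internal microsecond timestamps to millisecond granularity; that reading is an assumption, since the thread does not spell it out. `TimestampGranularity`, `MICROS_PER_MILLI` and `microsToMillis` are hypothetical names, not identifiers from the PR.

```java
final class TimestampGranularity {
    // Assumption: the raw value is in microseconds and the stored value is in
    // milliseconds, so the conversion factor is 1000.
    static final long MICROS_PER_MILLI = 1000L;

    // Converts a raw timestamp to the coarser granularity.
    static long microsToMillis(long micros) {
        return micros / MICROS_PER_MILLI;
    }
}
```

Naming the factor documents the unit conversion at the call site, which is what the review comment asks for.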

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590424436

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/442/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590451473

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/446/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590464854

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2142/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590466281

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2146/

GitBox

[GitHub] [carbondata] ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590636742

retest this please

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590642360

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/448/

GitBox

[GitHub] [carbondata] QiangCai commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

QiangCai commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383615563

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
##########
@@ -728,7 +728,13 @@ object CommonLoadUtils {
}
val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
for (index <- timeStampIndex) {
- internalRow.setLong(index, internalRow.getLong(index) / 1000)
+ if (internalRow.getLong(index) == 0) {

Review comment:
Why is it 0 and not DIRECT_DICT_VALUE_NULL?

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590661153

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2148/

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383657139

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
##########
@@ -728,7 +728,13 @@ object CommonLoadUtils {
}
val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
for (index <- timeStampIndex) {
- internalRow.setLong(index, internalRow.getLong(index) / 1000)
+ if (internalRow.getLong(index) == 0) {

Review comment:
Because timestamp is not a direct dictionary column; only date is direct dictionary.
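
The null handling in this hunk can be sketched in Java as follows. The assumption, taken from the quoted diff rather than from any documented contract, is that a raw value of 0 marks a null timestamp in this flow, whereas date columns (being direct dictionary) would use a separate null marker. `TimestampNullMapping` and `toStoredValue` are hypothetical names, and the 1000 factor is assumed to be a microsecond-to-millisecond conversion.

```java
final class TimestampNullMapping {
    // Assumed conversion factor from microseconds to milliseconds.
    static final long MICROS_PER_MILLI = 1000L;

    // Returns null when the raw value is the assumed null sentinel (0),
    // otherwise the value converted to the coarser granularity.
    static Long toStoredValue(long rawMicros) {
        if (rawMicros == 0L) {
            return null;
        }
        return rawMicros / MICROS_PER_MILLI;
    }
}
```

The sentinel check has to run before the division. Otherwise any raw value below 1000 also becomes 0 and can no longer be told apart from the null marker.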
