[GitHub] [carbondata] ajantha-bhat opened a new pull request #3615: [WIP] Send insert stage, compaction to new insert into flow

[GitHub] [carbondata] ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

GitBox
ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589498137
 
 
   @QiangCai, @kunal642, @jackylk: PR is ready. Please review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589501574
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/378/
   

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-589519466
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2080/
   

ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590150487
 
 
   retest this please

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590153543
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/414/
   

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590163981
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2115/
   

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383369387
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala
 ##########
 @@ -176,8 +176,21 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
       convertedStaticPartition)
     scanResultRdd = sparkSession.sessionState.executePlan(newLogicalPlan).toRdd
     if (logicalPartitionRelation != null) {
-      logicalPartitionRelation =
-        getReArrangedSchemaLogicalRelation(reArrangedIndex, logicalPartitionRelation)
+      if (selectedColumnSchema.length != logicalPartitionRelation.output.length) {
+        throw new RuntimeException(" schema length doesn't match partition length")
+      }
+      var isAlreadyReArranged = true
+      var index = 0
+      for (col: ColumnSchema <- selectedColumnSchema) {
 
 Review comment:
   Please use a lambda function instead of the `for` loop. Maybe findFirst?
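
A minimal sketch of what this suggestion could look like, written in plain Java so it stands alone. The body of the original `for` loop is not shown in the quoted diff, so the positional name comparison below is an assumption, and `ReArrangeCheck`, `firstMismatch` and `isAlreadyReArranged` are hypothetical names, not CarbonData APIs.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical helper: checks whether two column-name lists are already
// in the same order, without a mutable index or flag variable.
final class ReArrangeCheck {

  // Returns the first position at which the two lists differ, if any.
  static OptionalInt firstMismatch(List<String> selected, List<String> output) {
    if (selected.size() != output.size()) {
      throw new IllegalArgumentException("schema length doesn't match partition length");
    }
    return IntStream.range(0, selected.size())
        .filter(i -> !selected.get(i).equals(output.get(i)))
        .findFirst();
  }

  // Assumption: "already re-arranged" means every position matches.
  static boolean isAlreadyReArranged(List<String> selected, List<String> output) {
    return !firstMismatch(selected, output).isPresent();
  }
}
```

The stream stops at the first differing position, which is the short-circuit behaviour a `findFirst`-style rewrite would give over a full loop.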

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383369776
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
 ##########
 @@ -728,7 +728,11 @@ object CommonLoadUtils {
     }
     val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
       for (index <- timeStampIndex) {
-        internalRow.setLong(index, internalRow.getLong(index) / 1000)
+        if (internalRow.getLong(index) == 0) {
+          internalRow.setNullAt(index)
+        } else {
+          internalRow.setLong(index, internalRow.getLong(index) / 1000)
 
 Review comment:
   What does 1000 stand for? It is a magic number.

jackylk commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383371226
 
 

 ##########
 File path: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java
 ##########
 @@ -99,9 +99,12 @@
     noDictDimensionPages = new ColumnPage[model.getNoDictionaryCount()];
     int tmpNumDictDimIdx = 0;
     int tmpNumNoDictDimIdx = 0;
-    for (int i = 0; i < dictDimensionPages.length + noDictDimensionPages.length; i++) {
+    for (int i = 0; i < tableSpec.getNumDimensions(); i++) {
       TableSpec.DimensionSpec spec = tableSpec.getDimensionSpec(i);
-      ColumnType columnType = tableSpec.getDimensionSpec(i).getColumnType();
+      if (spec.getSchemaDataType().isComplexType()) {
+        // partition columns are placed at the end. so, might present after complex columns
 
 Review comment:
   Do you mean to skip all complex columns and go to the last dimension?

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383377457
 
 

 ##########
 File path: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java
 ##########
 @@ -99,9 +99,12 @@
     noDictDimensionPages = new ColumnPage[model.getNoDictionaryCount()];
     int tmpNumDictDimIdx = 0;
     int tmpNumNoDictDimIdx = 0;
-    for (int i = 0; i < dictDimensionPages.length + noDictDimensionPages.length; i++) {
+    for (int i = 0; i < tableSpec.getNumDimensions(); i++) {
       TableSpec.DimensionSpec spec = tableSpec.getDimensionSpec(i);
-      ColumnType columnType = tableSpec.getDimensionSpec(i).getColumnType();
+      if (spec.getSchemaDataType().isComplexType()) {
+        // partition columns are placed at the end. so, might present after complex columns
 
 Review comment:
   Yes, it was skipping them initially as well. I will make it easier to understand.
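
A minimal sketch of the skipping behaviour discussed here, assuming the truncated `if` branch in the quoted diff ends in a `continue`. The `Kind` enum and `primitivePositions` are hypothetical stand-ins for `TableSpec.DimensionSpec` and the real page-assignment logic. With dimensions laid out as dictionary, no-dictionary, complex, partition, a loop bounded by the dictionary plus no-dictionary count would stop before the trailing partition column, which appears to be why the loop now runs over all dimensions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a dimension list in which partition columns
// may sit after complex columns.
final class DimensionWalk {

  enum Kind { DICT, NO_DICT, COMPLEX }

  // Walks every dimension and returns the positions of the primitive
  // (non-complex) ones; complex columns are skipped, not treated as the end.
  static List<Integer> primitivePositions(List<Kind> dims) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < dims.size(); i++) {
      if (dims.get(i) == Kind.COMPLEX) {
        continue; // skip, but keep scanning for columns placed after it
      }
      positions.add(i);
    }
    return positions;
  }
}
```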

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383378326
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
 ##########
 @@ -728,7 +728,11 @@ object CommonLoadUtils {
     }
     val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
       for (index <- timeStampIndex) {
-        internalRow.setLong(index, internalRow.getLong(index) / 1000)
+        if (internalRow.getLong(index) == 0) {
+          internalRow.setNullAt(index)
+        } else {
+          internalRow.setLong(index, internalRow.getLong(index) / 1000)
 
 Review comment:
   It is the timestamp local granularity. Let me define it.
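
A minimal sketch of replacing the literal with a named constant, mirroring the branch in the quoted diff (0 becomes null, anything else is divided by 1000). It assumes the divisor converts Spark's internal microsecond timestamps to milliseconds. `MICROS_PER_MILLI` and `toMillisOrNull` are hypothetical names, not what the PR ended up defining.

```java
// Hypothetical helper illustrating a named granularity constant.
final class TimestampGranularity {

  // Assumption: the input is in microseconds and the target is milliseconds.
  static final long MICROS_PER_MILLI = 1000L;

  // Mirrors the diff: a raw value of 0 is treated as null,
  // every other value is scaled down by the granularity.
  static Long toMillisOrNull(long micros) {
    if (micros == 0L) {
      return null;
    }
    return micros / MICROS_PER_MILLI;
  }
}
```

Naming the constant records both the unit of the input and the unit of the result, which the bare `1000` does not.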

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590424436
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/442/
   

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590451473
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/446/
   

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590464854
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2142/
   

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590466281
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2146/
   

ajantha-bhat commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590636742
 
 
   retest this please

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590642360
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/448/
   

QiangCai commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383615563
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
 ##########
 @@ -728,7 +728,13 @@ object CommonLoadUtils {
     }
     val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
       for (index <- timeStampIndex) {
-        internalRow.setLong(index, internalRow.getLong(index) / 1000)
+        if (internalRow.getLong(index) == 0) {
 
 Review comment:
   Why is it 0 and not DIRECT_DICT_VALUE_NULL?

CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-590661153
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2148/
   

ajantha-bhat commented on a change in pull request #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#discussion_r383657139
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
 ##########
 @@ -728,7 +728,13 @@ object CommonLoadUtils {
     }
     val updatedRdd: RDD[InternalRow] = rdd.map { internalRow =>
       for (index <- timeStampIndex) {
-        internalRow.setLong(index, internalRow.getLong(index) / 1000)
+        if (internalRow.getLong(index) == 0) {
 
 Review comment:
   Because timestamp is not a direct dictionary column; only date is a direct dictionary column.
