[GitHub] [carbondata] akkio-97 opened a new pull request #3515: resolved error in timestamp during compaction

[GitHub] [carbondata] akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569683859
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569693908
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1362/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569713060
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1383/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569723184
 
 
   Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1372/
   


[GitHub] [carbondata] jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919818
 
 
   please rebase


[GitHub] [carbondata] jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919888
 
 
   please rebase
   and make the same change for CarbonInsertFromStageCommand as well


[GitHub] [carbondata] jackylk edited a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
jackylk edited a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919888
 
 
   Please rebase and make the same change for CarbonInsertFromStageCommand as well.
   Thanks


[GitHub] [carbondata] jackylk removed a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
jackylk removed a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919818
 
 
   please rebase


[GitHub] [carbondata] akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571051167
 
 
   > Please rebase and make the same change for CarbonInsertFromStageCommand as well.
   > Thanks
   
   Done


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571080417
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1490/
   


[GitHub] [carbondata] akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571935136
 
 
   @jackylk please review and merge.


[GitHub] [carbondata] niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364135260
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##########
 @@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
 
 Review comment:
   If we use List[Row] instead of an RDD, I think all rows have to be held in driver memory at once, right?
   That may increase the risk of OOM.
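
   One way to address the memory concern above might be to keep the data distributed and convert the Long timestamp values to java.sql.Timestamp inside the RDD before calling createDataFrame. The block below is only a sketch under stated assumptions, not the change made in this PR: the object name, the helper names, and the RDD[Array[Any]] input (a stand-in for the row data produced by the scan) are hypothetical, and it assumes the Long values are microseconds since the epoch.

```scala
import java.sql.Timestamp

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StructType, TimestampType}

// Hypothetical helper object, not part of CarbonData.
object DistributedTimestampConversion {

  // Assumption: the Long holds microseconds since the epoch.
  def microsToTimestamp(micros: Long): Timestamp = {
    val seconds = Math.floorDiv(micros, 1000000L)
    val microsOfSecond = Math.floorMod(micros, 1000000L)
    val ts = new Timestamp(seconds * 1000L)
    // Restore the sub-second part at nanosecond resolution.
    ts.setNanos((microsOfSecond * 1000L).toInt)
    ts
  }

  // Converts rows on the executors, so nothing is collected to the driver.
  def toDataFrame(spark: SparkSession, raw: RDD[Array[Any]], schema: StructType): DataFrame = {
    // Positions of the columns declared as TimestampType in the schema.
    val timestampIndexes: Set[Int] = schema.fields.zipWithIndex.collect {
      case (field, index) if field.dataType == TimestampType => index
    }.toSet

    val rows: RDD[Row] = raw.map { values =>
      Row.fromSeq(values.zipWithIndex.map {
        case (micros: Long, index) if timestampIndexes.contains(index) =>
          microsToTimestamp(micros)
        case (other, _) => other
      }.toSeq)
    }
    spark.createDataFrame(rows, schema)
  }
}
```

   Whether this fits the compaction flow depends on how the scan RDD is built, which this sketch does not cover.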


[GitHub] [carbondata] niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364137512
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##########
 @@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
 
 Review comment:
   carbonTable is the target table of the load process, not the source table.
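
   For context on the code comment in the diff quoted above, the type mismatch it describes can be illustrated in isolation. This is a minimal sketch, assuming a local Spark session; the object name, column name, and sample values are made up for illustration and are not taken from the PR. A Row carrying a Long under a TimestampType column is expected to fail when the rows are encoded, while a Row carrying java.sql.Timestamp should pass.

```scala
import java.sql.Timestamp

import scala.util.Try

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// Hypothetical demo object, not part of CarbonData.
object ExternalTypeMismatchDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("external-type-mismatch-demo")
      .getOrCreate()

    // Made-up single-column schema with a timestamp column.
    val schema = StructType(Seq(StructField("event_time", TimestampType, nullable = true)))

    // A Long where the schema declares Timestamp: the external type check
    // is expected to reject this once the rows are actually encoded.
    val longRows = spark.sparkContext.parallelize(Seq(Row(1577836800000000L)))
    val longResult = Try(spark.createDataFrame(longRows, schema).collect())
    println(s"Long value failed: ${longResult.isFailure}")

    // A java.sql.Timestamp matches the external type for TimestampType.
    val tsRows = spark.sparkContext.parallelize(
      Seq(Row(Timestamp.valueOf("2020-01-01 00:00:00"))))
    val tsResult = Try(spark.createDataFrame(tsRows, schema).collect())
    println(s"Timestamp value failed: ${tsResult.isFailure}")

    spark.stop()
  }
}
```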


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572583789
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1573/
   


[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364769297
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##########
 @@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
 
 Review comment:
   This method will be called by compaction and insert into stage command. So the carbonTable ought to be the source table.


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572597036
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1574/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572637232
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1575/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573310579
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1599/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573323098
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1600/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573512950
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1614/
   
