akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569683859
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [hidden email]
With regards, Apache Git Services
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569693908
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1362/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569713060
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1383/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569723184
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1372/
jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919818
please rebase
jackylk commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919888
please rebase and change for CarbonInsertFromStageCommand to also
jackylk edited a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919888
Please rebase and change for CarbonInsertFromStageCommand to also. Thanks
jackylk removed a comment on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-570919818
please rebase
akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571051167
> Please rebase and change for CarbonInsertFromStageCommand to also.
> Thanks
Done
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571080417
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1490/
akkio-97 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-571935136
@jackylk please review and merge.
niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364135260
##########
File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
##########
@@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
Review comment: If we use List[Row] instead of an RDD, I think it will need more memory to hold all the rows, right? That may increase the OOM risk.
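The type mismatch described in the Javadoc of the quoted diff can be illustrated without Spark. The sketch below assumes the scan yields timestamp values as Long microseconds since the epoch (Spark's internal encoding for TimestampType; whether CarbonScanRDD uses the same unit is an assumption here) and converts them, one row at a time, to java.sql.Timestamp, the external type Spark expects in a Row. TimestampRowSketch, microsToTimestamp and toExternalRow are hypothetical names, not CarbonData APIs, and this is not the change made in the PR.

```scala
import java.sql.Timestamp

// Hypothetical helper object; none of these names exist in CarbonData or Spark.
object TimestampRowSketch {

  // Convert epoch microseconds (assumed unit) to java.sql.Timestamp.
  // floorDiv/floorMod keep the fractional part non-negative for pre-1970 values.
  def microsToTimestamp(micros: Long): Timestamp = {
    val seconds = Math.floorDiv(micros, 1000000L)
    val microsOfSecond = Math.floorMod(micros, 1000000L)
    val ts = new Timestamp(seconds * 1000L)
    ts.setNanos((microsOfSecond * 1000L).toInt)
    ts
  }

  // Replace Long values at the given timestamp column positions with Timestamp
  // objects; all other values are passed through unchanged.
  def toExternalRow(row: Array[Any], timestampOrdinals: Set[Int]): Array[Any] =
    row.zipWithIndex.map {
      case (v: Long, i) if timestampOrdinals.contains(i) => microsToTimestamp(v)
      case (v, _) => v
    }
}
```

Applied inside a per-row map over an RDD or iterator, a conversion of this kind would not require holding every row in driver memory, which is the concern raised in the review comment. Whether that fits the surrounding load flow is not settled in this thread.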
niuge01 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364137512
##########
File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
##########
@@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
Review comment: carbonTable is the target table of load process, not the source table.
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572583789
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1573/
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364769297
##########
File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
##########
@@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
       .asScala
       .map(_.getColName)
       .toArray
+
+    /**
+     * [[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] validates the
+     * datatype of column data and corresponding datatype in schema provided to create dataframe.
+     * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
+     * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
+     * input. Hence, We need to give the List[Row] compatible with the schema datatypes. So using
+     * the createDataFrame API which takes List[Row] and schema as input.
+     */
     val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      classOf[SparkDataTypeConverterImpl],
-      classOf[CarbonRowReadSupport],
-      splits.asJava)
-      .map { row =>
-        new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    SparkSQLUtil.execute(rdd, schema, sparkSession)
+    val listOfRows = sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
Review comment: This method will be called by compaction and insert into stage command. So the carbonTable ought to be the source table.
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572597036
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1574/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572637232
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1575/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573310579
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1599/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573323098
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1600/
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573512950
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1614/