QiangCai opened a new pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757

### Why is this PR needed?
Data load jobs are missing output metrics. For details, see JIRA issue CARBONDATA-3812.

### What changes were proposed in this PR?
1. Refactor OutputFilesInfoHolder into DataLoadMetrics.
2. Add metrics: numOutputBytes and numOutputRows.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626102156

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1263/

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626102265

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2981/

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626116304

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1264/

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626116407

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2982/

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626134751

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2983/

CarbonDataQA1 commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626134915

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1265/

ajantha-bhat commented on a change in pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#discussion_r422476618

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
##########
@@ -316,6 +319,7 @@ class NewDataFrameLoaderRDD[K, V](
 carbonLoadModel.getTableName,
 carbonLoadModel.getSegment.getSegmentNo))
 executor.execute(model, loader.storeLocation, recordReaders.toArray)
+executor.close()

Review comment:
Good catch, but it would be better to add it inside the task completion listener. Refer to `UpdateDataLoad.scala`, line 70.

ajantha-bhat commented on a change in pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#discussion_r422476654

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
##########
@@ -160,6 +161,7 @@ class NewCarbonDataLoadRDD[K, V](
 executor.execute(model, loader.storeLocation, recordReaders)
+executor.close()

Review comment:
Good catch, but it would be better to add it inside the task completion listener. Refer to `UpdateDataLoad.scala`, line 70.

ajantha-bhat commented on a change in pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#discussion_r422476774

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataLoadMetrics.java
##########
@@ -21,10 +21,10 @@ import java.util.ArrayList;
 import java.util.List;
-public class OutputFilesInfoHolder implements Serializable {
-
-  private static final long serialVersionUID = -1401375818456585241L;
-
+/**
+ * store data loading metrics
+ */
+public class DataLoadMetrics implements Serializable {

Review comment:
I didn't call it metrics initially because it also holds file names, partition paths, and so on. Do you think "metrics" is more suitable?

QiangCai commented on a change in pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#discussion_r422760244

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
##########
@@ -316,6 +319,7 @@ class NewDataFrameLoaderRDD[K, V](
 carbonLoadModel.getTableName,
 carbonLoadModel.getSegment.getSegmentNo))
 executor.execute(model, loader.storeLocation, recordReaders.toArray)
+executor.close()

Review comment:
It has already been added, but we need to invoke it before uploading the metrics.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
##########
@@ -160,6 +161,7 @@ class NewCarbonDataLoadRDD[K, V](
 executor.execute(model, loader.storeLocation, recordReaders)
+executor.close()

Review comment:
It has already been added, but we need to invoke it before uploading the metrics.

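Editor's illustration of the ordering discussed in this review thread: a minimal, self-contained sketch, not the CarbonData code. It assumes (this is an inference from the comment, not stated in the PR) that output counters only become final once the executor is closed. `SketchExecutor`, `runTask`, and the list of completion listeners are hypothetical stand-ins for the load executor and the framework's task-completion hook. The sketch keeps an idempotent `close()` registered as a completion listener as a safety net, and also calls it explicitly before the metrics are read.

```java
import java.util.ArrayList;
import java.util.List;

class CloseBeforeMetricsSketch {

    /** Hypothetical stand-in for the load executor; not the CarbonData class. */
    static class SketchExecutor implements AutoCloseable {
        private long bufferedRows = 0; // rows accepted but not yet flushed
        private long flushedRows = 0;  // rows that count toward the metric
        private boolean closed = false;

        void execute(int rows) {
            bufferedRows += rows;
        }

        /** Idempotent: flushes once, later calls are no-ops. */
        @Override
        public void close() {
            if (closed) {
                return;
            }
            flushedRows += bufferedRows;
            bufferedRows = 0;
            closed = true;
        }

        long numOutputRows() {
            return flushedRows;
        }
    }

    /** Simulates one task and returns the row count that would be reported. */
    static long runTask(int rows, boolean closeBeforeReport) {
        // Hypothetical stand-in for the framework's task-completion listeners.
        List<Runnable> completionListeners = new ArrayList<>();
        SketchExecutor executor = new SketchExecutor();
        completionListeners.add(executor::close); // safety net on any exit path

        long reported;
        try {
            executor.execute(rows);
            if (closeBeforeReport) {
                executor.close(); // explicit close so the counters are final
            }
            reported = executor.numOutputRows(); // the "upload metrics" step
        } finally {
            for (Runnable listener : completionListeners) {
                listener.run(); // runs after the metrics were already read
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(runTask(100, true));  // prints 100
        System.out.println(runTask(100, false)); // prints 0
    }
}
```

In this sketch, relying on the listener alone reports 0 rows because the listener fires after the metric has been read, which is one plausible reading of why the explicit call was kept in the PR.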
QiangCai commented on a change in pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#discussion_r422761061

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataLoadMetrics.java
##########
@@ -21,10 +21,10 @@ import java.util.ArrayList;
 import java.util.List;
-public class OutputFilesInfoHolder implements Serializable {
-
-  private static final long serialVersionUID = -1401375818456585241L;
-
+/**
+ * store data loading metrics
+ */
+public class DataLoadMetrics implements Serializable {

Review comment:
Yes. For the Hadoop framework, we collect them and put them in the task message; for the Spark framework, we collect them and put them in the task metrics.

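Editor's illustration of the naming discussion above: a rough sketch of a serializable holder that carries both file information and counters. Only the names `numOutputBytes` and `numOutputRows` come from the PR description; the class name `DataLoadMetricsSketch`, the methods `addOutputFile`, `merge`, and `toTaskMessage`, and the message format are all hypothetical and should not be read as the actual `DataLoadMetrics` API. The sketch is single-threaded for brevity.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DataLoadMetricsSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // File information, as the earlier holder class is described in the review.
    private final List<String> outputFiles = new ArrayList<>();
    // Counters named in the PR description.
    private long numOutputBytes = 0L;
    private long numOutputRows = 0L;

    /** Records one written file together with its size and row count. */
    void addOutputFile(String path, long bytes, long rows) {
        outputFiles.add(path);
        numOutputBytes += bytes;
        numOutputRows += rows;
    }

    /** Folds another task's metrics into this one, e.g. on the driver side. */
    void merge(DataLoadMetricsSketch other) {
        outputFiles.addAll(other.outputFiles);
        numOutputBytes += other.numOutputBytes;
        numOutputRows += other.numOutputRows;
    }

    /** Hypothetical text form for a framework that passes metrics as a task message. */
    String toTaskMessage() {
        return "numOutputBytes=" + numOutputBytes + ";numOutputRows=" + numOutputRows;
    }

    long getNumOutputBytes() {
        return numOutputBytes;
    }

    long getNumOutputRows() {
        return numOutputRows;
    }

    List<String> getOutputFiles() {
        return outputFiles;
    }

    public static void main(String[] args) {
        DataLoadMetricsSketch task0 = new DataLoadMetricsSketch();
        task0.addOutputFile("part-0.carbondata", 1024L, 10L);
        DataLoadMetricsSketch task1 = new DataLoadMetricsSketch();
        task1.addOutputFile("part-1.carbondata", 2048L, 20L);
        task0.merge(task1);
        System.out.println(task0.toTaskMessage()); // numOutputBytes=3072;numOutputRows=30
    }
}
```

A framework that exposes numeric task metrics could read the two getters directly instead of the text form; which mechanism CarbonData uses on each framework is only described at the level of the review comment above.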
ajantha-bhat commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626470360

LGTM

ajantha-bhat removed a comment on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626470360

LGTM

ajantha-bhat commented on pull request #3757:
URL: https://github.com/apache/carbondata/pull/3757#issuecomment-626471483

ok. LGTM