nihal0107 opened a new pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-752378743

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3504/

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-752378836

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5265/

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-752409665

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5266/

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-752411409

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3505/

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r550181707

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +435,26 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    for (InputSplit split : splits) {
+      List<RecordReader<Void, T>> readers = new ArrayList<>();

Review comment:
Please check whether a filter is present. If it is, prepare a reader and get the row count per split as you do now. Otherwise handle it as before (just read the total row count from the split), because reading the rows again is slow.

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r550182177

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/PaginationCarbonReader.java
##########
@@ -174,14 +170,15 @@ private Range getBlockletIndexRange(long fromRowNumber, long toRowNumber) {
       BlockletDetailInfo detailInfo =
           ((CarbonInputSplit) allBlockletSplits.get(i)).getDetailInfo();
       int rowCountInBlocklet = detailInfo.getRowCount();
-      Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      // Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      List<Object> rowsInBlocklet = new ArrayList<>();

Review comment:
a. Why not keep using the array?
b. Don't keep commented-out code.
c. `int count = 0;` Is this unused now?

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r550184347

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/PaginationCarbonReader.java
##########
@@ -174,14 +170,15 @@ private Range getBlockletIndexRange(long fromRowNumber, long toRowNumber) {
       BlockletDetailInfo detailInfo =
           ((CarbonInputSplit) allBlockletSplits.get(i)).getDetailInfo();
       int rowCountInBlocklet = detailInfo.getRowCount();
-      Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      // Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      List<Object> rowsInBlocklet = new ArrayList<>();

Review comment:
Also check `rowCountInBlocklet`.

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r550185543

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +435,26 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    for (InputSplit split : splits) {
+      List<RecordReader<Void, T>> readers = new ArrayList<>();

Review comment:
If the store has delta files || a filter is present --> use the new way: build a reader and get the count from it.
Else use the old way.
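
The rule in the review comment above can be sketched in isolation as follows. This is a minimal illustration and not the code from the PR: `RowCountStrategySketch`, `cumulativeRowCounts`, and the two counter functions are hypothetical names, and storing a running total per split is an assumption suggested by the `sum` variable in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch; none of these names come from the CarbonData SDK. */
final class RowCountStrategySketch {

  /**
   * Returns the running total of rows after each split.
   *
   * @param metadataCount cheap count taken from split metadata (ignores filters and deletes)
   * @param scanCount     expensive count obtained by actually reading the split
   */
  static <S> List<Long> cumulativeRowCounts(List<S> splits, boolean hasFilter,
      boolean hasDeleteDelta, ToLongFunction<S> metadataCount, ToLongFunction<S> scanCount) {
    // Only pay for a scan when the metadata count could be wrong.
    ToLongFunction<S> counter = (hasFilter || hasDeleteDelta) ? scanCount : metadataCount;
    List<Long> cumulative = new ArrayList<>();
    long sum = 0;
    for (S split : splits) {
      sum += counter.applyAsLong(split);
      cumulative.add(sum);
    }
    return cumulative;
  }

  public static void main(String[] args) {
    // Each fake split is {rowsInMetadata, rowsStillLive}.
    List<long[]> splits = Arrays.asList(new long[] {10, 7}, new long[] {5, 5});
    // No filter, no deletes: metadata path, prints [10, 15]
    System.out.println(cumulativeRowCounts(splits, false, false, s -> s[0], s -> s[1]));
    // Deletes present: scan path, prints [7, 12]
    System.out.println(cumulativeRowCounts(splits, false, true, s -> s[0], s -> s[1]));
  }
}
```

Passing the two counters as functions keeps the decision in one place, so the slow path is taken only when a filter or a delete delta makes the metadata count unreliable.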

nihal0107 commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551136576

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +435,26 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    for (InputSplit split : splits) {
+      List<RecordReader<Void, T>> readers = new ArrayList<>();

Review comment:
done

nihal0107 commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551136803

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/PaginationCarbonReader.java
##########
@@ -174,14 +170,15 @@ private Range getBlockletIndexRange(long fromRowNumber, long toRowNumber) {
       BlockletDetailInfo detailInfo =
           ((CarbonInputSplit) allBlockletSplits.get(i)).getDetailInfo();
       int rowCountInBlocklet = detailInfo.getRowCount();
-      Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      // Object[] rowsInBlocklet = new Object[rowCountInBlocklet];
+      List<Object> rowsInBlocklet = new ArrayList<>();

Review comment:
Using an array causes a problem after an update or delete operation. Removed the unused code.
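
The thread does not spell out what the problem with the array is. One plausible reading, offered here only as an assumption, is that the row count stored in the blocklet metadata still includes deleted rows, so an array sized from it ends up with empty trailing slots. The sketch below illustrates that reading with hypothetical names (`BlockletBufferSketch`, `readIntoArray`, `readIntoList`); it also assumes the live rows never outnumber the declared count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; these helpers are not part of PaginationCarbonReader. */
final class BlockletBufferSketch {

  /** Sizes the buffer from the declared (metadata) row count, as the old code did. */
  static Object[] readIntoArray(int declaredRowCount, Iterator<Object> liveRows) {
    Object[] rows = new Object[declaredRowCount];
    int i = 0;
    while (liveRows.hasNext()) {
      rows[i++] = liveRows.next();
    }
    return rows; // slots past the last live row stay null
  }

  /** Lets the buffer grow to exactly the number of rows actually read. */
  static List<Object> readIntoList(Iterator<Object> liveRows) {
    List<Object> rows = new ArrayList<>();
    while (liveRows.hasNext()) {
      rows.add(liveRows.next());
    }
    return rows;
  }

  public static void main(String[] args) {
    // Metadata says 5 rows, but two were deleted, so only 3 are returned by the reader.
    List<Object> live = Arrays.<Object>asList("r1", "r2", "r4");
    Object[] asArray = readIntoArray(5, live.iterator());
    List<Object> asList = readIntoList(live.iterator());
    System.out.println(asArray.length + " slots, last slot = " + asArray[4]); // 5 slots, last slot = null
    System.out.println(asList.size() + " rows"); // 3 rows
  }
}
```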

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-753819225

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5271/

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-753821897

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3510/

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551238408

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +436,50 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    boolean isIUDTable = false;
+    // Check if update or delete happened on the table.
+    if (!StringUtils.isEmpty(this.tablePath)) {
+      CarbonFile[] fileList = FileFactory.getCarbonFile(this.tablePath,
+          this.hadoopConf).listFiles();
+      for (CarbonFile file : fileList) {

Review comment:
If the table path contains many files, checking whether each one is a delta file will take time. Please manually check the SDK update test case and see whether we write any metadata (such as table status or update status) that indicates an update has happened.

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551240783

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +436,50 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    boolean isIUDTable = false;
+    // Check if update or delete happened on the table.
+    if (!StringUtils.isEmpty(this.tablePath)) {
+      CarbonFile[] fileList = FileFactory.getCarbonFile(this.tablePath,
+          this.hadoopConf).listFiles();
+      for (CarbonFile file : fileList) {
+        if (file.getPath().endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
+          isIUDTable = true;
+          break;
+        }
+      }
+    }
+    // if filter exists or IUD happened then read the total number of rows after
+    // building carbon reader else get the row count from the details info of each splits.
+    if (this.filterExpression != null || isIUDTable) {
+      for (InputSplit split : splits) {
+        List<RecordReader<Void, T>> readers = new ArrayList<>();
+        CarbonFileInputFormat format = this.prepareFileInputFormat(job, false, true);
+        RecordReader reader = this.getRecordReader(job, format, readers, split);
+        readers.add(reader);
+        CarbonReader carbonReader = new CarbonReader<>(readers);

Review comment:
Please close the reader in the finally block, else there will be a resource leak.

ajantha-bhat commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551241629

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +436,50 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    boolean isIUDTable = false;
+    // Check if update or delete happened on the table.
+    if (!StringUtils.isEmpty(this.tablePath)) {
+      CarbonFile[] fileList = FileFactory.getCarbonFile(this.tablePath,
+          this.hadoopConf).listFiles();
+      for (CarbonFile file : fileList) {
+        if (file.getPath().endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
+          isIUDTable = true;
+          break;
+        }
+      }
+    }
+    // if filter exists or IUD happened then read the total number of rows after
+    // building carbon reader else get the row count from the details info of each splits.
+    if (this.filterExpression != null || isIUDTable) {
+      for (InputSplit split : splits) {
+        List<RecordReader<Void, T>> readers = new ArrayList<>();
+        CarbonFileInputFormat format = this.prepareFileInputFormat(job, false, true);
+        RecordReader reader = this.getRecordReader(job, format, readers, split);
+        readers.add(reader);
+        CarbonReader carbonReader = new CarbonReader<>(readers);

Review comment:
I mean `reader.close()`
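
The close-in-finally request above can be sketched without any Hadoop or CarbonData dependency. `RowReader`, `FakeReader`, and `countRows` below are hypothetical stand-ins used only to show the pattern; they are not the types used in the PR.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch of counting rows while guaranteeing the reader is closed. */
final class ReaderCloseSketch {

  /** Stand-in for a record reader: advances one row at a time and must be closed. */
  interface RowReader extends Closeable {
    boolean nextRow() throws IOException;
  }

  /** Counts the rows and closes the reader even if reading throws. */
  static long countRows(RowReader reader) throws IOException {
    long count = 0;
    try {
      while (reader.nextRow()) {
        count++;
      }
    } finally {
      reader.close(); // runs on both the normal and the exceptional path
    }
    return count;
  }

  /** Test double that records whether close() was called. */
  static final class FakeReader implements RowReader {
    private int remaining;
    private final boolean failOnRead;
    boolean closed;

    FakeReader(int rows, boolean failOnRead) {
      this.remaining = rows;
      this.failOnRead = failOnRead;
    }

    @Override
    public boolean nextRow() throws IOException {
      if (failOnRead) {
        throw new IOException("simulated read failure");
      }
      return remaining-- > 0;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) throws IOException {
    FakeReader ok = new FakeReader(3, false);
    System.out.println(countRows(ok) + " rows, closed = " + ok.closed); // 3 rows, closed = true
  }
}
```

For a reader that implements `Closeable`, a try-with-resources statement would give the same guarantee with less code; the explicit `finally` is kept here because that is what the review comment asks for.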

nihal0107 commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551340636

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +436,50 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    boolean isIUDTable = false;
+    // Check if update or delete happened on the table.
+    if (!StringUtils.isEmpty(this.tablePath)) {
+      CarbonFile[] fileList = FileFactory.getCarbonFile(this.tablePath,
+          this.hadoopConf).listFiles();
+      for (CarbonFile file : fileList) {

Review comment:
OK, added an empty directory as a flag to check whether an update/delete happened on the non-transactional carbon table.
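
The trade-off discussed above, scanning every file for a delete-delta extension versus checking a single marker directory, can be sketched with plain `java.nio.file`. Everything here is an assumption for illustration: the marker name `_iud_marker` is made up, the `.deletedelta` extension is assumed rather than taken from the thread, and the real change goes through CarbonData's own file abstraction rather than `java.nio`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical sketch comparing a directory scan with a marker-directory check. */
final class IudMarkerSketch {

  static final String MARKER_DIR = "_iud_marker";        // hypothetical marker name
  static final String DELETE_DELTA_EXT = ".deletedelta"; // assumed file extension

  /** Cost grows with the number of files in the table path. */
  static boolean hasDeleteDeltaByScan(Path tablePath) throws IOException {
    try (Stream<Path> files = Files.list(tablePath)) {
      return files.anyMatch(p -> p.getFileName().toString().endsWith(DELETE_DELTA_EXT));
    }
  }

  /** A single existence check, independent of how many data files exist. */
  static boolean hasIudMarker(Path tablePath) {
    return Files.isDirectory(tablePath.resolve(MARKER_DIR));
  }

  /** Called by the writer side after an update or delete; safe to call repeatedly. */
  static void markIud(Path tablePath) throws IOException {
    Files.createDirectories(tablePath.resolve(MARKER_DIR));
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("iud-sketch");
    System.out.println(hasIudMarker(table)); // false
    markIud(table);
    System.out.println(hasIudMarker(table)); // true
  }
}
```

The marker only stays trustworthy if every code path that writes a delete delta also creates it, which is the part a real implementation would have to guarantee.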

nihal0107 commented on a change in pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#discussion_r551340796

##########
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java
##########
@@ -427,6 +436,50 @@ private CarbonFileInputFormat prepareFileInputFormat(Job job, boolean enableBloc
     }
   }

+  private <T> void totalRowCountInSplits(Job job, List<InputSplit> splits,
+      List<Long> rowCountInSplit) throws IOException, InterruptedException {
+    long sum = 0;
+    boolean isIUDTable = false;
+    // Check if update or delete happened on the table.
+    if (!StringUtils.isEmpty(this.tablePath)) {
+      CarbonFile[] fileList = FileFactory.getCarbonFile(this.tablePath,
+          this.hadoopConf).listFiles();
+      for (CarbonFile file : fileList) {
+        if (file.getPath().endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
+          isIUDTable = true;
+          break;
+        }
+      }
+    }
+    // if filter exists or IUD happened then read the total number of rows after
+    // building carbon reader else get the row count from the details info of each splits.
+    if (this.filterExpression != null || isIUDTable) {
+      for (InputSplit split : splits) {
+        List<RecordReader<Void, T>> readers = new ArrayList<>();
+        CarbonFileInputFormat format = this.prepareFileInputFormat(job, false, true);
+        RecordReader reader = this.getRecordReader(job, format, readers, split);
+        readers.add(reader);
+        CarbonReader carbonReader = new CarbonReader<>(readers);

Review comment:
done

CarbonDataQA2 commented on pull request #4068:
URL: https://github.com/apache/carbondata/pull/4068#issuecomment-754069262

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5280/