Karan-c980 opened a new pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834

### Why is this PR needed?
Currently the carbondata SDK doesn't provide a delete/update feature. This PR adds support in the carbondata SDK for deleting/updating records in carbondata files.

### What changes were proposed in this PR?
With this PR, the carbondata SDK will support the delete/update features. For more details, please refer to https://issues.apache.org/jira/browse/CARBONDATA-3865

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-656181587

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1599/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-656182179

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3339/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-659176346

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1659/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-659176745

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3401/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-661705591

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1711/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-661706074

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3453/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-663370743

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1750/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-663371057

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3492/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-663378119

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1751/
----------------------------------------------------------------
CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-663378402

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3493/
----------------------------------------------------------------
xubo245 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-663800968

Please rebase it
----------------------------------------------------------------
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460357773

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -189,16 +196,14 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
       info.setBlockSize(carbonFile.getLength());
       info.setVersionNumber(split.getVersion().number());
       info.setUseMinMaxForPruning(false);
+      if(allDeleteDeltaFiles.size() != 0) {

Review comment:
   Suggestion: CollectionUtils.isNotEmpty(allDeleteDeltaFiles)
----------------------------------------------------------------
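The suggestion is more than cosmetic: in the diff quoted later in this thread, getAllDeleteDeltaFiles initialises its result to null and returns it unchanged when Files.walk throws, so calling size() on the result can raise a NullPointerException. CollectionUtils.isNotEmpty in Apache Commons Collections is null-safe. The sketch below is only an illustration of that contract; it uses a local stand-in helper (an assumption, so the example needs no extra dependency) and a made-up file name.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class DeltaFileChecks {

  // Hypothetical stand-in with the same contract as CollectionUtils.isNotEmpty:
  // true only when the collection is non-null and holds at least one element.
  static boolean isNotEmpty(Collection<?> coll) {
    return coll != null && !coll.isEmpty();
  }

  public static void main(String[] args) {
    List<String> none = null;                        // what a failed directory walk would return
    List<String> empty = Collections.emptyList();    // no delete delta files found
    List<String> some = Arrays.asList("example.deletedelta"); // illustrative name only

    System.out.println(isNotEmpty(none));   // false, and no NullPointerException
    System.out.println(isNotEmpty(empty));  // false
    System.out.println(isNotEmpty(some));   // true
  }
}
```

With a check of this shape, the split loop is simply skipped when the list is null or empty.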
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460357847

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {

Review comment:
   Suggestion: CollectionUtils.isNotEmpty(allDeleteDeltaFiles)
----------------------------------------------------------------
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460358199

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {
+      for (CarbonInputSplit split : dataBlocksOfSegment) {
+        split.setDeleteDeltaFiles(getDeleteDeltaFiles(split.getFilePath(), allDeleteDeltaFiles));
+      }
+    }
+    return new LinkedList<>(dataBlocksOfSegment);
+  }
+
+  private List<String> getAllDeleteDeltaFiles(String path) {
+    List<String> deltaFiles = null;
+    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
+      deltaFiles = walk.map(x -> x.toString())
+          .filter(f -> f.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+          .collect(Collectors.toList());
+    } catch (IOException e) {
+      e.printStackTrace();

Review comment:
   Why use e.printStackTrace()? The exception should be written to the log instead.
----------------------------------------------------------------
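One way to act on this comment is sketched below. It is not the PR's code: the class name is hypothetical, java.util.logging stands in for the project's own logger, and the value of DELETE_DELTA_FILE_EXT is an assumption. The sketch logs the failure with its stack trace and returns an empty list rather than null, so callers do not need a separate null check. Whether the error should instead be rethrown is a design decision the thread does not settle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DeleteDeltaScanner {

  // java.util.logging keeps the sketch dependency-free; a project logger would replace it.
  private static final Logger LOGGER = Logger.getLogger(DeleteDeltaScanner.class.getName());

  // Assumed value of CarbonCommonConstants.DELETE_DELTA_FILE_EXT.
  static final String DELETE_DELTA_FILE_EXT = ".deletedelta";

  // Walks the directory tree under 'path' and returns every file path ending in
  // the delete delta extension. On an I/O failure the error is logged and an
  // empty list is returned instead of null.
  static List<String> getAllDeleteDeltaFiles(String path) {
    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
      return walk.map(Path::toString)
          .filter(f -> f.endsWith(DELETE_DELTA_FILE_EXT))
          .collect(Collectors.toList());
    } catch (IOException | UncheckedIOException e) {
      // Files.walk throws IOException when the start path cannot be opened and
      // UncheckedIOException if traversal fails part-way through.
      LOGGER.log(Level.SEVERE, "Failed to list delete delta files under " + path, e);
      return Collections.emptyList();
    }
  }
}
```

Passing the exception object to the logger preserves the stack trace that printStackTrace() would have printed, but routes it through the configured log handlers.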
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460358760

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {
+      for (CarbonInputSplit split : dataBlocksOfSegment) {
+        split.setDeleteDeltaFiles(getDeleteDeltaFiles(split.getFilePath(), allDeleteDeltaFiles));
+      }
+    }
+    return new LinkedList<>(dataBlocksOfSegment);
+  }
+
+  private List<String> getAllDeleteDeltaFiles(String path) {
+    List<String> deltaFiles = null;
+    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
+      deltaFiles = walk.map(x -> x.toString())
+          .filter(f -> f.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+          .collect(Collectors.toList());
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return deltaFiles;
+  }
+
+  private String[] getDeleteDeltaFiles(String segmentPath, List<String> allDeleteDeltaFiles) {
+    ArrayList<String> deleteDeltaFiles = new ArrayList<>();
+    String[] pathElements = segmentPath.split(CarbonCommonConstants.FILE_SEPARATOR);

Review comment:
   How is this handled on Windows? WINDOWS_FILE_SEPARATOR? Suggestion: File.separator
----------------------------------------------------------------
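A caveat on the File.separator suggestion: String.split takes a regular expression, and on Windows File.separator is a single backslash, which is not a valid pattern by itself, so it would have to be wrapped in Pattern.quote before being passed to split. Since only the last path element is needed here, the split can be avoided altogether. The following is a minimal sketch with a hypothetical helper, not the approach the PR adopted; it treats both '/' and '\' as separators regardless of the platform it runs on.

```java
class PathNames {

  // Returns the last element of 'path', accepting both '/' and '\' as separators.
  // If neither separator occurs, the whole string is returned.
  static String fileName(String path) {
    int idx = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    return path.substring(idx + 1);
  }

  public static void main(String[] args) {
    // Illustrative paths only.
    System.out.println(fileName("/store/t1/part-0-0_batchno0.carbondata"));
    System.out.println(fileName("C:\\store\\t1\\example.deletedelta"));
  }
}
```

The trade-off is that a backslash is a legal file-name character on Unix-like systems, so a name containing one would be cut short; that is assumed not to occur for generated data file names.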
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460359320

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {
+      for (CarbonInputSplit split : dataBlocksOfSegment) {
+        split.setDeleteDeltaFiles(getDeleteDeltaFiles(split.getFilePath(), allDeleteDeltaFiles));
+      }
+    }
+    return new LinkedList<>(dataBlocksOfSegment);
+  }
+
+  private List<String> getAllDeleteDeltaFiles(String path) {
+    List<String> deltaFiles = null;
+    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
+      deltaFiles = walk.map(x -> x.toString())
+          .filter(f -> f.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+          .collect(Collectors.toList());
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return deltaFiles;
+  }
+
+  private String[] getDeleteDeltaFiles(String segmentPath, List<String> allDeleteDeltaFiles) {
+    ArrayList<String> deleteDeltaFiles = new ArrayList<>();
+    String[] pathElements = segmentPath.split(CarbonCommonConstants.FILE_SEPARATOR);
+    String segmentFileName = pathElements[pathElements.length - 1];

Review comment:
   Please consider Windows paths
----------------------------------------------------------------
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460359351

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {
+      for (CarbonInputSplit split : dataBlocksOfSegment) {
+        split.setDeleteDeltaFiles(getDeleteDeltaFiles(split.getFilePath(), allDeleteDeltaFiles));
+      }
+    }
+    return new LinkedList<>(dataBlocksOfSegment);
+  }
+
+  private List<String> getAllDeleteDeltaFiles(String path) {
+    List<String> deltaFiles = null;
+    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
+      deltaFiles = walk.map(x -> x.toString())
+          .filter(f -> f.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+          .collect(Collectors.toList());
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return deltaFiles;
+  }
+
+  private String[] getDeleteDeltaFiles(String segmentPath, List<String> allDeleteDeltaFiles) {
+    ArrayList<String> deleteDeltaFiles = new ArrayList<>();
+    String[] pathElements = segmentPath.split(CarbonCommonConstants.FILE_SEPARATOR);
+    String segmentFileName = pathElements[pathElements.length - 1];
+    String ExpectedDeleteDeltaFileName = segmentFileName
+        .substring(segmentFileName.indexOf(CarbonCommonConstants.HYPHEN) + 1,
+            segmentFileName.indexOf(CarbonCommonConstants.UNDERSCORE));
+
+    for (String deltaFile : allDeleteDeltaFiles) {
+      String[] deleteDeltapathElements =
+          deltaFile.split(CarbonCommonConstants.FILE_SEPARATOR);

Review comment:
   Suggestion: File.separator
----------------------------------------------------------------
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460359465

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
##########
@@ -260,7 +265,50 @@ public boolean accept(CarbonFile file) {
         getDataBlocksOfSegment(job, carbonTable, indexFilter, validSegments, new ArrayList<Segment>(), new ArrayList<String>());
     numBlocks = dataBlocksOfSegment.size();
-    result.addAll(dataBlocksOfSegment);
-    return result;
+    List<String> allDeleteDeltaFiles = getAllDeleteDeltaFiles(carbonTable.getTablePath());
+    if(allDeleteDeltaFiles.size() > 0) {
+      for (CarbonInputSplit split : dataBlocksOfSegment) {
+        split.setDeleteDeltaFiles(getDeleteDeltaFiles(split.getFilePath(), allDeleteDeltaFiles));
+      }
+    }
+    return new LinkedList<>(dataBlocksOfSegment);
+  }
+
+  private List<String> getAllDeleteDeltaFiles(String path) {
+    List<String> deltaFiles = null;
+    try (Stream<Path> walk = Files.walk(Paths.get(path))) {
+      deltaFiles = walk.map(x -> x.toString())
+          .filter(f -> f.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+          .collect(Collectors.toList());
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return deltaFiles;
+  }
+
+  private String[] getDeleteDeltaFiles(String segmentPath, List<String> allDeleteDeltaFiles) {
+    ArrayList<String> deleteDeltaFiles = new ArrayList<>();
+    String[] pathElements = segmentPath.split(CarbonCommonConstants.FILE_SEPARATOR);
+    String segmentFileName = pathElements[pathElements.length - 1];
+    String ExpectedDeleteDeltaFileName = segmentFileName
+        .substring(segmentFileName.indexOf(CarbonCommonConstants.HYPHEN) + 1,
+            segmentFileName.indexOf(CarbonCommonConstants.UNDERSCORE));
+
+    for (String deltaFile : allDeleteDeltaFiles) {
+      String[] deleteDeltapathElements =
+          deltaFile.split(CarbonCommonConstants.FILE_SEPARATOR);
+      String deleteDeltaFullFileName = deleteDeltapathElements[deleteDeltapathElements.length - 1];
+      String deleteDeltaFileName = deleteDeltaFullFileName
+          .substring(0, deleteDeltaFullFileName.indexOf(CarbonCommonConstants.UNDERSCORE));

Review comment:
   Can deleteDeltaFullFileName.indexOf(CarbonCommonConstants.UNDERSCORE) return -1 here?
----------------------------------------------------------------
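The concern is that String.indexOf returns -1 when the separator is absent, and substring(0, -1) then throws StringIndexOutOfBoundsException. A guarded version could look like the sketch below. The class and method names are hypothetical, the value of UNDERSCORE is an assumption about CarbonCommonConstants, and the file names in the example are made up.

```java
import java.util.Optional;

class DeltaFileNames {

  // Assumed value of CarbonCommonConstants.UNDERSCORE.
  static final String UNDERSCORE = "_";

  // Returns the part of 'fileName' before the first underscore, or an empty
  // Optional when the name contains no underscore at all.
  static Optional<String> prefixBeforeUnderscore(String fileName) {
    int end = fileName.indexOf(UNDERSCORE);
    if (end < 0) {
      return Optional.empty();
    }
    return Optional.of(fileName.substring(0, end));
  }

  public static void main(String[] args) {
    System.out.println(prefixBeforeUnderscore("0-0_12345.deletedelta"));   // Optional[0-0]
    System.out.println(prefixBeforeUnderscore("unexpected.deletedelta"));  // Optional.empty
  }
}
```

A caller in the matching loop could then skip any delta file whose name does not follow the expected pattern instead of failing the whole read.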
xubo245 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r460375833

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
##########
@@ -559,4 +564,50 @@ public void close(TaskAttemptContext taskAttemptContext) throws InterruptedExcep
       super.close(taskAttemptContext);
     }
   }
+
+  public static RecordWriter<NullWritable, ObjectArrayWritable> getDeleteDeltaRecordWriter(String path) {
+    return (new RecordWriter<NullWritable, ObjectArrayWritable>() {
+      private final ArrayList<String> tupleId = new ArrayList<>();
+
+      @Override
+      public void write(NullWritable aVoid, ObjectArrayWritable objects) {
+        this.tupleId.add((String) objects.get()[0]);
+      }
+
+      @Override
+      public void close(TaskAttemptContext taskAttemptContext) throws IOException {
+        Map<String, DeleteDeltaBlockDetails> blockToDeleteDeltaBlockMapping = new HashMap<>();
+        DeleteDeltaBlockDetails blockDetails;
+        String blockName;
+        for (String tuple : tupleId) {
+          blockName = CarbonUpdateUtil.getBlockName(
+              (tuple.split(CarbonCommonConstants.FILE_SEPARATOR)[TupleIdEnum.BLOCK_ID

Review comment:
   Suggestion: File.separator
----------------------------------------------------------------