Posted by
GitBox on
Nov 19, 2020; 7:49am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-Karan980-opened-a-new-pull-request-4009-CARBONDATA-4029-Fix-Old-Timestamp-issue-int-tp103264p103283.html
ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r526651940

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
this.readCommittedScope = readCommittedScope;
}
+
+ public String getSegmentIdFromFilePath(String filePath) {
Review comment:
A `CarbonTablePath#getSegmentIdFromPath` method already exists. Check whether you can reuse it instead.
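For context, a rough, self-contained sketch of the kind of parsing such a helper performs — it scans the store path for a `Segment_<id>` directory component. This is illustrative only; the actual `CarbonTablePath#getSegmentIdFromPath` implementation in CarbonData may differ in its details and return convention.

```java
// Illustrative sketch only: approximates what a helper like
// CarbonTablePath#getSegmentIdFromPath does. Names and return convention
// here are assumptions, not the real CarbonData implementation.
public class SegmentPathSketch {
  private static final String SEGMENT_PREFIX = "Segment_";

  /**
   * Returns the segment id embedded in a store path, or null when the path
   * contains no "Segment_<id>" directory (e.g. files written externally by
   * the SDK and added via "alter table ... add segment").
   */
  public static String getSegmentIdFromPath(String path) {
    if (path == null) {
      return null;
    }
    // Normalize separators so the split below works on Windows paths too.
    String normalized = path.replace("\\", "/");
    for (String part : normalized.split("/")) {
      if (part.startsWith(SEGMENT_PREFIX)) {
        return part.substring(SEGMENT_PREFIX.length());
      }
    }
    return null;
  }
}
```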
##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -781,6 +781,26 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
sql(s"drop table $tableName")
}
+ test("Test add segment by carbon written by sdk having old timestamp") {
+ sql(s"drop table if exists external_primitive")
+ sql(
+ s"""
+ |create table external_primitive (id int, name string, rank smallint, salary double,
+ | active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata
+ |""".stripMargin)
+ val externalSegmentPathWithOldTimestamp = storeLocation + "/" +
+ "external_segment_with_old_timestamp"
+ val externalSegmentPath = storeLocation + "/" + "external_segment"
+ FileFactory.deleteAllFilesOfDir(new File(externalSegmentPath))
+ copy(externalSegmentPathWithOldTimestamp, externalSegmentPath)
Review comment:
We should not keep binary files (such as carbondata and index files) in the repo. They have to be generated on the fly.
Try writing them from the SDK, passing a nano-time timestamp via `CarbonWriterBuilder#uniqueIdentifier`.
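One way a test could synthesize such an "old" timestamp on the fly, rather than committing pre-built binary files: derive an epoch value for a date well in the past and hand it to the SDK writer. The `CarbonWriterBuilder#uniqueIdentifier(long)` call itself is not shown (it needs the CarbonData SDK on the classpath), and whether the identifier is interpreted as nanoseconds or milliseconds is an assumption here — check the SDK writer guide.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch: compute an "old" timestamp at test time instead of
// checking binary carbondata/index files into the repo. In the real test
// this value would be passed to CarbonWriterBuilder#uniqueIdentifier(long)
// before writing the external segment (SDK call omitted here).
public class OldTimestampSketch {
  /** Epoch milliseconds for a date well in the past (1990-01-01 UTC). */
  public static long oldTimestampMillis() {
    return LocalDateTime.of(1990, 1, 1, 0, 0)
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
  }
}
```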
##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -367,8 +368,9 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon
String[] deleteDeltaFilePath = null;
if (isIUDTable) {
// In case IUD is not performed in this table avoid searching for
- // invalidated blocks.
- if (CarbonUtil
+ // invalidated blocks. No need to check validation for splits written by SDK.
+ String segmentId = getSegmentIdFromFilePath(inputSplit.getFilePath());
Review comment:
Add more comments explaining that in the SDK case the segment id will always be null, that this null is the decision point for skipping this validation for SDK-written files, and why the skip is needed.
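A minimal sketch of the decision the reviewer asks to document (method and parameter names are assumptions): files written externally by the SDK carry no `Segment_<id>` directory in their path, so the derived segment id is null, and since IUD delete deltas only exist for table-managed segments, the invalid-block lookup can be skipped for them.

```java
// Hypothetical sketch of the skip condition discussed in the review.
// Real CarbonTableInputFormat logic is more involved; this only isolates
// the null-segment-id decision point.
public class SplitValidationSketch {
  /**
   * SDK-written splits have no managed segment id (it resolves to null),
   * and IUD delete deltas are only ever produced for table-managed
   * segments, so the invalid-block check applies only when both the table
   * has seen IUD operations and the split belongs to a managed segment.
   */
  public static boolean shouldCheckInvalidBlocks(boolean isIUDTable, String segmentId) {
    return isIUDTable && segmentId != null;
  }
}
```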
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]