[GitHub] [carbondata] Karan980 opened a new pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 opened a new pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox

Karan980 opened a new pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009


    ### Why is this PR needed?
   Earlier timestamp present in name of carbondata files was in nanoseconds. Currently the timestamp is in milliseconds. When old SDK file segment is added to table through alter table add segment query then it is treated as invalid block due to timestamp present in nanoseconds.
   
    ### What changes were proposed in this PR?
   Removed update validation for SDK written files.
   
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox

ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r526651940



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
       Already `CarbonTablePath#getSegmentIdFromPath` method exist. check if you can use it.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -781,6 +781,26 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 
+  test("Test add segment by carbon written by sdk having old timestamp") {
+    sql(s"drop table if exists external_primitive")
+    sql(
+      s"""
+         |create table external_primitive (id int, name string, rank smallint, salary double,
+         | active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata
+         |""".stripMargin)
+    val externalSegmentPathWithOldTimestamp = storeLocation + "/" +
+      "external_segment_with_old_timestamp"
+    val externalSegmentPath = storeLocation + "/" + "external_segment"
+    FileFactory.deleteAllFilesOfDir(new File(externalSegmentPath))
+    copy(externalSegmentPathWithOldTimestamp, externalSegmentPath)

Review comment:
       We should not keep binary files in the repo (like carbondata and index files). It has to generate on the fly.
   Try to write from SDK and use nano time timeStamp by calling `CarbonWriterBuilder#uniqueIdentifier`

##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -367,8 +368,9 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon
       String[] deleteDeltaFilePath = null;
       if (isIUDTable) {
         // In case IUD is not performed in this table avoid searching for
-        // invalidated blocks.
-        if (CarbonUtil
+        // invalidated blocks. No need to check validation for splits written by SDK.
+        String segmentId = getSegmentIdFromFilePath(inputSplit.getFilePath());

Review comment:
       add more comments explaining SDK case segment id will always be null and using that as a decision point to skip this validation for SDK and why you need to skip it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528482794



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
       _CarbonTablePath#getSegmentIdFromPath_ returns segmentId present in path. For eg: Fact/Part0/segment_**0**. But in this case we need segment id present in carbondata file name. So above method cannot be used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528485356



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
       The caller of this method, inputSplit itself has getSegmentId() method. I think you can use it instead of this. Please avoid writing new code and try to reuse the code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528556523



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -367,8 +368,9 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon
       String[] deleteDeltaFilePath = null;
       if (isIUDTable) {
         // In case IUD is not performed in this table avoid searching for
-        // invalidated blocks.
-        if (CarbonUtil
+        // invalidated blocks. No need to check validation for splits written by SDK.
+        String segmentId = getSegmentIdFromFilePath(inputSplit.getFilePath());

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528558123



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
       getSegmentId() also return segmentId from LoadMetaDataDetails of segment which is not null. Only null for segmentId is present in name of carbondata file of SDK segment and there is no already existing method to get that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528558754



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -781,6 +781,26 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 
+  test("Test add segment by carbon written by sdk having old timestamp") {
+    sql(s"drop table if exists external_primitive")
+    sql(
+      s"""
+         |create table external_primitive (id int, name string, rank smallint, salary double,
+         | active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata
+         |""".stripMargin)
+    val externalSegmentPathWithOldTimestamp = storeLocation + "/" +
+      "external_segment_with_old_timestamp"
+    val externalSegmentPath = storeLocation + "/" + "external_segment"
+    FileFactory.deleteAllFilesOfDir(new File(externalSegmentPath))
+    copy(externalSegmentPathWithOldTimestamp, externalSegmentPath)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732084303


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4859/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732088276


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3106/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528870553



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
       ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732304718


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732363870


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4867/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732364295


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3114/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732682464


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732731912


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3122/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732732764


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4875/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732804667


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732971703


   @Karan980 : I tried to merge,
   But when I checkout using `git fetch origin pull/4009/head:4009`, this 4009 is pointing some old PR. I guess some problem with this PR. Can you verify once. If same problem just raise a new PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

ajantha-bhat edited a comment on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732971703


   @Karan980 : I tried to merge,
   But when I checkout using `git fetch origin pull/4009/head:4009`, this 4009 is pointing some old PR. I guess some problem with this PR. Can you verify once? If same problem just raise a new PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 closed pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

GitBox
In reply to this post by GitBox

Karan980 closed pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]