
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

Posted by GitBox on Mar 16, 2021; 1:03pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-jack86596-opened-a-new-pull-request-4105-CARBONDATA-4148-Reindex-failed-when-SI-hae-tp106768p106829.html


Indhumathi27 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595146990



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##########
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists maintable")
   }
 
+  test("reindex command with stale files") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment:
   1. Users can run into this scenario. What I meant is that allowing DELETE directly on the index table is itself a separate issue: if a user does it by mistake, the main table and the SI go out of sync, and the user then has to run REINDEX unnecessarily. So you can handle this scenario in this PR.
   2. If you are going to return only index files, the merge file name will always be null. So you can return just a List of index files from the getIndexFiles method instead of a Map.
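A minimal sketch of point 2, assuming a simplified model in which the method receives plain file names rather than reading the segment directory. `IndexFileListing`, `getIndexFilesAsMap`, and the name-based filtering are hypothetical and are not CarbonData's actual code; the `.carbonindex` / `.carbonindexmerge` suffixes are assumed here for illustration. The sketch shows that when only standalone index files are returned, every Map value is null, so a List carries the same information.

```scala
// Hypothetical sketch: names, signatures and file suffixes are illustrative,
// not CarbonData's real getIndexFiles implementation.
object IndexFileListing {
  // Assumed suffix for standalone index files. Merge files (assumed to end in
  // ".carbonindexmerge") do not end with this suffix, so they are filtered out.
  private val IndexExt = ".carbonindex"

  /** Before: maps each index file to the merge file that contains it.
    * Because only standalone index files are returned, every value is null. */
  def getIndexFilesAsMap(files: Seq[String]): Map[String, String] =
    files.filter(_.endsWith(IndexExt)).map(f => f -> (null: String)).toMap

  /** After: the Map values carried no information, so return the index files directly. */
  def getIndexFiles(files: Seq[String]): List[String] =
    files.filter(_.endsWith(IndexExt)).toList
}
```

For example, `IndexFileListing.getIndexFiles(Seq("a.carbonindex", "b.carbonindexmerge", "c.carbondata"))` keeps only `a.carbonindex`. Callers that previously iterated over `map.keySet` can iterate over the List instead, and no null-valued entries leak out.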




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure.