Posted by GitBox on Mar 16, 2021; 12:48pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-jack86596-opened-a-new-pull-request-4105-CARBONDATA-4148-Reindex-failed-when-SI-hae-tp106768p106827.html
jack86596 commented on a change in pull request #4105:
URL:
https://github.com/apache/carbondata/pull/4105#discussion_r595136721
##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##########
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
sql("drop table if exists maintable")
}
+ test("reindex command with stale files") {
+ sql("drop table if exists maintable")
+ sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+ sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+ sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")
Review comment:
1. "i think, at first, delete segment directly from indexTable should not be allowed." This is a unit test: we are simulating the issue (main table segment is SUCCESS while the SI table segment is MFD, i.e. marked for delete). Real users do not need to delete an SI segment manually to hit the issue; they will run into it directly in day-to-day operation, without any manual operation.
2. "For second case, where main table contains SUCCESS segments and SI has MFD, during reindex, before loading data again, we can delete old stale files in that segment" Yes, the solution you propose would work, and I agree with that. But it is still not as good as the current one: with the current approach there is no need to delete the stale files, and reindex works anyway.
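To make the scenario in point 1 concrete, here is a minimal sketch of the state the test simulates: main table segments are SUCCESS, the matching SI segments are MFD, and reindex is requested for a subset of segment ids. Everything in it is an assumption for illustration only (`Status`, `segmentsToRepair`, and the map-based segment tables are hypothetical names, not CarbonData's API), and it models only which segments would be picked for repair, not how stale files are handled.

```scala
// Hypothetical model for illustration; not CarbonData's actual classes or API.
object ReindexSketch {
  sealed trait Status
  case object Success extends Status
  case object MarkedForDelete extends Status // "MFD" in the discussion above

  // A segment is a repair candidate when it was requested, is SUCCESS in the
  // main table, and has no SUCCESS entry in the SI table (MFD or missing).
  def segmentsToRepair(
      mainTable: Map[String, Status],
      indexTable: Map[String, Status],
      requested: Set[String]): Seq[String] =
    mainTable.toSeq.collect {
      case (id, Success)
          if requested.contains(id) && !indexTable.get(id).contains(Success) => id
    }.sorted

  def main(args: Array[String]): Unit = {
    // Mirrors the test: three loads, all SI segments deleted, reindex on 0 and 1.
    val mainSegments: Map[String, Status] =
      Map("0" -> Success, "1" -> Success, "2" -> Success)
    val siSegments: Map[String, Status] =
      Map("0" -> MarkedForDelete, "1" -> MarkedForDelete, "2" -> MarkedForDelete)
    println(segmentsToRepair(mainSegments, siSegments, Set("0", "1")))
  }
}
```

In this model segment 2 stays untouched because it is not in the requested set, which matches the `WHERE SEGMENT.ID IN (0,1)` filter in the test.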