[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

Posted by GitBox on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-marchpure-opened-a-new-pull-request-3999-WIP-Segment-listfile-issue-tp102805p103024.html


marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514685949



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala
##########
@@ -103,7 +103,7 @@ class TestPruneUsingSegmentMinMax extends QueryTest with BeforeAndAfterAll {
     sql("update carbon set(a)=(10) where a=1").collect()
     checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
     showCache = sql("show metacache on table carbon").collect()
-    assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+    assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))

Review comment:
   1. In this test case there are 5 inserts and 1 update. If the update writes into a new segment, there will be 6 segments in the table, and therefore 6 index files in total in the table store location.
   2. If the update wrote into different segment folders, the rows with a = 10 would exist in segments 0/3/4.
   But if the update writes into only one new segment folder, the rows with a = 10 exist only in segment 5, so the query on a = 10 should only need the index file of segment 5, which gives 1/6.
   
   Now, the data in the 6 segments is as shown below.
   
   Segment - 0 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 1 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  3| ab|23.4|  5|2017-09-01 00:00:00|
   |  4| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 2 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  5| ab|23.4|  5|2017-09-01 00:00:00|
   |  6| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 3 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 4 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 5 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   +---+---+----+---+-------------------+
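
The counting in this comment can be checked with a small stand-alone model of segment-level min/max pruning. This is only a sketch: `Segment`, `SegmentPruningSketch`, `prune` and `cacheSummary` are hypothetical names, not CarbonData classes, and it assumes one index file per segment, a cache that starts empty, and that only segments surviving the pruning get their index file cached.

```scala
// Simplified model; these types are hypothetical and are NOT CarbonData APIs.
final case class Segment(id: Int, aValues: Seq[Int]) {
  // Segment-level min/max of column `a`.
  val min: Int = aValues.min
  val max: Int = aValues.max

  // A segment can only match `a = v` if v lies inside its [min, max] range.
  def mayContain(v: Int): Boolean = min <= v && v <= max
}

object SegmentPruningSketch {
  // Values of column `a` per segment, copied from the tables in this comment.
  val segments: Seq[Segment] = Seq(
    Segment(0, Seq(2)),
    Segment(1, Seq(3, 4)),
    Segment(2, Seq(5, 6)),
    Segment(3, Seq(2)),
    Segment(4, Seq(2)),
    Segment(5, Seq(10, 10, 10))
  )

  // Segments that survive min/max pruning for the filter `a = value`.
  def prune(value: Int): Seq[Segment] = segments.filter(_.mayContain(value))

  // Assumption: one index file per segment, cached only if the segment survives pruning.
  def cacheSummary(value: Int): String =
    s"${prune(value).size}/${segments.size} index files cached"

  def main(args: Array[String]): Unit = {
    println(cacheSummary(10))
  }
}
```

Under these assumptions only segment 5 survives the filter `a = 10`, which matches the `1/6` in the updated assertion and the `Row(3)` count checked just before it.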



