[GitHub] [carbondata] marchpure opened a new pull request #3999: [WIP] Segment listfile issue

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717711741


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4712/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717713614


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2955/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717879622


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4714/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717883291


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2957/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718352364


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2962/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718352690


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4719/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718523188


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2969/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718541543


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4728/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718697037


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4730/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-718701205


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2971/
   



[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514260343



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
 
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)

Review comment:
       I have modified the code according to your suggestion. Now, for partition tables, update will write into a new segment.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514263181



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
 
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)

Review comment:
       Nice, I will review it again. @QiangCai or others can also review this PR once.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514266931



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
 
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)

Review comment:
       @marchpure: Also, please reply to my other comments and questions if they have been handled.





[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514279897



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableSortColumnsProperty.scala
##########
@@ -739,14 +739,14 @@ class TestAlterTableSortColumnsProperty extends QueryTest with BeforeAndAfterAll
 
     val table = CarbonEnv.getCarbonTable(Option("default"), tableName)(sqlContext.sparkSession)
     val tablePath = table.getTablePath
-    (0 to 2).foreach { segmentId =>
+    (0 to 3).foreach { segmentId =>

Review comment:
       Now, update writes into a new segment, segment 3.
   Before, update only wrote into the existing segment 2, so the test case range changes from (0 to 2) to (0 to 3).
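
The range change can be illustrated with a small model. This is only a sketch, not CarbonData code: `SegmentIdSketch` and `segmentIdsAfter` are hypothetical names, and the model assumes that segment IDs are assigned sequentially from 0 and that the table held three segments before the update, as the old range (0 to 2) suggests.

```scala
object SegmentIdSketch {
  // Hypothetical model: every load creates one segment, IDs start at 0.
  // When an update writes into a new segment, it consumes the next ID as well.
  def segmentIdsAfter(loads: Int, updates: Int, updateWritesNewSegment: Boolean): Range = {
    val total = if (updateWritesNewSegment) loads + updates else loads
    0 until total
  }

  def main(args: Array[String]): Unit = {
    // Old behaviour: the update rewrites data inside an existing segment.
    println(segmentIdsAfter(3, 1, updateWritesNewSegment = false).toList)
    // New behaviour: the same update lands in a fresh segment.
    println(segmentIdsAfter(3, 1, updateWritesNewSegment = true).toList)
  }
}
```

Under these assumptions the test has to visit one more segment folder after the change, which matches the move from (0 to 2) to (0 to 3).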





[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514685949



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala
##########
@@ -103,7 +103,7 @@ class TestPruneUsingSegmentMinMax extends QueryTest with BeforeAndAfterAll {
     sql("update carbon set(a)=(10) where a=1").collect()
     checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
     showCache = sql("show metacache on table carbon").collect()
-    assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+    assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))

Review comment:
       1. This test case runs 5 inserts and 1 update. If update writes into a new segment, the table will contain 6 segments, so there are 6 index files in total in the table store location.
   2. If update writes into different segment folders, the rows with a = 10 will exist in segments 0/3/4.
   But if update writes into only one new segment folder, the rows with a = 10 will exist only in segment 5.
   
   Now, the data in the 6 segments is as shown below.
   
   Segment - 0 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 1 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  3| ab|23.4|  5|2017-09-01 00:00:00|
   |  4| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 2 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  5| ab|23.4|  5|2017-09-01 00:00:00|
   |  6| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 3 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 4 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   |  2| aa|23.6|  8|2017-09-02 00:00:00|
   +---+---+----+---+-------------------+
   
   Segment - 5 :
   +---+---+----+---+-------------------+
   |  a|  b|   c|  d|                  e|
   +---+---+----+---+-------------------+
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   | 10| ab|23.4|  5|2017-09-01 00:00:00|
   +---+---+----+---+-------------------+
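
One way to read the new "1/6" expectation is through segment-level min/max pruning on column `a`. The sketch below is a simplified in-memory model, not the CarbonData pruning code: `MinMaxPruneSketch` and `segmentsToScan` are hypothetical names, and the per-segment values are copied from the tables above.

```scala
object MinMaxPruneSketch {
  // Values of column `a` per segment, taken from the tables in this comment.
  val segments: Map[Int, Seq[Int]] = Map(
    0 -> Seq(2),
    1 -> Seq(3, 4),
    2 -> Seq(5, 6),
    3 -> Seq(2),
    4 -> Seq(2),
    5 -> Seq(10, 10, 10)
  )

  // A segment has to be scanned only if the filter value lies inside
  // the [min, max] range of the column in that segment.
  def segmentsToScan(data: Map[Int, Seq[Int]], value: Int): Seq[Int] =
    data.toSeq.collect {
      case (id, values) if values.nonEmpty && values.min <= value && value <= values.max => id
    }.sorted

  def main(args: Array[String]): Unit = {
    val hit = segmentsToScan(segments, 10)
    println(s"${hit.size}/${segments.size} segments match a = 10: ${hit.mkString(", ")}")
  }
}
```

In this model only segment 5 survives pruning for `a = 10`, so a single index out of six would need to be loaded, which is consistent with the updated assertion.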





[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514719866



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -342,7 +342,8 @@ object CarbonDataRDDFactory {
 
     try {
       if (!carbonLoadModel.isCarbonTransactionalTable || segmentLock.lockWithRetries()) {
-        if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
+        if (updateModel.isDefined && (!updateModel.get.loadAsNewSegment

Review comment:
       I have modified the code according to your suggestion.
   If (updateModel.isDefined && dataframe.isEmpty) is true,
   the set of rows to update is empty, so we avoid triggering the loading process for an empty dataset.
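
The guard described here can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `CarbonDataRDDFactory`: `UpdateModel` is a hypothetical stand-in for the real `UpdateTableModel`, `skipLoad` is a made-up helper, and the rows to update are modelled as a plain `Seq` rather than a Spark DataFrame.

```scala
object EmptyUpdateGuardSketch {
  // Hypothetical stand-in for UpdateTableModel; only its presence matters here.
  final case class UpdateModel(updatedTimeStamp: Long)

  // Returns true when the load step should be skipped:
  // an update is in progress, but it produced no rows to write.
  def skipLoad[A](updateModel: Option[UpdateModel], rowsToUpdate: Seq[A]): Boolean =
    updateModel.isDefined && rowsToUpdate.isEmpty

  def main(args: Array[String]): Unit = {
    val update = Some(UpdateModel(System.currentTimeMillis()))
    println(skipLoad(update, Seq.empty[Int])) // update with nothing to write
    println(skipLoad(update, Seq(1, 2, 3)))   // update with rows to write
    println(skipLoad(None, Seq.empty[Int]))   // plain load, not an update
  }
}
```

Only the first case skips the load. A plain load with no update model still goes through the normal path in this sketch.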





[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514720835



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
##########
@@ -121,7 +121,7 @@ case class UpdateTableModel(
     updatedTimeStamp: Long,
     var executorErrors: ExecutionErrors,
     deletedSegments: Seq[Segment],
-    loadAsNewSegment: Boolean = false)
+    loadAsNewSegment: Boolean = true)

Review comment:
       I have removed all the code related to loadAsNewSegment.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719160498


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4734/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719163064


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2975/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719410000


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4735/
   

