[GitHub] [carbondata] marchpure opened a new pull request #3999: [WIP] Segment listfile issue


[GitHub] [carbondata] marchpure opened a new pull request #3999: [WIP] Segment listfile issue

GitBox

marchpure opened a new pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716306626


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2929/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716306962


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4686/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716394910


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4688/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716396690


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2931/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716449317


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4689/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [WIP] Segment listfile issue

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716451904


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2932/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716961859


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4694/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-716962034


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2937/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717257966


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4706/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717258126


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2949/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r512760255



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -342,7 +342,8 @@ object CarbonDataRDDFactory {
 
     try {
       if (!carbonLoadModel.isCarbonTransactionalTable || segmentLock.lockWithRetries()) {
-        if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
+        if (updateModel.isDefined && (!updateModel.get.loadAsNewSegment

Review comment:
       Isn't it enough to just set updateModel.get.loadAsNewSegment = true in the update flow? Please explain why this change is required.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableSortColumnsProperty.scala
##########
@@ -739,14 +739,14 @@ class TestAlterTableSortColumnsProperty extends QueryTest with BeforeAndAfterAll
 
     val table = CarbonEnv.getCarbonTable(Option("default"), tableName)(sqlContext.sparkSession)
     val tablePath = table.getTablePath
-    (0 to 2).foreach { segmentId =>
+    (0 to 3).foreach { segmentId =>

Review comment:
       Why is this change required? I think you can revert it.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
 
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)

Review comment:
       So for partition tables, update will not write to a new segment? How is the dirty data issue handled in the partition table update scenario?
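
To make the question concrete, here is a minimal sketch of how the flag in the hunk above is derived. `TableInfo` is a hypothetical stand-in for `CarbonTable`, and this `UpdateTableModel` is reduced to a single field; neither is the real CarbonData class, and only the negation of `isHivePartitionTable` is taken from the diff.

```scala
// Hypothetical stand-in for CarbonTable: only the property used in the hunk.
final case class TableInfo(name: String, isHivePartitionTable: Boolean)

// Simplified model: the real UpdateTableModel carries more fields.
final case class UpdateTableModel(loadAsNewSegment: Boolean)

object UpdateFlagSketch {
  // Mirrors the expression in the diff: a Hive partition table
  // does NOT get loadAsNewSegment = true.
  def modelFor(table: TableInfo): UpdateTableModel =
    UpdateTableModel(loadAsNewSegment = !table.isHivePartitionTable)

  def main(args: Array[String]): Unit = {
    val plain = TableInfo("t_plain", isHivePartitionTable = false)
    val partitioned = TableInfo("t_part", isHivePartitionTable = true)
    println(s"${plain.name}: ${modelFor(plain).loadAsNewSegment}")
    println(s"${partitioned.name}: ${modelFor(partitioned).loadAsNewSegment}")
  }
}
```

Under this reading, partition tables stay on the old update path, which is the case the comment asks about.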

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala
##########
@@ -103,7 +103,7 @@ class TestPruneUsingSegmentMinMax extends QueryTest with BeforeAndAfterAll {
     sql("update carbon set(a)=(10) where a=1").collect()
     checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
     showCache = sql("show metacache on table carbon").collect()
-    assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+    assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))

Review comment:
       How did 8 index files become 6 (were the other two stale?), and why is only 1 of the 6 cached now?





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717325746


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4708/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717329144


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2951/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r512886166



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
##########
@@ -121,7 +121,7 @@ case class UpdateTableModel(
     updatedTimeStamp: Long,
     var executorErrors: ExecutionErrors,
     deletedSegments: Seq[Segment],
-    loadAsNewSegment: Boolean = false)
+    loadAsNewSegment: Boolean = true)

Review comment:
       Is the horizontal compaction feature no longer used after this change?
   @QiangCai, @marchpure: do we need to mark it as deprecated?
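
A rough illustration of why flipping this default matters: every call site that omits the argument changes behaviour without any change to its own code. The two case classes below are simplified, hypothetical stand-ins for the before and after shapes of `UpdateTableModel`; the field name `isUpdate` is an assumption, and the `executorErrors` and `deletedSegments` fields are left out.

```scala
// Simplified stand-ins, not the real CarbonData class.
// `isUpdate` is an assumed name for the first field.
final case class ModelBefore(
    isUpdate: Boolean,
    updatedTimeStamp: Long,
    loadAsNewSegment: Boolean = false)

final case class ModelAfter(
    isUpdate: Boolean,
    updatedTimeStamp: Long,
    loadAsNewSegment: Boolean = true)

object DefaultFlipSketch {
  def main(args: Array[String]): Unit = {
    // Same call shape, different behaviour once the default changes.
    val before = ModelBefore(true, 1L)
    val after = ModelAfter(true, 1L)
    println(s"before: ${before.loadAsNewSegment}, after: ${after.loadAsNewSegment}")

    // An explicit argument still overrides the default.
    val explicit = ModelAfter(true, 1L, loadAsNewSegment = false)
    println(s"explicit: ${explicit.loadAsNewSegment}")
  }
}
```

If no caller passes `false` any more, the old in-place update path would become unreachable, which is what motivates the deprecation question.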





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717492977


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4710/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717496054


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2953/
   



[GitHub] [carbondata] QiangCai commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r513152963



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
##########
@@ -121,7 +121,7 @@ case class UpdateTableModel(
     updatedTimeStamp: Long,
     var executorErrors: ExecutionErrors,
     deletedSegments: Seq[Segment],
-    loadAsNewSegment: Boolean = false)
+    loadAsNewSegment: Boolean = true)

Review comment:
       Remove it; always update into a new segment.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717678439


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4711/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717678491


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2954/
   

