Posted by GitBox on Oct 27, 2020, 3:04 PM
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-marchpure-opened-a-new-pull-request-3999-WIP-Segment-listfile-issue-tp102805p102913.html
ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r512760255

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -342,7 +342,8 @@ object CarbonDataRDDFactory {
try {
if (!carbonLoadModel.isCarbonTransactionalTable || segmentLock.lockWithRetries()) {
- if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
+ if (updateModel.isDefined && (!updateModel.get.loadAsNewSegment
Review comment:
Is just setting updateModel.get.loadAsNewSegment = true in the update flow not enough? Please explain why this change is required.
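For reference, a minimal sketch of what this suggestion might look like at the call site that builds the model (a sketch only, not code from the PR; the fifth positional argument follows the UpdateTableModel diff further down):

    // In CarbonProjectForUpdateCommand, pass the flag unconditionally
    // instead of adding a second condition to the guard in
    // CarbonDataRDDFactory (argument order follows the diff below):
    val updateTableModel =
      UpdateTableModel(true, currentTime, executorErrors, deletedSegments, true)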
##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableSortColumnsProperty.scala
##########
@@ -739,14 +739,14 @@ class TestAlterTableSortColumnsProperty extends QueryTest with BeforeAndAfterAll
val table = CarbonEnv.getCarbonTable(Option("default"), tableName)(sqlContext.sparkSession)
val tablePath = table.getTablePath
- (0 to 2).foreach { segmentId =>
+ (0 to 3).foreach { segmentId =>
Review comment:
Why is this change required? I think you can revert it.
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
case _ => sys.error("")
}
- val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+ val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+ !carbonRelation.carbonTable.isHivePartitionTable)
Review comment:
So, for a partition table, the update will not write as a new segment? How is the dirty-data issue handled in the partition-table update scenario?
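To make that reading concrete, here is a hedged illustration (not code from the PR) of how the new argument would play out against the guard in CarbonDataRDDFactory shown above: for a Hive-partitioned table the fifth argument evaluates to false, so the update stays on the original in-place path, which is what raises the dirty-data question.

    // loadAsNewSegment = !isHivePartitionTable, per the diff above
    if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
      // partition tables land here: rows are rewritten inside existing
      // segments, i.e. the pre-PR behaviour
    } else {
      // non-partition tables land here: updated rows are written as a
      // fresh segment
    }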
##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala
##########
@@ -103,7 +103,7 @@ class TestPruneUsingSegmentMinMax extends QueryTest with BeforeAndAfterAll {
sql("update carbon set(a)=(10) where a=1").collect()
checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
showCache = sql("show metacache on table carbon").collect()
- assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+ assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))
Review comment:
How did 8 index files become 6 (were the other two stale?), and out of those 6, why is only 1 cached now?
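One hypothetical way to answer this from the test itself (not part of the PR) is to dump the full metacache and segment listings instead of asserting on a single cell, so the per-segment index-file counts are visible:

    // Print every SHOW METACACHE row and the segment list for inspection,
    // reusing the same sql(...) helper the test already calls.
    sql("show metacache on table carbon").collect().foreach(println)
    sql("show segments for table carbon").collect().foreach(println)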
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]