[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3988: [WIP] Clean index files when clean files command executed

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-724165516


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3029/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-724626204


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4788/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-724631565


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3030/
   



[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r520568875



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.cleanfiles
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.index.Segment
+import org.apache.carbondata.core.metadata.{CarbonMetadata, SegmentFileStore}
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
+
+  var count = 0
+
+  override protected def beforeAll(): Unit = {
+    sql("DROP TABLE IF EXISTS cleantest")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.MAX_QUERY_EXECUTION_TIME, "0")
+  }
+
+  override protected def afterAll(): Unit = {
+    sql("DROP TABLE IF EXISTS cleantest")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.MAX_QUERY_EXECUTION_TIME,
+        CarbonCommonConstants.DEFAULT_MAX_QUERY_EXECUTION_TIME.toString)
+  }
+
+  test("test clean files command for index files") {
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, issue date) STORED AS carbondata")
+    sql("insert into table cleantest select '1','2000-02-01'")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 0)
+    sql("drop table if exists cleantest")
+  }
+
+  test("test clean files command for index files with SI") {

Review comment:
       ok done





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r520568975



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -274,20 +274,22 @@ public static boolean updateTableMetadataStatus(Set<Segment> updatedSegmentsList
 
         LoadMetadataDetails[] listOfLoadFolderDetailsArray =
                 SegmentStatusManager.readLoadMetadata(metaDataFilepath);
-
+        Boolean isUpdateRequired = false;

Review comment:
       done





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r520572527



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/SegmentIndexFileStore.java
##########
@@ -79,6 +79,11 @@
    */
   private Map<String, List<String>> carbonMergeFileToIndexFilesMap;
 
+  /**
+   * Stores the list of invalid index files of the SI segments in case of small files.
+   */
+  private static Set<String> oldSIIndexAndMergeIndexFiles = new HashSet<>();

Review comment:
      Removed the usage of `oldSIIndexAndMergeIndexFiles`. Since #3999 merges and writes the segment file based on a UUID (timestamp), the list is no longer needed for these steps.
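
      For context, a minimal sketch of the timestamp-based ("UUID") segment-file naming that makes a tracked stale-file list unnecessary. The pattern mirrors the CarbonTableCompactor hunk later in this thread; the helper itself is illustrative, not code from the patch:

        import org.apache.carbondata.core.util.path.CarbonTablePath

        // Illustrative helper (assumption, not from the patch): segment files are
        // named after the load number plus the load's fact timestamp, so readers
        // always resolve the latest segment file and can ignore index/mergeindex
        // files left behind by older loads.
        def segmentFileName(loadNumber: String, factTimeStamp: Long): String =
          loadNumber + "_" + factTimeStamp + CarbonTablePath.SEGMENT_EXT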





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r522092928



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.cleanfiles
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.index.Segment
+import org.apache.carbondata.core.metadata.{CarbonMetadata, SegmentFileStore}
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
+
+  var count = 0
+
+  override protected def beforeAll(): Unit = {
+    sql("DROP TABLE IF EXISTS cleantest")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.MAX_QUERY_EXECUTION_TIME, "0")
+  }
+
+  override protected def afterAll(): Unit = {
+    sql("DROP TABLE IF EXISTS cleantest")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.MAX_QUERY_EXECUTION_TIME,
+        CarbonCommonConstants.DEFAULT_MAX_QUERY_EXECUTION_TIME.toString)
+  }
+
+  test("test clean files command for index files") {
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, issue date) STORED AS carbondata")
+    sql("insert into table cleantest select '1','2000-02-01'")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 0)
+    sql("drop table if exists cleantest")
+  }
+
+  test("test clean files command for index files with SI") {
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, issue date, name string) STORED AS carbondata")
+    sql("insert into table cleantest select '1','2000-02-01', 'abc' ")
+    sql("create index indextable1 on table cleantest (name) AS 'carbondata'")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getIndexFileCountFromSegmentPath("default_indextable1", "0") == 1)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 0)
+    assert(getIndexFileCountFromSegmentPath("default_indextable1", "0") == 0)
+    sql("drop table if exists cleantest")
+  }
+
+  test("test clean files command on partition table") {
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, issue date) STORED AS carbondata " +
+        "partitioned by (name string)")
+    sql("insert into table cleantest select '1','2000-02-01', 'abc' ")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 0)
+    sql("drop table if exists cleantest")
+  }
+
+  test("test clean files command for index files after update") {
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, name string) STORED AS carbondata")
+    sql("insert into table cleantest select '1', 'abc' ")
+    sql("insert into table cleantest select '2', 'abc' ")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "1") == 1)
+    sql("update cleantest set (name)=('xyz') where id=2")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "1") == 2)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "1") == 1)
+    checkAnswer(sql("select *from cleantest where id=2"), Seq(Row(2, "xyz")))
+  }
+
+  test("test clean files without mergeindex") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, "false")
+    sql("drop table if exists cleantest")
+    sql("create table cleantest(id int, name string) STORED AS carbondata")
+    sql("insert into table cleantest select '1', 'abc' ")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getIndexFileCountFromSegmentFile("default_cleantest", "0") == 1)
+    sql("clean files for table cleantest")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getIndexFileCountFromSegmentFile("default_cleantest", "0") == 1)
+    checkAnswer(sql("select *from cleantest where id=1"), Seq(Row(1, "abc")))
+  }
+
+  test("test clean files after index merge and check segment count") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, "false")
+    sql("drop table if exists cleantest")
+    sql(s"create table cleantest(id int, issue date) STORED AS carbondata")
+    sql("insert into table cleantest select '1','2000-02-01'")
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getSegmentFileCount("default_cleantest") == 1)
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, "true")
+    sql("ALTER TABLE cleantest COMPACT 'SEGMENT_INDEX'").collect()
+    assert(getSegmentFileCount("default_cleantest") == 2)
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 1)
+    assert(getIndexFileCountFromSegmentPath("default_cleantest",
+      "0", CarbonTablePath.MERGE_INDEX_FILE_EXT) == 1)
+    sql("clean files for table cleantest")
+    assert(getSegmentFileCount("default_cleantest") == 1)
+    assert(getIndexFileCountFromSegmentPath("default_cleantest", "0") == 0)
+  }
+
+  private def getSegmentFileCount(tableName: String): Int = {
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable("default_cleantest")
+    val segmentsPath = CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath)
+    FileFactory.getCarbonFile(segmentsPath).listFiles(true).size()
+  }
+
+  private def getIndexFileCountFromSegmentFile(tableName: String, segmentNo: String): Int = {
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable(tableName)
+    val segment = Segment.getSegment(segmentNo, carbonTable.getTablePath)
+    new SegmentFileStore(carbonTable.getTablePath, segment.getSegmentFileName)
+      .getIndexFiles.size()
+  }
+
+  private def getIndexFileCountFromSegmentPath(tableName: String,

Review comment:
      A similar method is used in another test case, but there I had to list all files and filter only the valid ones through `SegmentFileStore.getValidCarbonIndexFiles`. In TestCleanFileCommand, I need the actual count. (A sketch of such a helper follows below.)
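
      For reference, a minimal sketch of what this count helper can look like, assuming the imports at the top of the test file and the CarbonTablePath/FileFactory APIs used elsewhere in this thread; the default extension parameter is inferred from the three-argument call in the merge-index test:

        private def getIndexFileCountFromSegmentPath(tableName: String,
            segmentNo: String,
            extension: String = CarbonTablePath.INDEX_FILE_EXT): Int = {
          val carbonTable = CarbonMetadata.getInstance().getCarbonTable(tableName)
          val segmentPath = CarbonTablePath.getSegmentPath(carbonTable.getTablePath, segmentNo)
          // Count files with the given extension as-is, without filtering out
          // already-merged index files, because the tests assert on the raw count.
          FileFactory.getCarbonFile(segmentPath).listFiles(true, new CarbonFileFilter {
            override def accept(file: CarbonFile): Boolean = file.getName.endsWith(extension)
          }).size()
        }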





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-733804391


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4906/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-733813552


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3151/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736807198


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3251/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736810208


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5006/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737152317


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5021/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737155601


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3265/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


akashrn5 commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534107457



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -162,15 +162,25 @@ public static void writeSegmentFile(String tablePath, String segmentId, String t
    * corresponding partitions.
    */
   public static void writeSegmentFile(String tablePath, final String taskNo, String location,
-      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow) throws IOException {
-    String tempFolderLoc = timeStamp + ".tmp";
-    String writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow,
+      boolean readFileFooterFromCarbonDataFile) throws IOException {

Review comment:
      Please rename `readFileFooterFromCarbonDataFile` to something meaningful; it is confusing now.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -198,20 +208,17 @@ public boolean accept(CarbonFile file) {
         folderDetails.setRelative(isRelative);
         folderDetails.setPartitions(partitionNames);
         folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
-        for (CarbonFile file : carbonFiles) {
-          if (file.getName().endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) {
-            folderDetails.setMergeFileName(file.getName());
-          } else {
-            folderDetails.getFiles().add(file.getName());
-          }
-        }
+        setIndexFileNamesToFolderDetails(folderDetails, carbonFiles);
         segmentFile.addPath(location, folderDetails);
         String path = null;
         if (isMergeIndexFlow) {
           // in case of merge index flow, tasks are launched per partition and all the tasks
           // will be written to the same tmp folder, in that case taskNo is not unique.
           // To generate a unique fileName UUID is used
           path = writePath + "/" + CarbonUtil.generateUUID() + CarbonTablePath.SEGMENT_EXT;
+          if (readFileFooterFromCarbonDataFile) {
+            path = writePath + "/" + timeStamp + CarbonTablePath.SEGMENT_EXT;

Review comment:
      Can you please check why this change was added? If it is a valid change, please add a proper comment explaining the scenario that requires it.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -439,6 +430,73 @@ public boolean accept(CarbonFile file) {
     return null;
   }
 
+  /**

Review comment:
      Please check whether the flow follows the order below in the SI case:

      in progress -> merge index (MT) -> update or generate segment file (MT) -> call SI load -> (all SI functions) -> success (MT)

##########
File path: core/src/main/java/org/apache/carbondata/core/writer/CarbonIndexFileMergeWriter.java
##########
@@ -140,33 +141,36 @@ private String mergeCarbonIndexFilesOfSegment(String segmentId,
         groupIndexesBySegment(fileStore.getCarbonIndexMapWithFullPath());
     SegmentFileStore.FolderDetails folderDetails = null;
     for (Map.Entry<String, Map<String, byte[]>> entry : indexLocationMap.entrySet()) {
-      String mergeIndexFile =
-          writeMergeIndexFile(null, partitionPath, entry.getValue(), segmentId, uuid);
-      folderDetails = new SegmentFileStore.FolderDetails();
-      folderDetails.setMergeFileName(mergeIndexFile);
-      folderDetails.setStatus("Success");
-      if (partitionPath.startsWith(tablePath)) {
-        partitionPath = partitionPath.substring(tablePath.length() + 1);
-        List<String> partitions = new ArrayList<>(Arrays.asList(partitionPath.split("/")));
+      Map<String, List<String>> mergeToIndexFileMap = fileStore.getCarbonMergeFileToIndexFilesMap();

Review comment:
      Please cross-verify; I think this change is not required if, in the compact merge-index flow for new tables, you decide whether to merge index files based on the segment file, and fall back to listing files only when the segment file is absent.

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -672,7 +678,7 @@ public static boolean isMaxQueryTimeoutExceeded(long fileTimestamp) {
 
     long minutesElapsed = (difference / (1000 * 60));
 
-    return minutesElapsed > maxTime;
+    return minutesElapsed >= maxTime;

Review comment:
       please revert this

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
##########
@@ -2558,8 +2558,11 @@ public static long getCarbonIndexSize(SegmentFileStore fileStore,
   // Get the total size of carbon data and the total size of carbon index
   public static HashMap<String, Long> getDataSizeAndIndexSize(String tablePath,
       Segment segment) throws IOException {
+    SegmentFileStore fileStore = null;
     if (segment.getSegmentFileName() != null) {
-      SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
+      fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
+    }
+    if (segment.getSegmentFileName() != null && fileStore.getSegmentFile() != null) {

Review comment:
      Please check in which case the scenario exists where the segment file name is set but the file itself is not present. If that case doesn't exist, then I think you can move line 2566 to after line 2563.

##########
File path: integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
##########
@@ -167,6 +173,21 @@ object CarbonMergeFilesRDD {
             executorService.submit(new Runnable {
               override def run(): Unit = {
                 ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
+                // If Alter merge index for old tables is triggered, do not delete index files
+                // immediately to avoid index file not found during concurrent queries
+                if (readFileFooterFromCarbonDataFile ||
+                    !isPropertySet(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT,
+                      CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT)) {

Review comment:
       same as above

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
##########
@@ -1221,6 +1223,20 @@ public static String mergeIndexFilesInPartitionedSegment(CarbonTable table, Stri
             tempFolderPath, currPartitionSpec);
   }
 
+  public static String mergeIndexFilesInTempSegment(CarbonTable table, String segmentId,
+      String segmentPath, String uuid) {
+    try {
+      return new CarbonIndexFileMergeWriter(table)
+          .writeMergeIndexFileBasedOnSegmentFolder(null, false, segmentPath, segmentId, uuid,
+              false);
+    } catch (IOException e) {
+      String message =
+          "Failed to merge index files in path: " + segmentPath + ". " + e.getMessage();

Review comment:
      Use a colon (instead of a period) before appending the exception message.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/CarbonIndexFileMergeTestCase.scala
##########
@@ -530,21 +530,22 @@ class CarbonIndexFileMergeTestCase
       FileFactory.getCarbonFile(table.getAbsoluteTableIdentifier.getTablePath)
         .listFiles(true, new CarbonFileFilter {
           override def accept(file: CarbonFile): Boolean = {
-            file.getName.endsWith(extension)

Review comment:
      Please add a test case where the merge index step fails and the complete load fails as a result; you can use mocking, as sketched below.
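
      A rough sketch of such a test, assuming jmockit for the mocking and the four-argument mergeCarbonIndexFilesOfSegment signature visible in the CarbonIndexFileMergeWriter hunks above; the real signature and exception type may differ:

        import java.io.IOException
        import mockit.{Mock, MockUp}
        import org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter

        test("data load fails when merge index write fails") {
          // Force merge-index writing to fail for every segment of this load.
          new MockUp[CarbonIndexFileMergeWriter]() {
            @Mock
            def mergeCarbonIndexFilesOfSegment(segmentId: String, tablePath: String,
                readFileFooterFromCarbonDataFile: Boolean, uuid: String): String = {
              throw new IOException("mock: merge index write failed")
            }
          }
          sql("drop table if exists cleantest")
          sql("create table cleantest(id int, name string) STORED AS carbondata")
          // With merge index as a mandatory step, the whole load should now fail.
          intercept[Exception] {
            sql("insert into table cleantest select '1', 'abc'")
          }
        }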

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -162,15 +162,25 @@ public static void writeSegmentFile(String tablePath, String segmentId, String t
    * corresponding partitions.
    */
   public static void writeSegmentFile(String tablePath, final String taskNo, String location,
-      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow) throws IOException {
-    String tempFolderLoc = timeStamp + ".tmp";
-    String writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow,
+      boolean readFileFooterFromCarbonDataFile) throws IOException {
+    String tempFolderLoc;
+    String writePath;
+    if (!readFileFooterFromCarbonDataFile) {
+      tempFolderLoc = timeStamp + ".tmp";
+      writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+    } else {
+      // If Alter merge index for old tables is triggered,
+      // directly write mergeindex file into segment file location
+      tempFolderLoc = location;
+      writePath = CarbonTablePath.getSegmentFilesLocation(tablePath);
+    }
     CarbonFile carbonFile = FileFactory.getCarbonFile(writePath);
     if (!carbonFile.exists()) {
       carbonFile.mkdirs();
     }
     CarbonFile tempFolder;
-    if (isMergeIndexFlow) {
+    if (isMergeIndexFlow || readFileFooterFromCarbonDataFile) {

Review comment:
      It looks like this `if` condition is always true, in both the old-table case and the new-load case; please check.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -664,7 +723,8 @@ public static SegmentFile getSegmentFileForPhysicalDataPartitions(String tablePa
       CarbonFile[] listFiles = carbonFile.listFiles(new CarbonFileFilter() {
         @Override
         public boolean accept(CarbonFile file) {
-          return CarbonTablePath.isCarbonIndexFile(file.getAbsolutePath());
+          return file.getName().contains(uuid) && CarbonTablePath

Review comment:
      Please add a comment explaining the scenario with respect to the UUID.

##########
File path: core/src/main/java/org/apache/carbondata/core/writer/CarbonIndexFileMergeWriter.java
##########
@@ -277,21 +305,18 @@ public String writeMergeIndexFileBasedOnSegmentFile(String segmentId,
         LOGGER.error("unable to write segment file during merge index writing: " + ex.getMessage());
         throw ex;
       }
-      boolean status = SegmentFileStore.updateTableStatusFile(table, segmentId, newSegmentFileName,
-          table.getCarbonTableIdentifier().getTableId(), segmentFileStore);
-      if (!status) {
-        // revert to original segment file as the table status update has failed.
-        SegmentStatusManager.writeStringIntoFile(path, content);
-        // delete merge index file.
-        for (String file : mergeIndexFiles) {
-          FileFactory.getCarbonFile(file).delete();
-        }
-        // no need to delete index files, so return from here.
-        return uuid;
-      }
     }
-    for (CarbonFile file : indexFiles) {
-      file.delete();
+    boolean status = SegmentFileStore.updateTableStatusFile(table, segmentId, newSegmentFileName,

Review comment:
      This update shouldn't happen here. The plan now is to minimize file operations, so after merge-index generation the segment file name and success status should be updated in the table status only once.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -439,6 +430,73 @@ public boolean accept(CarbonFile file) {
     return null;
   }
 
+  /**
+   * Get old, invalid index files which have already been merged into a mergeindex file. In a
+   * segment folder we may have both .index and .mergeindex files, because index files are not
+   * deleted immediately for old tables. This method reads the mergeindex file, adds the mapped
+   * index files to a list, and returns it. If more than one mergeindex file is present, the
+   * latest one is considered valid.
+   * Ex: a segment contains 3 files: Segment0/1.index, 1.mergeindex, 1.carbondata, where 1.index
+   * was merged into 1.mergeindex. Here the method returns the merged index file: 1.index.
+   */
+  public static Set<String> getInvalidAndMergedIndexFiles(List<String> indexFiles)
+      throws IOException {
+    SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();

Review comment:
      Please check this method: it should execute only for old tables, and among those only when an alter merge index has happened; otherwise it should not. Please confirm once.

##########
File path: core/src/main/java/org/apache/carbondata/core/writer/CarbonIndexFileMergeWriter.java
##########
@@ -241,13 +266,17 @@ public String writeMergeIndexFileBasedOnSegmentFile(String segmentId,
           break;
         }
       }
+      if (!table.isIndexTable()) {

Review comment:
       please check if this is required or not

##########
File path: core/src/main/java/org/apache/carbondata/core/writer/CarbonIndexFileMergeWriter.java
##########
@@ -184,29 +188,50 @@ private String mergeCarbonIndexFilesOfSegment(String segmentId,
     return indexLocationMap;
   }
 
-  private String writeMergeIndexFileBasedOnSegmentFolder(List<String> indexFileNamesTobeAdded,
-      boolean readFileFooterFromCarbonDataFile, String segmentPath, CarbonFile[] indexFiles,
-      String segmentId, String uuid) throws IOException {
+  public String writeMergeIndexFileBasedOnSegmentFolder(List<String> indexFileNamesTobeAdded,
+      boolean readFileFooterFromCarbonDataFile, String segmentPath,
+      String segmentId, String uuid, boolean readBasedOnUUID) throws IOException {
+    CarbonFile[] indexFiles = null;
     SegmentIndexFileStore fileStore = new SegmentIndexFileStore();
     if (readFileFooterFromCarbonDataFile) {
       // this case will be used in case of upgrade where old store will not have the blocklet
       // info in the index file and therefore blocklet info need to be read from the file footer
       // in the carbondata file
       fileStore.readAllIndexAndFillBlockletInfo(segmentPath, uuid);
     } else {
-      fileStore.readAllIIndexOfSegment(segmentPath, uuid);
+      if (readBasedOnUUID) {
+        indexFiles = SegmentIndexFileStore
+            .getCarbonIndexFiles(segmentPath, FileFactory.getConfiguration(), uuid);
+        fileStore.readAllIIndexOfSegment(segmentPath, uuid);
+      } else {
+        // The uuid can be different, when we add load from external path.
+        indexFiles =
+            SegmentIndexFileStore.getCarbonIndexFiles(segmentPath, FileFactory.getConfiguration());
+        fileStore.readAllIIndexOfSegment(segmentPath);
+      }
     }
     Map<String, byte[]> indexMap = fileStore.getCarbonIndexMap();
-    writeMergeIndexFile(indexFileNamesTobeAdded, segmentPath, indexMap, segmentId, uuid);
-    for (CarbonFile indexFile : indexFiles) {
-      indexFile.delete();
+    Map<String, List<String>> mergeToIndexFileMap = fileStore.getCarbonMergeFileToIndexFilesMap();

Review comment:
       same as above

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -267,15 +269,20 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
         // Merge all partition files into a single file.
         segmentFileName =
           mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp
-        val segmentFile = SegmentFileStore
-          .mergeSegmentFiles(readPath,
-            segmentFileName,
-            CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath))
-        if (segmentFile != null) {
-          SegmentFileStore
-            .moveFromTempFolder(segmentFile,
-              carbonLoadModel.getFactTimeStamp + ".tmp",
-              carbonLoadModel.getTablePath)
+        if (!isPropertySet(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT,
+          CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT)) {

Review comment:
      Index merge is now enabled by default, so please mark the property as deprecated in the Carbon constants file and check whether this `if` can be removed.
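
      For reference, a hypothetical stand-in for the isPropertySet check referenced in this hunk; the actual helper in the patch may behave differently:

        import org.apache.carbondata.core.constants.CarbonCommonConstants
        import org.apache.carbondata.core.util.CarbonProperties

        // Hypothetical stand-in: true when the property resolves to "true",
        // falling back to the supplied default when the user has not set it.
        def isPropertySet(key: String, defaultValue: String): Boolean = {
          CarbonProperties.getInstance().getProperty(key, defaultValue).equalsIgnoreCase("true")
        }

        // e.g. isPropertySet(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT,
        //        CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT)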

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
##########
@@ -226,85 +233,59 @@ object SecondaryIndexUtil {
               validSegmentsToUse.toList.asJava,
               indexCarbonTable)
           }
-          mergedSegments.asScala.map { seg =>
-            val file = SegmentFileStore.writeSegmentFile(
-              indexCarbonTable,
-              seg.getLoadName,
-              carbonLoadModel.getFactTimeStamp.toString,
-              null,
-              null)
-            val segment = new Segment(seg.getLoadName, file)
-            SegmentFileStore.updateTableStatusFile(indexCarbonTable,
-              seg.getLoadName,
-              file,
-              indexCarbonTable.getCarbonTableIdentifier.getTableId,
-              new SegmentFileStore(tablePath, segment.getSegmentFileName))
-            segment
-          }
-
-          val statusLock =
-            new SegmentStatusManager(indexCarbonTable.getAbsoluteTableIdentifier).getTableStatusLock
-          try {
-            val retryCount = CarbonLockUtil.getLockProperty(CarbonCommonConstants
-              .NUMBER_OF_TRIES_FOR_CONCURRENT_LOCK,
-              CarbonCommonConstants.NUMBER_OF_TRIES_FOR_CONCURRENT_LOCK_DEFAULT)
-            val maxTimeout = CarbonLockUtil.getLockProperty(CarbonCommonConstants
-              .MAX_TIMEOUT_FOR_CONCURRENT_LOCK,
-              CarbonCommonConstants.MAX_TIMEOUT_FOR_CONCURRENT_LOCK_DEFAULT)
-            if (statusLock.lockWithRetries(retryCount, maxTimeout)) {
-              val endTime = System.currentTimeMillis()
-              val loadMetadataDetails = SegmentStatusManager
-                .readLoadMetadata(indexCarbonTable.getMetadataPath)
-              loadMetadataDetails.foreach(loadMetadataDetail => {
-                if (rebuiltSegments.contains(loadMetadataDetail.getLoadName)) {
-                  loadMetadataDetail.setLoadStartTime(carbonLoadModel.getFactTimeStamp)
-                  loadMetadataDetail.setLoadEndTime(endTime)
-                  CarbonLoaderUtil
-                    .addDataIndexSizeIntoMetaEntry(loadMetadataDetail,
-                      loadMetadataDetail.getLoadName,
-                      indexCarbonTable)
-                }
-              })
-              SegmentStatusManager
-                .writeLoadDetailsIntoFile(CarbonTablePath.getTableStatusFilePath(tablePath),
-                  loadMetadataDetails)
-            } else {
-              throw new RuntimeException(
-                "Not able to acquire the lock for table status updation for table " + databaseName +
-                "." + indexCarbonTable.getTableName)
-            }
-          } finally {
-            if (statusLock != null) {
-              statusLock.unlock()
-            }
-          }
-          // clear the indexSchema cache for the merged segments, as the index files and
-          // data files are rewritten after compaction
+          val segmentToLoadStartTimeMap: scala.collection.mutable.Map[String, java.lang.Long] =
+            scala.collection.mutable.Map()
           if (mergedSegments.size > 0) {
-
-            // merge index files for merged segments
-            CarbonMergeFilesRDD.mergeIndexFiles(sc.sparkSession,
-              rebuiltSegments.toSeq,
-              segmentIdToLoadStartTimeMap,
-              indexCarbonTable.getTablePath,
-              indexCarbonTable, mergeIndexProperty = false
-            )
-
-            if (CarbonProperties.getInstance()
-              .isDistributedPruningEnabled(indexCarbonTable.getDatabaseName,
-                indexCarbonTable.getTableName)) {
-              try {
-                IndexServer.getClient
-                  .invalidateSegmentCache(indexCarbonTable,
-                    rebuiltSegments.toArray,
-                    SparkSQLUtil.getTaskGroupId(sc.sparkSession))
-              } catch {
-                case _: Exception =>
+            // merge index files and write segment file for merged segments
+            mergedSegments.asScala.map { seg =>
+              if (isPropertySet(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT,
+                CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT)) {
+                new CarbonIndexFileMergeWriter(indexCarbonTable).mergeCarbonIndexFilesOfSegment(seg
+                  .getLoadName,
+                  tablePath,
+                  false,
+                  carbonLoadModel.getFactTimeStamp.toString)
+              }
+              val file = SegmentFileStore.writeSegmentFile(
+                indexCarbonTable,
+                seg.getLoadName,
+                carbonLoadModel.getFactTimeStamp.toString,
+                null,
+                null)
+              segmentToLoadStartTimeMap.put(seg.getLoadName,
+                carbonLoadModel.getFactTimeStamp)
+              // clear the indexSchema cache for the merged segments, as the index files and
+              // data files are rewritten after compaction
+              if (CarbonProperties.getInstance()
+                .isDistributedPruningEnabled(indexCarbonTable.getDatabaseName,
+                  indexCarbonTable.getTableName)) {
+                try {
+                  IndexServer.getClient
+                    .invalidateSegmentCache(indexCarbonTable,
+                      rebuiltSegments.toArray,
+                      SparkSQLUtil.getTaskGroupId(sc.sparkSession))
+                } catch {
+                  case _: Exception =>
+                }
               }
+              val segment = new Segment(seg.getLoadName, file)
+              segment
+            }
+            if (compactionType == null) {

Review comment:
       add a comment





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


vikramahuja1001 commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534175368



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
##########
@@ -226,85 +233,59 @@ object SecondaryIndexUtil {
               validSegmentsToUse.toList.asJava,
               indexCarbonTable)
           }
-          mergedSegments.asScala.map { seg =>
-            val file = SegmentFileStore.writeSegmentFile(
-              indexCarbonTable,
-              seg.getLoadName,
-              carbonLoadModel.getFactTimeStamp.toString,
-              null,
-              null)
-            val segment = new Segment(seg.getLoadName, file)
-            SegmentFileStore.updateTableStatusFile(indexCarbonTable,
-              seg.getLoadName,
-              file,
-              indexCarbonTable.getCarbonTableIdentifier.getTableId,
-              new SegmentFileStore(tablePath, segment.getSegmentFileName))
-            segment
-          }
-
-          val statusLock =
-            new SegmentStatusManager(indexCarbonTable.getAbsoluteTableIdentifier).getTableStatusLock
-          try {
-            val retryCount = CarbonLockUtil.getLockProperty(CarbonCommonConstants
-              .NUMBER_OF_TRIES_FOR_CONCURRENT_LOCK,
-              CarbonCommonConstants.NUMBER_OF_TRIES_FOR_CONCURRENT_LOCK_DEFAULT)
-            val maxTimeout = CarbonLockUtil.getLockProperty(CarbonCommonConstants
-              .MAX_TIMEOUT_FOR_CONCURRENT_LOCK,
-              CarbonCommonConstants.MAX_TIMEOUT_FOR_CONCURRENT_LOCK_DEFAULT)
-            if (statusLock.lockWithRetries(retryCount, maxTimeout)) {
-              val endTime = System.currentTimeMillis()
-              val loadMetadataDetails = SegmentStatusManager
-                .readLoadMetadata(indexCarbonTable.getMetadataPath)
-              loadMetadataDetails.foreach(loadMetadataDetail => {
-                if (rebuiltSegments.contains(loadMetadataDetail.getLoadName)) {
-                  loadMetadataDetail.setLoadStartTime(carbonLoadModel.getFactTimeStamp)
-                  loadMetadataDetail.setLoadEndTime(endTime)
-                  CarbonLoaderUtil
-                    .addDataIndexSizeIntoMetaEntry(loadMetadataDetail,
-                      loadMetadataDetail.getLoadName,
-                      indexCarbonTable)
-                }
-              })
-              SegmentStatusManager
-                .writeLoadDetailsIntoFile(CarbonTablePath.getTableStatusFilePath(tablePath),
-                  loadMetadataDetails)
-            } else {
-              throw new RuntimeException(
-                "Not able to acquire the lock for table status updation for table " + databaseName +
-                "." + indexCarbonTable.getTableName)
-            }
-          } finally {
-            if (statusLock != null) {
-              statusLock.unlock()
-            }
-          }
-          // clear the indexSchema cache for the merged segments, as the index files and
-          // data files are rewritten after compaction
+          val segmentToLoadStartTimeMap: scala.collection.mutable.Map[String, java.lang.Long] =
+            scala.collection.mutable.Map()
           if (mergedSegments.size > 0) {

Review comment:
      This `if` is not required.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534676638



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -162,15 +162,25 @@ public static void writeSegmentFile(String tablePath, String segmentId, String t
    * corresponding partitions.
    */
   public static void writeSegmentFile(String tablePath, final String taskNo, String location,
-      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow) throws IOException {
-    String tempFolderLoc = timeStamp + ".tmp";
-    String writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow,
+      boolean readFileFooterFromCarbonDataFile) throws IOException {

Review comment:
       Done





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534676791



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -162,15 +162,25 @@ public static void writeSegmentFile(String tablePath, String segmentId, String t
    * corresponding partitions.
    */
   public static void writeSegmentFile(String tablePath, final String taskNo, String location,
-      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow) throws IOException {
-    String tempFolderLoc = timeStamp + ".tmp";
-    String writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+      String timeStamp, List<String> partitionNames, boolean isMergeIndexFlow,
+      boolean readFileFooterFromCarbonDataFile) throws IOException {
+    String tempFolderLoc;
+    String writePath;
+    if (!readFileFooterFromCarbonDataFile) {
+      tempFolderLoc = timeStamp + ".tmp";
+      writePath = CarbonTablePath.getSegmentFilesLocation(tablePath) + "/" + tempFolderLoc;
+    } else {
+      // If Alter merge index for old tables is triggered,
+      // directly write mergeindex file into segment file location
+      tempFolderLoc = location;
+      writePath = CarbonTablePath.getSegmentFilesLocation(tablePath);
+    }
     CarbonFile carbonFile = FileFactory.getCarbonFile(writePath);
     if (!carbonFile.exists()) {
       carbonFile.mkdirs();
     }
     CarbonFile tempFolder;
-    if (isMergeIndexFlow) {
+    if (isMergeIndexFlow || readFileFooterFromCarbonDataFile) {

Review comment:
       removed unnecessary code from this method.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534676835



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -198,20 +208,17 @@ public boolean accept(CarbonFile file) {
         folderDetails.setRelative(isRelative);
         folderDetails.setPartitions(partitionNames);
         folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
-        for (CarbonFile file : carbonFiles) {
-          if (file.getName().endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) {
-            folderDetails.setMergeFileName(file.getName());
-          } else {
-            folderDetails.getFiles().add(file.getName());
-          }
-        }
+        setIndexFileNamesToFolderDetails(folderDetails, carbonFiles);
         segmentFile.addPath(location, folderDetails);
         String path = null;
         if (isMergeIndexFlow) {
           // in case of merge index flow, tasks are launched per partition and all the tasks
           // will be written to the same tmp folder, in that case taskNo is not unique.
           // To generate a unique fileName UUID is used
           path = writePath + "/" + CarbonUtil.generateUUID() + CarbonTablePath.SEGMENT_EXT;
+          if (readFileFooterFromCarbonDataFile) {
+            path = writePath + "/" + timeStamp + CarbonTablePath.SEGMENT_EXT;

Review comment:
       removed unnecessary code from this method.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534677017



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -439,6 +430,73 @@ public boolean accept(CarbonFile file) {
     return null;
   }
 
+  /**

Review comment:
      For SI, the flow is: in progress -> merge index (MT) -> call SI load -> merge index (SI) -> write segment file (SI) -> success (SI) -> write segment file (MT) -> success (MT).
      Here, SI reads index files from the segment path because the MT segment file is not yet written.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing


ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r534677179



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -664,7 +723,8 @@ public static SegmentFile getSegmentFileForPhysicalDataPartitions(String tablePa
       CarbonFile[] listFiles = carbonFile.listFiles(new CarbonFileFilter() {
         @Override
         public boolean accept(CarbonFile file) {
-          return CarbonTablePath.isCarbonIndexFile(file.getAbsolutePath());
+          return file.getName().contains(uuid) && CarbonTablePath

Review comment:
       removed this part, not required.



