VenuReddy2103 opened a new pull request #3776: URL: https://github.com/apache/carbondata/pull/3776

### Why is this PR needed?
The segment directory and the segment file in metadata are not created for a partitioned table when the 'carbon.merge.index.in.segment' property is set to false. The actual index files present in each partition's '.tmp' directory are also deleted without being moved to the partition directory where the '.carbondata' files exist. All queries fail because of this, since there are no segment or index files. This issue was introduced by the resolution of an older optimization PR #3535.

### What changes were proposed in this PR?
When the 'carbon.merge.index.in.segment' property is false, create the segment directory and segment file, and move the index files from each partition's temp directory to the partition directory where the '.carbondata' files exist.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
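As a rough illustration of the proposed change: the segment file records, per partition location, the index file that must be kept, storing the location relative to the table path when possible. A stand-alone sketch under those assumptions (names and paths are illustrative, not CarbonData's actual `SegmentFileStore` API):

```java
import java.util.*;

public class SegmentFileSketch {
    // Hypothetical reconstruction of the fix: for each partition location,
    // record the surviving index file, keyed by a path made relative to the
    // table path when the partition lives under it.
    static Map<String, String> buildLocationMap(String tablePath,
            List<String> partitionPaths, Map<String, String> indexFileByPartition) {
        Map<String, String> locationToIndexFile = new LinkedHashMap<>();
        for (String partition : partitionPaths) {
            String loc = partition;
            if (loc.startsWith(tablePath)) {
                loc = loc.substring(tablePath.length()); // store relative path
            }
            locationToIndexFile.put(loc, indexFileByPartition.get(partition));
        }
        return locationToIndexFile;
    }
}
```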
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-634955639 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3078/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-634956082 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1357/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637158541 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3120/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637158658 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1396/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637376228 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3121/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637376476 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1397/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637646868 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1400/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-637648330 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3124/ |
Indhumathi27 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r434305132

File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java, in the new method:

```java
  @SuppressWarnings("unchecked")
  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
      String segmentFileName, String partitionPath) throws IOException {
    Map<String, String> IndexFileNameMap = (Map<String, String>) ObjectSerializationUtil
        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
    List<String> partitionList =
        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
    SegmentFileStore.SegmentFile finalSegmentFile = null;
    boolean isRelativePath;
    String path;
    for (String partition : partitionList) {
      isRelativePath = false;
      path = partition;
      if (path.startsWith(loadModel.getTablePath())) {
        path = path.substring(loadModel.getTablePath().length());
        isRelativePath = true;
      }
      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
      Set<String> set = new HashSet<String>();
      set.add(IndexFileNameMap.get(partition));
      folderDetails.setFiles(set);
      List<String> partitions = new ArrayList<String>();
```

Review comments on this method:

- On `Map<String, String> IndexFileNameMap = ...`:
  ```suggestion
    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
  ```
- On `private void writeSegmentWithoutMergeIndex(...)`: Please add a method description, as to why this is required.
- On `Set<String> set = new HashSet<String>();`: Rename the `set` variable.
- On `Set<String> set = new HashSet<String>();`:
  ```suggestion
    Set<String> set = Collections.singleton(IndexFileNameMap.get(partition));
  ```
  and remove the next line.
- On `String path;`: Can rename to `partitionLoc`.
- On `List<String> partitions = new ArrayList<String>();`:
  ```suggestion
    List<String> partitions = Collections.singletonList(path.substring(path.indexOf("/") + 1));
  ```
  and remove the next line.

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala, on the new test:

```scala
  test("test partition without merge index files for segment") {
    sql("DROP TABLE IF EXISTS new_par")
    CarbonProperties.getInstance()
      .addProperty(CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, "false")
    sql("CREATE TABLE new_par (a INT, b INT) PARTITIONED BY (country STRING) STORED AS carbondata")
    sql("INSERT INTO new_par PARTITION(country='India') SELECT 1,2")
    sql("INSERT INTO new_par PARTITION(country='India') SELECT 3,4")
    sql("INSERT INTO new_par PARTITION(country='China') SELECT 5,6")
    sql("INSERT INTO new_par PARTITION(country='China') SELECT 7,8")
    checkAnswer(sql("SELECT COUNT(*) FROM new_par"), Seq(Row(4)))
```
Review comment: Please check for the index files also.

File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java, on:

```java
    String segmentFileName = SegmentFileStore.genSegmentFileName(
        loadModel.getSegmentId(), String.valueOf(loadModel.getFactTimeStamp()));
    boolean isMergeIndex = Boolean.parseBoolean(CarbonProperties.getInstance()
```
Review comment: The variable `isMergeIndexEnabled` is already defined in the caller method at line 192. Please move this code before the call to `commitJobForPartition` and reuse it.

File path: core/src/main/java/org/apache/carbondata/core/preagg/TimeSeriesUDF.java, on:

```java
-      firstDayOfWeek = DaysOfWeekEnum.valueOf(CarbonProperties.getInstance()
-          .getProperty(CarbonCommonConstants.CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT)
-          .toUpperCase()).getOrdinal();
+      firstDayOfWeek =
```
Review comment: Please add why this is modified to the PR description. |
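The suggestions above replace a mutable one-element `HashSet`/`ArrayList`, built and then mutated with `add()`, with `Collections.singleton`/`Collections.singletonList`. A stand-alone sketch of the difference (file and partition names are made up):

```java
import java.util.*;

public class SingletonCollections {
    // One-element immutable views, as suggested in the review.
    static Set<String> indexFiles(String indexFileName) {
        return Collections.singleton(indexFileName);
    }

    // Mirrors the suggested singletonList of the partition spec after the
    // leading directory component is stripped.
    static List<String> partitions(String partitionLoc) {
        return Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1));
    }

    public static void main(String[] args) {
        Set<String> files = indexFiles("0_1.carbonindex");
        System.out.println(files);                        // [0_1.carbonindex]
        System.out.println(partitions("/country=India")); // [country=India]
        // Accidental mutation fails fast, unlike a plain HashSet:
        try {
            files.add("other.carbonindex");
        } catch (UnsupportedOperationException e) {
            System.out.println("singleton set is immutable");
        }
    }
}
```

Besides saving a line, the immutable views document intent: these collections are never extended after construction.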
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-640620137 |
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r436633946

File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
- On renaming `IndexFileNameMap`: Modified.
- On renaming the `set` variable: Removed this variable itself now. Directly setting it in the folder details: `folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));`
- On the `Collections.singleton` suggestion: Modified.
- On the `Collections.singletonList` suggestion for `partitions`: Modified.
- On renaming `String path;`: Modified.
- On documenting `writeSegmentWithoutMergeIndex`: Added a header for this private method.
- On moving the `isMergeIndex` code to the caller: This issue is applicable only when the table is partitioned. I think it is appropriate to do it inside `commitJobForPartition()`.

File path: core/src/main/java/org/apache/carbondata/core/preagg/TimeSeriesUDF.java
- On the `firstDayOfWeek` change: Added this info to the PR description.

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
- On also checking for index files in the test: I think it is redundant to do that. If the segment files or index files are not present, this query check will fail. |
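The back-and-forth above concerns verifying that index files actually exist on disk after the load. A generic check of that kind can be sketched with plain `java.nio.file`; the `.carbonindex` suffix and directory layout are assumptions here, not CarbonData's actual test utilities:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class IndexFileCount {
    // Count files with the given suffix anywhere under root
    // (e.g. under a table path, across all partition directories).
    static long countFiles(Path root, String suffix) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(p -> p.getFileName().toString().endsWith(suffix)).count();
        }
    }
}
```

A test could assert that this count matches the number of loads performed, which would catch the original bug (index files deleted from the temp directories without being moved out).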
Indhumathi27 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-643917457 retest this please |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-643972110 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1426/ |
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-643972857 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3150/ |
Indhumathi27 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-646444496 LGTM |
ajantha-bhat commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r447425474

File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java

```java
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
         throw new IOException(e);
       }
     }
-    String segmentFileName = SegmentFileStore.genSegmentFileName(
-        loadModel.getSegmentId(), String.valueOf(loadModel.getFactTimeStamp()));
     newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-    newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+    if (isMergeIndex) {
```
Review comment on `if (isMergeIndex) {`: Can you move it into the else case of the same method at line 280? Scattered checks are hard to read; it is better to keep them together. |
ajantha-bhat commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r447426104

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala

```scala
  override def afterEach(): Unit = {
    CarbonProperties.getInstance()
```
Review comment: No need for `afterEach`; you can reset the property in the same test case where you set it. |
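The reviewer's point above is to scope the property change to the test itself rather than a suite-wide hook. A minimal stand-alone sketch of that set-and-restore pattern, using `java.util.Properties` as a stand-in for `CarbonProperties` (the names are illustrative):

```java
import java.util.Properties;

public class PropertyResetDemo {
    // Set a property for the duration of one test body and restore the
    // previous value in finally, instead of relying on an afterEach hook.
    static void withProperty(Properties props, String key, String value, Runnable testBody) {
        String previous = props.getProperty(key);
        props.setProperty(key, value);
        try {
            testBody.run();
        } finally {
            if (previous == null) {
                props.remove(key);       // key was unset before: remove it again
            } else {
                props.setProperty(key, previous); // restore the old value
            }
        }
    }
}
```

The `finally` block guarantees the reset runs even when the test body throws, which a reset placed at the end of the test body would not.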
ajantha-bhat commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r447427801

File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java, in the updated method:

```java
  /**
   * Method to create and write the segment file, removes the temporary directories from all the
   * respective partition directories. This method is invoked only when {@link
   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
   * @param context Job context
   * @param loadModel Load model
   * @param segmentFileName Segment file name to write
   * @param partitionPath Serialized list of partition location
   * @throws IOException
   */
  @SuppressWarnings("unchecked")
  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
      String segmentFileName, String partitionPath) throws IOException {
    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
    List<String> partitionList =
        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
    SegmentFileStore.SegmentFile finalSegmentFile = null;
    boolean isRelativePath;
    String partitionLoc;
    for (String partition : partitionList) {
      isRelativePath = false;
      partitionLoc = partition;
      if (partitionLoc.startsWith(loadModel.getTablePath())) {
        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
        isRelativePath = true;
      }
      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
      folderDetails.setPartitions(
          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
      folderDetails.setRelative(isRelativePath);
      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
      if (finalSegmentFile != null) {
```
Review comment on `if (finalSegmentFile != null) {`: `finalSegmentFile` will always be null currently? Is some code handling missing, or does this need to be removed? |
ajantha-bhat commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r447447448

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala

```scala
        if (size.isDefined) {
          dataSize = dataSize + java.lang.Long.parseLong(size.get)
        }
        val indexSize = map.get("carbon.indexsize")
        if (indexSize.isDefined) {
          indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
        }
        val indexFiles = map.get("carbon.index.files.name")
        if (indexFiles.isDefined) {
          indexFilesName = indexFiles.get
```
Review comment on `indexFilesName = indexFiles.get`: You are supposed to append it? I think this now keeps only the last task's index file name, not all the tasks' index file names. Why was this missed in testing? |
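The concern above is that plain assignment in the commit callback keeps only the last task's index file names. A small stand-alone sketch of the append-style accumulation the reviewer asks about (class and method names are hypothetical, with `Optional` standing in for Scala's `Option`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class TaskIndexAccumulator {
    // Accumulate per-task index file names instead of overwriting them,
    // so every task's contribution survives until job commit.
    private final List<String> indexFileNames = new ArrayList<>();

    void onTaskCommit(Optional<String> indexFiles) {
        // Append when present; an empty Optional (task produced no index) is skipped.
        indexFiles.ifPresent(indexFileNames::add);
    }

    String merged() {
        return String.join(",", indexFileNames);
    }
}
```

With overwriting semantics (`name = value`), two committed tasks would leave only the second task's files recorded; the append version retains both.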