GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/1227 Overwrite When user issued insert overwrite command, it should delete the file immediately to avoid stale folders. This PR implements this behavior. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata overwrite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1227 ---- commit 72d63d7aec0fac8a86c375e70d20ac64ef26c9e7 Author: Jacky Li <[hidden email]> Date: 2017-08-03T06:17:15Z wip commit 9def5d906b4cc55060959c1fc23821414b7d5356 Author: Jacky Li <[hidden email]> Date: 2017-08-03T06:37:39Z delete file when insert overwrite ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3339/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/82/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3340/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131097020 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -907,25 +906,20 @@ object CarbonDataRDDFactory { throw new Exception("No Data to load") } val metadataDetails = status(0)._2._1 - if (!isAgg) { --- End diff -- isAgg is always false --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/83/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/742/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131099457 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -907,25 +906,20 @@ object CarbonDataRDDFactory { throw new Exception("No Data to load") } val metadataDetails = status(0)._2._1 - if (!isAgg) { --- End diff -- yes, it is not required now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3343/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131137501 --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java --- @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry, } if (insertOverwrite) { for (LoadMetadataDetails entry : listOfLoadFolderDetails) { - entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE); + if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) { + entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE); + // For insert overwrite, we will delete the old segment folder immediately + // So collect the old segments here + String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName()); + staleFolders.add(FileFactory.getCarbonFile(path)); --- End diff -- as another comment mentioned... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131137046 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -713,16 +714,20 @@ object CommonUtil { SegmentStatusManager.readLoadMetadata( carbonTablePath.getMetadataDirectoryPath) var loadInprogressExist = false + val staleFolders: Seq[CarbonFile] = Seq() listOfLoadFolderDetailsArray.foreach { load => if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) || load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) { load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE) + staleFolders :+ FileFactory.getCarbonFile( + carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName)) --- End diff -- why not delete the stale folder right now? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131138030 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) { return FileType.LOCAL; } + public static CarbonFile getCarbonFile(String path) { --- End diff -- it seems we can simplify many methods by removing FileType arg --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131152011 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) { return FileType.LOCAL; } + public static CarbonFile getCarbonFile(String path) { --- End diff -- Yes, I think this can be done in another PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1227 @jackylk it seems some tests are failing in 1.6 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131153070 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -713,16 +714,20 @@ object CommonUtil { SegmentStatusManager.readLoadMetadata( carbonTablePath.getMetadataDirectoryPath) var loadInprogressExist = false + val staleFolders: Seq[CarbonFile] = Seq() listOfLoadFolderDetailsArray.foreach { load => if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) || load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) { load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE) + staleFolders :+ FileFactory.getCarbonFile( + carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName)) --- End diff -- It is saver to delete after writing the new table status file (line 729), if we delete the folder here, it could result in inconsistent state if system crash between line 723 and line 729 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131153192 --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java --- @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry, } if (insertOverwrite) { for (LoadMetadataDetails entry : listOfLoadFolderDetails) { - entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE); + if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) { + entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE); + // For insert overwrite, we will delete the old segment folder immediately + // So collect the old segments here + String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName()); + staleFolders.add(FileFactory.getCarbonFile(path)); --- End diff -- The same reason --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131153248 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -713,16 +714,20 @@ object CommonUtil { SegmentStatusManager.readLoadMetadata( carbonTablePath.getMetadataDirectoryPath) var loadInprogressExist = false + val staleFolders: Seq[CarbonFile] = Seq() listOfLoadFolderDetailsArray.foreach { load => if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) || load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) { load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE) + staleFolders :+ FileFactory.getCarbonFile( + carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName)) --- End diff -- Because it is always safe to update the status file first before deleting anything. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1227#discussion_r131154027 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) { return FileType.LOCAL; } + public static CarbonFile getCarbonFile(String path) { --- End diff -- I don't think it is better to change the existing caller methods, going forward user can use this if he needs it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3348/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1227 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/750/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |