[GitHub] carbondata pull request #1227: Overwrite

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: Overwrite

qiuchenjian-2
GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/1227

    Overwrite

    When user issued insert overwrite command, it should delete the file immediately to avoid stale folders.
    This PR implements this behavior.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1227
   
----
commit 72d63d7aec0fac8a86c375e70d20ac64ef26c9e7
Author: Jacky Li <[hidden email]>
Date:   2017-08-03T06:17:15Z

    wip

commit 9def5d906b4cc55060959c1fc23821414b7d5356
Author: Jacky Li <[hidden email]>
Date:   2017-08-03T06:37:39Z

    delete file when insert overwrite

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: Overwrite

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3339/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/82/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3340/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131097020
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -907,25 +906,20 @@ object CarbonDataRDDFactory {
               throw new Exception("No Data to load")
             }
             val metadataDetails = status(0)._2._1
    -        if (!isAgg) {
    --- End diff --
   
    isAgg is always false


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/83/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/742/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131099457
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -907,25 +906,20 @@ object CarbonDataRDDFactory {
               throw new Exception("No Data to load")
             }
             val metadataDetails = status(0)._2._1
    -        if (!isAgg) {
    --- End diff --
   
    yes, it is not required now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3343/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131137501
 
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    --- End diff --
   
    as another comment mentioned...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131137046
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
   
    why not delete the stale folder right now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131138030
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
   
    it seems we can simplify many methods by removing FileType arg


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131152011
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
   
    Yes, I think this can be done in another PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    @jackylk it seems some tests are failing in 1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153070
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
   
    It is saver to delete after writing the new table status file (line 729), if we delete the folder here, it could result in inconsistent state if system crash between line 723 and line 729


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153192
 
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    --- End diff --
   
    The same reason


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153248
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
   
    Because it is always safe to update the status file first before deleting anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131154027
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
   
    I don't think it is better to change the existing caller methods, going forward user can use this if he needs it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3348/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/750/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12