[GitHub] carbondata pull request #1227: Overwrite

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/88/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131294374
 
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    +              }
                 }
               }
               listOfLoadFolderDetails.set(indexToOverwriteNewMetaEntry, newMetaEntry);
             }
             SegmentStatusManager.writeLoadDetailsIntoFile(tableStatusPath, listOfLoadFolderDetails
                 .toArray(new LoadMetadataDetails[listOfLoadFolderDetails.size()]));
    +        // Delete all old stale segment folders
    +        for (CarbonFile staleFolder : staleFolders) {
    +          CarbonUtil.deleteFoldersAndFiles(staleFolder);
    +        }
    --- End diff --
   
    but it seems that we delete the folder without checking the operation status here,different from the behavior in another changes. Am I missing something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131294865
 
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    +              }
                 }
               }
               listOfLoadFolderDetails.set(indexToOverwriteNewMetaEntry, newMetaEntry);
             }
             SegmentStatusManager.writeLoadDetailsIntoFile(tableStatusPath, listOfLoadFolderDetails
                 .toArray(new LoadMetadataDetails[listOfLoadFolderDetails.size()]));
    +        // Delete all old stale segment folders
    +        for (CarbonFile staleFolder : staleFolders) {
    +          CarbonUtil.deleteFoldersAndFiles(staleFolder);
    +        }
    --- End diff --
   
    yeah,I get it~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131295031
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
   
    yeah,I mistakenly thought it was `if (loadInprogressExist)` that matters


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131314408
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -252,13 +231,12 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         sql("insert overwrite table CarbonOverwrite select * from THive")
         sql("insert overwrite table HiveOverwrite select * from THive")
         checkAnswer(sql("select count(*) from CarbonOverwrite"), sql("select count(*) from HiveOverwrite"))
    -    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, timeStampPropOrig)
    +    val folder = new File(s"$storeLocation/default/CarbonOverwrite/Fact/Part0/")
    --- End diff --
   
    CarbonOverwrite is always writes in lower case as folder, so use as below to avoid test failure,
    ```
    val folder = new File(s"$storeLocation/default/carbonoverwrite/Fact/Part0/")
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131314507
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -273,7 +251,9 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         sql("LOAD DATA INPATH '" + resourcesPath + "/100_olap.csv' overwrite INTO table TCarbonSourceOverwrite options ('DELIMITER'=',', 'QUOTECHAR'='\', 'FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaS
 ysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointDescription,gamePointId,contractNumber')")
         sql(s"LOAD DATA local INPATH '$resourcesPath/100_olap.csv' overwrite INTO TABLE HiveOverwrite")
         checkAnswer(sql("select count(*) from TCarbonSourceOverwrite"), sql("select count(*) from HiveOverwrite"))
    -    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, timeStampPropOrig)
    +    val folder = new File(s"$storeLocation/default/CarbonOverwrite/Fact/Part0/")
    --- End diff --
   
    Use tcarbonsourceoverwrite not CarbonOverwrite and also use lower case as folder, so use as below to avoid test failure,
    ```
    val folder = new File(s"$storeLocation/default/tcarbonsourceoverwrite/Fact/Part0/")
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3361/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/764/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/95/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
 
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1227


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12