[GitHub] carbondata pull request #1641: select with group by and insertoverwrite to a...

[GitHub] carbondata pull request #1641: select with group by and insertoverwrite to a...

qiuchenjian-2
GitHub user kushalsaha opened a pull request:

    https://github.com/apache/carbondata/pull/1641

    select with group by and insertoverwrite to another carbon table

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
           No

     - [ ] Any backward compatibility impacted?
           No

     - [ ] Document update required?
           No

     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
           Yes

     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kushalsaha/carbondata DTS_overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1641
   
----
commit 06591387467b844340a832bb9ed1f5b76ac6b1e7
Author: kushalsaha <[hidden email]>
Date:   2017-12-11T15:41:21Z

    select with group by and insertoverwrite to another carbon table

----


---

[GitHub] carbondata issue #1641: select with group by and insertoverwrite to another ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Can one of the admins verify this patch?


---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1877/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2223/



---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156275284
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -486,6 +486,21 @@ object CarbonDataRDDFactory {
           // if segment is empty then fail the data load
    --- End diff --
   
    Please correct this comment.


---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156276866
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -486,6 +486,21 @@ object CarbonDataRDDFactory {
           // if segment is empty then fail the data load
           if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap &&
               !CarbonLoaderUtil.isValidSegment(carbonLoadModel, carbonLoadModel.getSegmentId.toInt)) {
    +
    +        if (overwriteTable && dataFrame.isDefined) {
    +          carbonLoadModel.getLoadMetadataDetails.asScala.foreach {
    +            loadDetails =>
    +              if (loadDetails.getSegmentStatus.equals(SegmentStatus.SUCCESS)) {
    +                loadDetails.setSegmentStatus(SegmentStatus.MARKED_FOR_DELETE)
    +              }
    +          }
    +          val carbonTablePath = CarbonStorePath
    --- End diff --
   
    1) loadTablePreStatusUpdateEvent is not fired.
    2) What about the old dictionary that is to be overwritten?
    3) The updatestatus file also needs to be handled accordingly.
    I suggest following the original flow when handling the empty-segment case.
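
The status change in the diff quoted above can be modelled in isolation. The following is a minimal sketch, assuming simplified stand-in types: `Status`, `Segment` and `retireOldSegments` are hypothetical names, not CarbonData classes. It only shows marking previously successful segments for deletion and does not address the event, dictionary, or update-status points raised in this comment.

```scala
// Simplified, hypothetical model; these are NOT the CarbonData classes.
object OverwriteSketch {
  sealed trait Status
  case object Success extends Status
  case object MarkedForDelete extends Status
  case object LoadFailure extends Status

  final case class Segment(id: Int, status: Status)

  // When an overwrite produced no rows, retire every previously successful
  // segment so the table reads as empty; all other segments are left as-is.
  def retireOldSegments(segments: Seq[Segment]): Seq[Segment] =
    segments.map {
      case s @ Segment(_, Success) => s.copy(status = MarkedForDelete)
      case other => other
    }
}
```

Returning a new sequence instead of mutating entries in place keeps the sketch easy to test; the diff itself mutates the load details directly.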


---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kushalsaha commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156318860
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -486,6 +486,21 @@ object CarbonDataRDDFactory {
           // if segment is empty then fail the data load
           if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap &&
               !CarbonLoaderUtil.isValidSegment(carbonLoadModel, carbonLoadModel.getSegmentId.toInt)) {
    +
    +        if (overwriteTable && dataFrame.isDefined) {
    +          carbonLoadModel.getLoadMetadataDetails.asScala.foreach {
    +            loadDetails =>
    +              if (loadDetails.getSegmentStatus.equals(SegmentStatus.SUCCESS)) {
    +                loadDetails.setSegmentStatus(SegmentStatus.MARKED_FOR_DELETE)
    +              }
    +          }
    +          val carbonTablePath = CarbonStorePath
    --- End diff --
   
    loadTablePreStatusUpdateEvent is not fired: it is fired only once data loading is done, and in the zero-record case no data loading is done.
   
    What about the old dictionary to be overwritten: in the insert overwrite case, the dictionary is appended to.


---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1926/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2256/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2257/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2258/



---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156698805
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -276,8 +281,178 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         }
         sql("LOAD DATA INPATH '" + resourcesPath + "/100_olap.csv' overwrite INTO table TCarbonSourceOverwrite options ('DELIMITER'=',', 'QUOTECHAR'='\', 'FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointDescription,gamePointId,contractNumber')")
         assert(rowCount == sql("select imei from TCarbonSourceOverwrite").count())
    +
    +  }
    +
    +  test("insert overwrite in group by scenario with t1 no record and t2 some record") {
    --- End diff --
   
    Move the common code into a function.


---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156700497
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -375,7 +375,15 @@ object CarbonDataRDDFactory {
                   }
               }
             } else {
    -          loadStatus = SegmentStatus.LOAD_FAILURE
    +          if (dataFrame.isDefined && updateModel.isEmpty) {
    --- End diff --
   
    Write a comment explaining this.


---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156700781
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -483,20 +491,21 @@ object CarbonDataRDDFactory {
                          s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
             throw new Exception(status(0)._2._2.errorMsg)
           }
    -      // if segment is empty then fail the data load
    +
    +      var newEntryLoadStatus =
           if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap &&
               !CarbonLoaderUtil.isValidSegment(carbonLoadModel, carbonLoadModel.getSegmentId.toInt)) {
    -        // update the load entry in table status file for changing the status to marked for delete
    -        CommonUtil.updateTableStatusForFailure(carbonLoadModel)
    -        LOGGER.info("********starting clean up**********")
    -        CarbonLoaderUtil.deleteSegment(carbonLoadModel, carbonLoadModel.getSegmentId.toInt)
    -        LOGGER.info("********clean up done**********")
    +
             LOGGER.audit(s"Data load is failed for " +
                          s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }" +
                          " as there is no data to load")
             LOGGER.warn("Cannot write load metadata file as data load failed")
    -        throw new Exception("No Data to load")
    +
    --- End diff --
   
    Write a comment here: 'as no records are loaded in the new segment, the new segment should be deleted'.
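
The requested comment concerns how the status of the new load entry is chosen. A minimal sketch of that decision follows, assuming a simplified model: `Status`, `newEntryLoadStatus` and its two Boolean parameters are hypothetical stand-ins for the checks in the diff, not CarbonData APIs. The quoted diff is truncated, so the `Success` result for the non-empty case is an assumption.

```scala
// Simplified, hypothetical model; these are NOT the CarbonData classes.
object NewEntryStatusSketch {
  sealed trait Status
  case object Success extends Status
  case object MarkedForDelete extends Status

  // Mirrors the shape of the condition in the diff:
  //   !isChildDataMap && !isValidSegment(...)
  // As no records were loaded into the new segment, the new segment should
  // be deleted; otherwise (an assumption here) the load entry is a success.
  def newEntryLoadStatus(isChildDataMap: Boolean, segmentIsValid: Boolean): Status =
    if (!isChildDataMap && !segmentIsValid) MarkedForDelete else Success
}
```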


---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1946/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2272/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1959/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2281/



---

[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1641
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1969/



---

[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kushalsaha commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1641#discussion_r156994717
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -276,6 +281,121 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         }
         sql("LOAD DATA INPATH '" + resourcesPath + "/100_olap.csv' overwrite INTO table TCarbonSourceOverwrite options ('DELIMITER'=',', 'QUOTECHAR'='\', 'FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointDescription,gamePointId,contractNumber')")
         assert(rowCount == sql("select imei from TCarbonSourceOverwrite").count())
    +
    +  }
    +
    +  test("insert overwrite in group by scenario with t1 no record and t2 no record") {
    +    queryExecution("overwriteTable1_noRecord.csv","overwriteTable2_noRecord.csv")
    +    sql ("insert overwrite table OverwriteTable_t2 select id,name,sum(salary) as TotalSalary,'98' as age from OverwriteTable_t1 group by id,name,salary")
    +    val exists_t1 = checkSegment("OverwriteTable_t1")
    +    val exists_t2 = checkSegment("OverwriteTable_t2")
    +    assert(!exists_t1)
    +    assert(!exists_t2)
    +    assert(sql("select * from OverwriteTable_t1").count() == sql("select * from OverwriteTable_t2").count())
    +    checkAnswer(sql("select * from OverwriteTable_t2"),
    +      Seq())
    +    checkAnswer(sql("select * from OverwriteTable_t1"),
    +      sql("select * from OverwriteTable_t2"))
    +  }
    +
    +
    +  test("insert overwrite in group by scenario with t1 no record and t2 some record") {
    +    queryExecution("overwriteTable1_noRecord.csv","overwriteTable2_someRecord.csv")
    +    sql ("insert overwrite table OverwriteTable_t2 select id,name,sum(salary) as TotalSalary,'98' as age from OverwriteTable_t1 group by id,name,salary")
    --- End diff --
   
    Only the insert overwrite query is kept in the test cases, because we handle two different scenarios: 1) with GROUP BY, and 2) without GROUP BY.
    If we refactor, one extra method would have to be written, and the same code would exist again in it.
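
One way the two scenarios could share a single method is to pass the overwrite query in as a parameter. The following is a minimal sketch under that assumption: `runOverwriteScenario` is a hypothetical helper, and its three function parameters stand in for the suite's `queryExecution`, `sql` and `checkSegment`, whose real signatures are not shown in this thread.

```scala
// Hypothetical helper; not part of the pull request.
object OverwriteTestHelperSketch {
  // setup         : stands in for queryExecution(csv1, csv2)
  // runSql        : stands in for sql(query)
  // segmentExists : stands in for checkSegment(tableName)
  // Returns whether a segment exists for t1 and for t2 after the overwrite.
  def runOverwriteScenario(
      setup: () => Unit,
      runSql: String => Unit,
      segmentExists: String => Boolean,
      overwriteQuery: String): (Boolean, Boolean) = {
    setup()
    runSql(overwriteQuery)
    (segmentExists("OverwriteTable_t1"), segmentExists("OverwriteTable_t2"))
  }
}
```

Each test would then supply only its own query (with or without GROUP BY) and assert on the returned pair.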


---