[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

classic Classic list List threaded Threaded
59 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

qiuchenjian-2
GitHub user Xaprice opened a pull request:

    https://github.com/apache/carbondata/pull/1812

    [CARBONDATA-2033]support user specified segments in major compation

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
    **no**
     - [ ] Any backward compatibility impacted?
      **no**
     - [x] Document update required?
    **Yes, data-management-on-carbondata.md has been updated.**
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
           **yes**
            - How it is tested? Please attach test report.
           **test on cluster with 7 nodes**
            - Is it a performance related change? Please attach the performance test report.
           **no**
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1812
   
----
commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8
Author: Jin Zhou <xaprice@...>
Date:   2018-01-16T09:02:51Z

    [CARBONDATA-2033]support user specified segments in major compation

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Hi @chenliang613 , can you please take a look?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2828/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1593/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2923/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    please change the title to : [CARBONDATA-2033] Support user specified segments in major compaction


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r162245791
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ---
    @@ -441,6 +452,30 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) {
         });
       }
     
    +  /**
    +   * This method will return the list of loads which are specified by user in SQL.
    +   *
    +   * @param listOfSegmentsLoadedInSameDateInterval
    +   * @param segmentIds
    +   * @return
    +   */
    +  private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments(
    +          List<LoadMetadataDetails> listOfSegmentsLoadedInSameDateInterval,
    +          Set<String> segmentIds) {
    +    List<LoadMetadataDetails> listOfSegmentsSpecified =
    +            new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    if (segmentIds != null && segmentIds.size() != 0) {
    +      for (LoadMetadataDetails detail : listOfSegmentsLoadedInSameDateInterval) {
    +        if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) {
    --- End diff --
   
    If the specified segment is not valid, better throw exception about invalid segments instead of ignoring it


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Two questions:
    1. Why only consider major compaction with specified segments, no need to consider minor compaction?
    2. Whether can keep consistent syntax as "query with specified segments", or not ?
    a. First set segment id : "SET carbon.input.segments.dbname.tablename=1,3"
    b.Do compaction : ALTER TABLE tablename compact 'MAJOR'



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    @Xaprice I think we should have validation for the order of segments to be merged. For suppose we have segments of 1 to 8, and the user gives the compaction on 1, 5, 8 then this should not be valid as it will impact the order of data it is inserted.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1690/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2923/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    @chenliang613  
    For question 1:  I thought minor compaction are mainly used in auto-merging scenario. But after reconsidering this feature, maybe it's better to support both major and minor compaction. I will add support of minor compaction soon.
    For question 2: I will follow your advice and modify the syntax to keep consistent syntax as "query with specified segments".


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    @ravipesala  Compacting adjacent segments is certainly the best practice in most cases. But is it not flexible enough to take it as  a mandatory rule?  


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4022/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2778/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4026/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2781/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4030/



---
123