Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

Classic

List

59 messages Options

Options

123

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/1812

[CARBONDATA-2033]support user specified segments in major compation

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed?
**no**
- [ ] Any backward compatibility impacted?
**no**
- [x] Document update required?
**Yes, data-management-on-carbondata.md has been updated.**
- [x] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
**yes**
- How it is tested? Please attach test report.
**test on cluster with 7 nodes**
- Is it a performance related change? Please attach the performance test report.
**no**
- Any additional information to help reviewers in testing this change.

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1812

----
commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8
Author: Jin Zhou <xaprice@...>
Date: 2018-01-16T09:02:51Z

[CARBONDATA-2033]support user specified segments in major compation

----

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812

Hi @chenliang613 , can you please take a look?

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2828/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1593/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1812

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2923/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1812

retest this please

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1812

please change the title to : [CARBONDATA-2033] Support user specified segments in major compaction

---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1812#discussion_r162245791

--- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ---
@@ -441,6 +452,30 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) {
});
}

+ /**
+ * This method will return the list of loads which are specified by user in SQL.
+ *
+ * @param listOfSegmentsLoadedInSameDateInterval
+ * @param segmentIds
+ * @return
+ */
+ private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments(
+ List<LoadMetadataDetails> listOfSegmentsLoadedInSameDateInterval,
+ Set<String> segmentIds) {
+ List<LoadMetadataDetails> listOfSegmentsSpecified =
+ new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
+ if (segmentIds != null && segmentIds.size() != 0) {
+ for (LoadMetadataDetails detail : listOfSegmentsLoadedInSameDateInterval) {
+ if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) {
--- End diff --

If the specified segment is not valid, better throw exception about invalid segments instead of ignoring it

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1812

Two questions:
1. Why only consider major compaction with specified segments, no need to consider minor compaction?
2. Whether can keep consistent syntax as "query with specified segments", or not ?
a. First set segment id : "SET carbon.input.segments.dbname.tablename=1,3"
b.Do compaction : ALTER TABLE tablename compact 'MAJOR'

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1812

@Xaprice I think we should have validation for the order of segments to be merged. For suppose we have segments of 1 to 8, and the user gives the compaction on 1, 5, 8 then this should not be valid as it will impact the order of data it is inserted.

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1690/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2923/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812

@chenliang613
For question 1: I thought minor compaction are mainly used in auto-merging scenario. But after reconsidering this feature, maybe it's better to support both major and minor compaction. I will add support of minor compaction soon.
For question 2: I will follow your advice and modify the syntax to keep consistent syntax as "query with specified segments".

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812

@ravipesala Compacting adjacent segments is certainly the best practice in most cases. But is it not flexible enough to take it as a mandatory rule?

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4022/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2778/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812

retest this please

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4026/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2781/

---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1812

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4030/

---

123