Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] incubator-carbondata pull request #659: Reuse the same SegmentProperties obj...

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1359/

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@kumarvishal09 The dump picture shows the driver-side tree.
@watermen This PR only reuses segment properties on the driver side. Can you try to do the same on the executor side? For how the executor-side tree is built, please have a look at AbstractQueryExecutor.initQuery and BlockIndexStore.getAll.

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user watermen commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@QiangCai When we query a big table, memory pressure on the driver side is greater than on the executor side. So I think we can first reuse the segment properties on the driver side in this PR, and handle reuse on the executor side in another PR.

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@watermen @QiangCai
1. On the driver side, a segment is loaded by task ID for each segment. Across task IDs of the same segment, we can load only one segment properties object. Across segments, if the carbon table schema is the same and the cardinality of the dictionary columns is the same, we can reuse the same segment properties.
2. On the executor side, across tasks on the same executor, if the segment properties are the same, we can try to reuse the same segment properties.

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user watermen commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@kumarvishal09 I agree with your idea. I think we should keep a static HashMap whose key is <AbsoluteTableIdentifier, List<ColumnSchema>, int[] columnCardinality> (this was my first implementation). But @QiangCai thinks we may alter the SegmentProperties object later (for example, column visibility), so if we reuse the SegmentProperties object across segments, something could go wrong.
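
For illustration only, here is a minimal sketch of such a static map with a composite key. It is not the code from this PR: SegmentPropsStub, SegmentPropsKey and SegmentPropsCache are hypothetical names, a String stands in for AbsoluteTableIdentifier, and column names stand in for List<ColumnSchema>. The point it shows is that an int[] inside a key has to be compared with Arrays.equals and hashed with Arrays.hashCode, because arrays use identity equality by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for CarbonData's SegmentProperties.
final class SegmentPropsStub {
  final List<String> columns;
  final int[] cardinality;

  SegmentPropsStub(List<String> columns, int[] cardinality) {
    this.columns = columns;
    this.cardinality = cardinality;
  }
}

// Composite key: table identity, column list, dictionary column cardinality.
final class SegmentPropsKey {
  private final String tablePath;      // assumption: stands in for AbsoluteTableIdentifier
  private final List<String> columns;  // assumption: stands in for List<ColumnSchema>
  private final int[] cardinality;

  SegmentPropsKey(String tablePath, List<String> columns, int[] cardinality) {
    this.tablePath = tablePath;
    this.columns = new ArrayList<>(columns);  // defensive copies keep the key stable
    this.cardinality = cardinality.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SegmentPropsKey)) return false;
    SegmentPropsKey other = (SegmentPropsKey) o;
    return tablePath.equals(other.tablePath)
        && columns.equals(other.columns)
        && Arrays.equals(cardinality, other.cardinality);  // content comparison, not identity
  }

  @Override
  public int hashCode() {
    return 31 * Objects.hash(tablePath, columns) + Arrays.hashCode(cardinality);
  }
}

// Static cache: one shared object per distinct (table, schema, cardinality) combination.
final class SegmentPropsCache {
  private static final Map<SegmentPropsKey, SegmentPropsStub> CACHE = new ConcurrentHashMap<>();

  static SegmentPropsStub getOrCreate(String tablePath, List<String> columns, int[] cardinality) {
    SegmentPropsKey key = new SegmentPropsKey(tablePath, columns, cardinality);
    return CACHE.computeIfAbsent(key, k -> new SegmentPropsStub(columns, cardinality));
  }
}
```

With this shape, two segments of the same table with the same columns and cardinality receive the same object, while a segment whose cardinality differs gets its own. The concern raised above still applies: a shared object is only safe as long as nobody mutates it per segment.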

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@watermen You can move all the schema-related properties into a wrapper class, and in the wrapper class implement hashCode and equals based on the dimension columns (including complex dimensions) and measure columns. In the segment index store you can then keep a static HashMap that maps each table to a list of segment properties. This will solve the alter table problem.
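
A rough sketch of that suggestion, under stated assumptions: SchemaWrapper and TableSchemaStore are hypothetical names, column names stand in for the real dimension and measure schema objects, and a String stands in for the table identifier. It shows how a table-to-list map keeps one entry per distinct schema, so segments written before an alter table keep their own entry while newer segments get a separate one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper holding only schema-related state.
final class SchemaWrapper {
  final List<String> dimensions;  // assumption: includes complex dimensions
  final List<String> measures;

  SchemaWrapper(List<String> dimensions, List<String> measures) {
    this.dimensions = new ArrayList<>(dimensions);
    this.measures = new ArrayList<>(measures);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SchemaWrapper)) return false;
    SchemaWrapper other = (SchemaWrapper) o;
    return dimensions.equals(other.dimensions) && measures.equals(other.measures);
  }

  @Override
  public int hashCode() {
    return 31 * dimensions.hashCode() + measures.hashCode();
  }
}

// Static store: each table maps to the list of distinct schemas seen so far.
final class TableSchemaStore {
  private static final Map<String, List<SchemaWrapper>> BY_TABLE = new HashMap<>();

  // Returns the already stored wrapper equal to the candidate, or stores the candidate.
  static synchronized SchemaWrapper intern(String table, SchemaWrapper candidate) {
    List<SchemaWrapper> known = BY_TABLE.computeIfAbsent(table, t -> new ArrayList<>());
    int idx = known.indexOf(candidate);  // relies on SchemaWrapper.equals
    if (idx >= 0) {
      return known.get(idx);
    }
    known.add(candidate);
    return candidate;
  }

  // Number of distinct schemas currently stored for a table.
  static synchronized int variants(String table) {
    List<SchemaWrapper> known = BY_TABLE.get(table);
    return known == null ? 0 : known.size();
  }
}
```

In this sketch, adding a column produces a wrapper that is not equal to the earlier one, so it is appended to the table's list instead of replacing or mutating the object that older segments still reference.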

[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Store one SegmentPr...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/659#discussion_r110810313

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/SegmentTaskIndexStore.java ---
@@ -338,14 +339,28 @@ private synchronized Object addAndGetSegmentLock(String segmentId) {
* @throws IOException
*/
private AbstractIndex loadBlocks(TaskBucketHolder taskBucketHolder,
- List<TableBlockInfo> tableBlockInfoList, AbsoluteTableIdentifier tableIdentifier)
+ List<TableBlockInfo> tableBlockInfoList, AbsoluteTableIdentifier tableIdentifier,
+ TableSegmentUniqueIdentifier tableSegmentUniqueIdentifier)
throws IOException {
// all the block of one task id will be loaded together
// so creating a list which will have all the data file meta data to of one task
List<DataFileFooter> footerList = CarbonUtil
.readCarbonIndexFile(taskBucketHolder.taskNo, taskBucketHolder.bucketNumber,
tableBlockInfoList, tableIdentifier);
- AbstractIndex segment = new SegmentTaskIndex();
+
+ if (null == tableSegmentUniqueIdentifier.getSegmentProperties()) {
+ // create a metadata details
+ // this will be useful in query handling
+ // all the data file metadata will have common segment properties we
+ // can use first one to get create the segment properties
+ SegmentProperties segmentProperties =
+ new SegmentProperties(footerList.get(0).getColumnInTable(),
--- End diff --

I think the hashmap approach is OK. You can keep a hashmap in this class and store the mapping from a key (calculated from schema) to the SegmentProperties object.

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1726/

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1729/

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1734/

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1738/

[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Reuse SegmentProper...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/659#discussion_r112787886

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/SegmentTaskIndexStore.java ---
@@ -396,4 +419,34 @@ public TaskBucketHolder(String taskNo, String bucketNumber) {
return result;
}
}
+
+ public static class SegmentPropertiesWrapper {
--- End diff --

Please add a comment for this class.

[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Reuse SegmentProper...

In reply to this post by qiuchenjian-2

Github user watermen commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/659#discussion_r112799728

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/SegmentTaskIndexStore.java ---
@@ -396,4 +419,34 @@ public TaskBucketHolder(String taskNo, String bucketNumber) {
return result;
}
}
+
+ public static class SegmentPropertiesWrapper {
--- End diff --

@jackylk Added.

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1759/

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user watermen commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

@jackylk @kumarvishal09 Can you review the code again?

[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Reuse SegmentProperties ob...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/incubator-carbondata/pull/659

LGTM

[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Reuse SegmentProper...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/659
