Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/3046 @xuchuanyin The problem was that, when using a varchar column with email data, the keys of the dictionary map become very large. When fallback happens, the same data is kept in memory twice, which causes the application to throw an OOM exception. Since this is a minor version we do not want to expose this property to the user; we can take it up in the next major version. Also, Parquet has a size-based limitation mechanism which ensures the size does not grow beyond what the system can handle. This PR just adds a size-based limitation so that the map size can be controlled. ---
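To make the size-based limitation concrete, here is a minimal, hypothetical sketch (the class and method names are invented for illustration, not CarbonData's actual API) of a dictionary that tracks the accumulated byte size of its keys and signals fallback once a configured byte budget would be exceeded:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: bound the local dictionary by total key bytes
// instead of entry count, so huge varchar values (e.g. email strings)
// cannot blow up memory before fallback is triggered.
public class SizeBoundedDictionary {
  private final long maxBytes;
  private long currentBytes = 0;
  private final Map<String, Integer> dictionary = new HashMap<>();

  public SizeBoundedDictionary(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** Returns the surrogate key, or -1 if adding the value would exceed the byte budget. */
  public int addIfFits(String value) {
    Integer existing = dictionary.get(value);
    if (existing != null) {
      return existing; // already in the dictionary, no extra memory needed
    }
    long valueBytes = value.getBytes(StandardCharsets.UTF_8).length;
    if (currentBytes + valueBytes > maxBytes) {
      return -1; // caller should trigger fallback to plain encoding
    }
    currentBytes += valueBytes;
    int key = dictionary.size();
    dictionary.put(value, key);
    return key;
  }
}
```

The point of the sketch is that the budget is charged only for new keys, so repeated values stay cheap while pathological high-cardinality columns hit the limit early.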
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3046#discussion_r245555497

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -2076,4 +2076,15 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512";

+  public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD =
--- End diff --

changed to CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2186/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10442/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2402/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2188/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10444/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2404/ ---
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3046#discussion_r245630539

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
@@ -1491,6 +1491,27 @@ private void validateSortMemorySpillPercentage() {
     }
   }

+  public int getMaxDictionaryThreshold() {
+    int localDictionaryMaxThreshold = Integer.parseInt(carbonProperties
+        .getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD,
+            CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD_DEFAULT));
+    if (localDictionaryMaxThreshold
--- End diff --

add min check also

---
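As an illustration of the requested bounds check, here is a standalone sketch (the constant names and numeric bounds are invented, not CarbonData's real values) that validates the configured threshold against both a minimum and a maximum and falls back to the default otherwise:

```java
// Hypothetical sketch of a min/max-bounded property validator.
// Out-of-range or unparsable values are replaced by the default,
// which is the usual CarbonProperties-style behavior for bad config.
public class ThresholdValidator {
  static final int MIN_THRESHOLD = 1000;      // illustrative lower bound
  static final int MAX_THRESHOLD = 100000;    // illustrative upper bound
  static final int DEFAULT_THRESHOLD = 10000; // illustrative default

  public static int validate(String configured) {
    int value;
    try {
      value = Integer.parseInt(configured);
    } catch (NumberFormatException e) {
      return DEFAULT_THRESHOLD; // unparsable value: use the default
    }
    if (value < MIN_THRESHOLD || value > MAX_THRESHOLD) {
      return DEFAULT_THRESHOLD; // out of range: use the default
    }
    return value;
  }
}
```

The min check matters because a zero or negative threshold would silently disable the dictionary, which is a different failure mode from an oversized one.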
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3046#discussion_r245653091

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/DecoderBasedFallbackEncoder.java ---
@@ -57,10 +57,7 @@ public DecoderBasedFallbackEncoder(EncodedColumnPage encodedColumnPage, int page
     int pageSize = encodedColumnPage.getActualPage().getPageSize();
     int offset = 0;
-    int[] reverseInvertedIndex = new int[pageSize];
--- End diff --

In the No_Sort case, where there is no inverted index, this variable was being created unnecessarily.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2204/ ---
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3046#discussion_r245877242

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
@@ -1491,6 +1491,27 @@ private void validateSortMemorySpillPercentage() {
     }
   }

+  public int getMaxDictionaryThreshold() {
+    int localDictionaryMaxThreshold = Integer.parseInt(carbonProperties
+        .getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD,
+            CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD_DEFAULT));
+    if (localDictionaryMaxThreshold
--- End diff --

ok

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2421/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3046 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10461/ ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3046#discussion_r246056127

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -2076,4 +2076,15 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512";

+  public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD =
--- End diff --

It still fails to make clear what kind of size this is supposed to be, since we have both a storage size and an entry count.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/3046 > This PR is just to add the size based limitation so that the map size can be controlled. @kunal642 Yeah, I noticed that. So my proposal is that we keep the changes minimal for now, reserving room for when we implement that feature (automatic size detection and fallback) later. ---
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/3046 @xuchuanyin In the near future we are planning to change the local dictionary threshold from count-based to size-based. A size-based threshold gives more control, and the current changes in this PR help toward that; later we just have to expose the table property in the CREATE TABLE command so the user can control the size threshold. Also, I didn't get the meaning of your comment: these changes are already minimal. ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/3046 We do not need to expose this threshold to the user. Instead, carbondata can decide this itself.
Step 1: get the size of the non-dictionary-encoded page (say M) and the size of the dictionary-encoded page (say N).
Step 2: if M/N >= 1 (or M/N >= 0.9), fall back automatically.
Parquet (and maybe ORC) behaves like this. ---
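A minimal sketch of this ratio-based decision (the names are hypothetical, and the direction of the ratio is interpreted here as "fall back when dictionary encoding saves less than about 10% of the page size"):

```java
// Hypothetical sketch of an automatic fallback decision based on
// comparing the plain-encoded page size (M) against the
// dictionary-encoded page size (N), as proposed in the discussion.
public class FallbackDecision {
  /**
   * @param plainSizeBytes size of the non-dictionary-encoded page (M)
   * @param dictSizeBytes  size of the dictionary-encoded page (N)
   * @return true if the page should fall back to plain encoding
   */
  public static boolean shouldFallback(long plainSizeBytes, long dictSizeBytes) {
    if (plainSizeBytes <= 0) {
      return true; // degenerate page: nothing to gain from a dictionary
    }
    // Keep dictionary encoding only when it shrinks the page noticeably;
    // the ~0.9 cutoff is one of the thresholds mentioned above.
    return (double) dictSizeBytes / plainSizeBytes >= 0.9;
  }
}
```

Because both sizes are already known at encode time, this check needs no user-facing property at all, which is the point of the proposal.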