Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2706: [WIP] multiple issue fixes for varchar column...

Classic

List

57 messages Options

Options

123

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2706

@xuchuanyin This 2MB limit causing many issues in varchar and complex columns. We cannot let user to configure this internal limits. We should have a growable stream. Besides, we better remove this bytebuffer and set directly to unsafe.

---

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

In reply to this post by qiuchenjian-2

Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r218403711

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
* @return false if any varchar column page cannot add one more value(2MB)
*/
private boolean isVarcharColumnFull(CarbonRow row) {
+ //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

Original implementation use `2MB` to ensure next varchar column value can be filled safely, because size of value of single column won't exceed size of a row.
If UnsafeSortDataRows can exceed 2MB(growing dynamically), then we cannot check whether we have enough space for next value because we are not sure how many space next value will take. So the column page size check should be run before adding row to `dataRows`

---

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r218415262

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
* @return false if any varchar column page cannot add one more value(2MB)
*/
private boolean isVarcharColumnFull(CarbonRow row) {
+ //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

I am not sure how we come to the conclusion of 2MB. There is no guarantee that we always sort the data to use UnsafeSortDataRows always. How about nosort case? And how about if user wants to add 100MB varchar how to support it.
And also this is not just limited to varchar, we should consider for complex and string columns as well here.
@ajantha-bhat Please remove that todo, But we need to refactor the code to ensure to keep the page size within the snappy max compressed length for complex and string datatypes as well.

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/334/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8581/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/511/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/337/

---

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r218474623

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
* @return false if any varchar column page cannot add one more value(2MB)
*/
private boolean isVarcharColumnFull(CarbonRow row) {
+ //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

@xuchuanyin @kevinjmh @ravipesala @kumarvishal09 : As per discussion let us handle this with configurable page size [from 1 MB to 2GB(snappy max)] and split the complex child pages here only and add validation for each column based on row,

This will be analyzed more and I will open a discussion in community and separate PR will be raised for this.

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2706

retest this please

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/339/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/516/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8586/

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2706

@ravipesala : PR is ready please review

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2706

LGTM I am merging this PR. @ajantha-bhat Please start another discussion in the forum to support big column data up to 2GB for complex, varchar and string columns. And also make the page size configurable in terms of MB to avoid outofmemory while reading.

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2706

LGTM

---

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2706

---

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2706

please take care about the loading performance compared with previous nio.buffer implementation.

---

123