Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2706
@xuchuanyin This 2MB limit is causing many issues with varchar and complex columns. We cannot let the user configure these internal limits; we should have a growable stream. Besides, we had better remove this ByteBuffer and write directly to unsafe memory.
---
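The growable stream suggested above can be sketched as follows. This is a hypothetical illustration, not CarbonData's actual implementation; the class name `GrowableByteStream` and its API are invented for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of a growable byte stream: instead of failing when a
// fixed 2MB buffer fills up, the backing array doubles on demand.
final class GrowableByteStream {
  private byte[] data;
  private int size;

  GrowableByteStream(int initialCapacity) {
    this.data = new byte[initialCapacity];
  }

  void write(byte[] src, int offset, int length) {
    ensureCapacity(size + length);
    System.arraycopy(src, offset, data, size, length);
    size += length;
  }

  private void ensureCapacity(int needed) {
    if (needed > data.length) {
      int newCapacity = Math.max(1, data.length);
      while (newCapacity < needed) {
        newCapacity <<= 1; // double until the write fits
      }
      data = Arrays.copyOf(data, newCapacity);
    }
  }

  int size() {
    return size;
  }
}
```

A production version would additionally need overflow handling near `Integer.MAX_VALUE` and, per the comment, would write into unsafe (off-heap) memory rather than an on-heap array.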
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2706#discussion_r218403711

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
    * @return false if any varchar column page cannot add one more value(2MB)
    */
   private boolean isVarcharColumnFull(CarbonRow row) {
+    //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

The original implementation uses `2MB` to ensure that the next varchar column value can be filled safely, because the size of a single column value won't exceed the size of a row. If UnsafeSortDataRows can exceed 2MB (growing dynamically), then we cannot check whether we have enough space for the next value, because we are not sure how much space it will take. So the column page size check should be run before adding the row to `dataRows`.
---
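The ordering argued for above can be sketched with a minimal, invented model (the class and method names below are not from CarbonData): the capacity check uses the actual size of the incoming value and runs before the row is buffered, rather than relying on a fixed 2MB worst-case margin.

```java
// Minimal hypothetical model of a varchar column page: the caller checks
// canAdd() with the real value size BEFORE buffering the row, so no fixed
// per-row safety margin is needed.
final class VarcharPageModel {
  private final int capacityBytes;
  private int usedBytes;

  VarcharPageModel(int capacityBytes) {
    this.capacityBytes = capacityBytes;
  }

  // Check against the next value's actual size, not a worst-case 2MB margin.
  boolean canAdd(byte[] nextValue) {
    return usedBytes + nextValue.length <= capacityBytes;
  }

  void add(byte[] value) {
    if (!canAdd(value)) {
      throw new IllegalStateException("page full; cut the page before adding");
    }
    usedBytes += value.length;
  }

  int usedBytes() {
    return usedBytes;
  }
}
```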
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2706#discussion_r218415262

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
    * @return false if any varchar column page cannot add one more value(2MB)
    */
   private boolean isVarcharColumnFull(CarbonRow row) {
+    //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

I am not sure how we came to the conclusion of 2MB. There is no guarantee that we always sort the data through UnsafeSortDataRows; how about the no-sort case? And how do we support it if the user wants to add a 100MB varchar? Also, this is not limited to varchar; we should consider complex and string columns here as well. @ajantha-bhat Please remove that TODO, but we need to refactor the code to ensure the page size stays within the Snappy max compressed length for complex and string datatypes as well.
---
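For context on the Snappy bound mentioned above: Snappy's documented worst-case compressed size for an n-byte input is 32 + n + n/6, and the compressed output length must fit in a signed 32-bit int (~2GB). A hedged sketch of that page-size check (class and method names invented here):

```java
// Sketch: a column page must be small enough that Snappy's worst-case
// compressed size still fits in a signed int (~2GB). The 32 + n + n/6
// bound follows the Snappy format's documented worst case.
final class SnappyPageBound {
  static long maxCompressedLength(long sourceBytes) {
    return 32 + sourceBytes + sourceBytes / 6;
  }

  static boolean pageFitsCompressor(long pageBytes) {
    return maxCompressedLength(pageBytes) <= Integer.MAX_VALUE;
  }
}
```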
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/334/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8581/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/511/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/337/
---
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2706#discussion_r218474623

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
    * @return false if any varchar column page cannot add one more value(2MB)
    */
   private boolean isVarcharColumnFull(CarbonRow row) {
+    //TODO: test and remove this as now UnsafeSortDataRows can exceed 2MB
--- End diff --

@xuchuanyin @kevinjmh @ravipesala @kumarvishal09: As per the discussion, let us handle this with a configurable page size [from 1 MB to 2GB (Snappy max)], split the complex child pages here only, and add validation for each column based on the row. This will be analyzed further; I will open a discussion in the community, and a separate PR will be raised for it.
---
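The proposed configurable range could be validated as in the following sketch. The bounds, class, and method names are illustrative only; the actual property name and limits would come out of the community discussion mentioned above.

```java
// Hypothetical validation for a user-configurable page size, clamped to the
// proposed [1 MB, ~2GB (Snappy max)] range from the discussion.
final class PageSizeConfigSketch {
  static final long MIN_PAGE_BYTES = 1L << 20;          // 1 MB lower bound
  static final long MAX_PAGE_BYTES = Integer.MAX_VALUE; // ~2GB Snappy/int limit

  // Clamp an arbitrary requested size into the supported range.
  static long resolvePageSize(long requestedBytes) {
    return Math.max(MIN_PAGE_BYTES, Math.min(requestedBytes, MAX_PAGE_BYTES));
  }
}
```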
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2706
retest this please
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/339/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/516/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2706
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8586/
---
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2706
@ravipesala: The PR is ready, please review.
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2706
LGTM, I am merging this PR. @ajantha-bhat Please start another discussion in the forum to support big column data up to 2GB for complex, varchar and string columns. And also make the page size configurable in terms of MB to avoid out-of-memory while reading.
---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2706
LGTM
---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2706
Please take care about the loading performance compared with the previous nio.buffer implementation.
---