[GitHub] carbondata pull request #2706: [WIP] multiple issue fixes for varchar column...

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    @xuchuanyin This 2MB limit causing many issues in varchar and complex columns. We cannot let user to configure this internal limits. We should have a growable stream. Besides, we better remove this bytebuffer and set directly to unsafe.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218403711
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
        * @return false if any varchar column page cannot add one more value(2MB)
        */
       private boolean isVarcharColumnFull(CarbonRow row) {
    +    //TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
    --- End diff --
   
    Original implementation use `2MB` to ensure next varchar column value can be filled safely, because size of value of single column won't exceed size of a row.
    If UnsafeSortDataRows can exceed 2MB(growing dynamically), then we cannot check whether we have enough space for next value because we are not sure how many space next value will take. So the column page size check should be run before adding row to `dataRows`  


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218415262
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
        * @return false if any varchar column page cannot add one more value(2MB)
        */
       private boolean isVarcharColumnFull(CarbonRow row) {
    +    //TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
    --- End diff --
   
    I am not sure how we come to the conclusion of 2MB. There is no guarantee that we always sort the data to use UnsafeSortDataRows always. How about nosort case? And how about if user wants to add 100MB varchar how to support it.
    And also this is not just limited to varchar, we should consider for complex and string columns as well here.
    @ajantha-bhat Please remove that todo, But we need to refactor the code to ensure to keep the page size within the snappy max compressed length for complex and string datatypes as well.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/334/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8581/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/511/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/337/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218474623
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
        * @return false if any varchar column page cannot add one more value(2MB)
        */
       private boolean isVarcharColumnFull(CarbonRow row) {
    +    //TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
    --- End diff --
   
    @xuchuanyin @kevinjmh  @ravipesala @kumarvishal09 : As per discussion let us handle this with configurable page size [from 1 MB to 2GB(snappy max)] and split the complex child pages here only and add validation for each column based on row,
   
    This will be analyzed more and I will open a discussion in community and separate PR will be raised for this.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/339/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/516/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8586/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    @ravipesala : PR is ready please review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    LGTM I am merging this PR. @ajantha-bhat Please start another discussion in the forum to support big column data up to 2GB for complex, varchar and string columns. And also make the page size configurable in terms of MB to avoid outofmemory while reading.



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2706


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    please take care about the loading performance compared with previous nio.buffer implementation.


---
123