[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...


[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2383

    [CARBONDATA-2615][32K]  Support page size less than 32000 in CarbondataV3

    Since we now support super long strings, a column page with 32000 rows
    can exceed 2GB if the strings are long enough, so we support pages with
    fewer than 32000 rows.
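A quick back-of-the-envelope check of the claim above, written as a minimal Java sketch. It assumes roughly 100,000 bytes per value (the string length mentioned later in this thread) and models the 2GB page limit as `Integer.MAX_VALUE` bytes, because Java array lengths are `int` and a single byte array cannot hold more than that. It ignores per-value overhead such as length prefixes. `PageSizeEstimate` and its methods are hypothetical names, not CarbonData code.

```java
// Hypothetical sketch: estimate how large a column page gets and how many
// rows fit under an assumed 2GB (Integer.MAX_VALUE bytes) page limit.
final class PageSizeEstimate {
    // Default number of rows per page in the V3 format, as stated in this thread.
    static final int DEFAULT_ROWS_PER_PAGE = 32000;
    // Assumed limit: one page must fit in a single int-indexed Java byte array.
    static final long MAX_PAGE_BYTES = Integer.MAX_VALUE;

    // Raw data size of a page; computed in long so it cannot overflow.
    static long pageBytes(int rows, long maxBytesPerValue) {
        return rows * maxBytesPerValue;
    }

    // Largest row count that keeps the page under the limit, capped at the default.
    static int maxRowsPerPage(long maxBytesPerValue) {
        if (maxBytesPerValue <= 0) {
            throw new IllegalArgumentException("bytes per value must be positive");
        }
        long rows = MAX_PAGE_BYTES / maxBytesPerValue;
        return (int) Math.min(DEFAULT_ROWS_PER_PAGE, rows);
    }
}
```

With ~100,000-byte values, `pageBytes(32000, 100_000)` is 3.2 billion bytes, well over the limit, while `maxRowsPerPage(100_000)` yields 21474 rows.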
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `NO`
     - [x] Any backward compatibility impacted?
      `NO`
     - [x] Document update required?
     `NO`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
    `Tests added`
            - How it is tested? Please attach test report.
    `Tested in local`
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0620_long_string_decrease_pagesize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2383
   
----
commit b689d66493521452ff9938415e0d0aa66b56c2c5
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-02T07:17:04Z

    Support strings longer than 32000 characters
   
    Add a table property 'long_string_columns' in the create table DDL to
    indicate that those columns will contain more than 32000 characters.
   
    Internally in Carbondata:
    1. add a new datatype called `text` to represent the long string column
    2. add a new encoding called `DIRECT_COMPRESS_TEXT` to the text column
    page meta
    3. use an integer (previously a short) to store the byte length of the
    content.
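Item 3 is needed because a 2-byte `short` length prefix tops out at `Short.MAX_VALUE` (32767) bytes. The minimal sketch below uses a generic `[int length][UTF-8 bytes]` encoding to illustrate the difference; it is an assumed layout for illustration only, not CarbonData's actual page format, and `LengthPrefix` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a generic length-prefixed string layout;
// not CarbonData's actual page format.
final class LengthPrefix {
    // A 2-byte (short) length can only describe values up to 32767 bytes.
    static boolean fitsShortLength(byte[] value) {
        return value.length <= Short.MAX_VALUE;
    }

    // Encodes the value as [4-byte int length][UTF-8 bytes].
    static byte[] encodeWithIntLength(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + bytes.length);
        buf.putInt(bytes.length).put(bytes);
        return buf.array();
    }

    // Reads back a value written by encodeWithIntLength.
    static String decodeWithIntLength(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Builds a string of n copies of ch (avoids String.repeat, which needs Java 11).
    static String repeat(char ch, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, ch);
        return new String(chars);
    }
}
```

A 100,000-character ASCII string does not fit a `short` length, but it round-trips through the `int`-prefixed encoding.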

commit f145c6c60238c400b5db6a6bf2696246b698154a
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-05T12:46:26Z

    rename datatype name from text to varchar

commit 4180f8118d1ff90205b0f1567bef2cdfee3a1b62
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-12T12:35:58Z

    Add 2GB constraint for one column page
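One way such a per-page constraint could be expressed, sketched under the assumption that the limit is `Integer.MAX_VALUE` bytes and that only the sum of the value lengths matters (no metadata or encoding overhead). `ColumnPageSizeCheck` is a hypothetical helper, not the code in this commit.

```java
// Hypothetical sketch of a "2GB per column page" check.
final class ColumnPageSizeCheck {
    // Assumed limit: one page must be addressable by a single Java byte array.
    static final long MAX_PAGE_BYTES = Integer.MAX_VALUE;

    // Sums the byte lengths of all values that would go into one page.
    // A long accumulator avoids the int overflow a naive sum would hit.
    static long totalBytes(int[] valueLengths) {
        long total = 0;
        for (int len : valueLengths) {
            total += len;
        }
        return total;
    }

    static boolean fitsInOnePage(int[] valueLengths) {
        return totalBytes(valueLengths) <= MAX_PAGE_BYTES;
    }

    // Fails fast instead of letting a later int-sized allocation overflow.
    static void requireFitsInOnePage(int[] valueLengths) {
        long total = totalBytes(valueLengths);
        if (total > MAX_PAGE_BYTES) {
            throw new IllegalStateException(
                "column page would need " + total + " bytes, exceeding the 2GB limit; "
                    + "reduce the number of rows per page");
        }
    }
}
```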

commit 710845b155ed5b7a611a900c70b0d766d80ae48d
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-14T12:11:40Z

    update tests

commit 74106d2793ed97615a439576b1c16d34bfaa3ab7
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-19T07:49:57Z

    support write long string from dataframe

commit 7d4325aa31dccbe4f7858f39de3378eafff30016
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-19T09:21:04Z

    Support page size less than 32000 in CarbondataV3
   
    Since we now support super long strings, a column page with 32000 rows
    can exceed 2GB if the strings are long enough, so we support pages with
    fewer than 32000 rows.

----


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    This PR depends on #2382


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6373/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5211/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    retest it please


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    retest this please


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5321/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6380/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5215/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    @xuchuanyin I think it is better to restrict based on the number of bytes (67104) for each column value. The user may not know how many characters will be present, so it is hard for the user to configure the blocklet size.
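The point about characters versus bytes can be made concrete: the same number of characters can occupy different numbers of bytes in UTF-8, so a byte-based limit is what actually bounds the page size. Below is a minimal sketch, assuming values are stored as UTF-8 and that the limit is passed in as a parameter. `ValueByteLimit` is a hypothetical name, and the 67104 figure from the comment above would be one possible argument.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a per-value byte limit (UTF-8 storage assumed).
final class ValueByteLimit {
    // Number of bytes the value occupies when encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded value stays within maxBytes.
    static boolean withinLimit(String value, int maxBytes) {
        return utf8Length(value) <= maxBytes;
    }
}
```

For example, "abc" is 3 bytes, while the single character "\u4e2d" is also 3 bytes and "\u00e9" is 2, so a character count alone does not predict the page size.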


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    @kumarvishal09 I asked someone who has the long string requirement, and the response was that the length of the string is about 100K.
    Since we don't want to change the internal implementation of the column page, decreasing the number of rows in a page may be the only way to solve the problem.


---

[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2383#discussion_r196487039
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException {
         this.pageSize = Integer.parseInt(CarbonProperties.getInstance()
             .getProperty(CarbonCommonConstants.BLOCKLET_SIZE,
                 CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL));
    +    // support less than 32000 rows in one page, because we support super long string,
    +    // if it is long enough, a column page with 32000 rows will exceed 2GB
         if (version == ColumnarFormatVersion.V3) {
    -      this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
    +      this.pageSize =
    --- End diff --
   
    What is the default value for the page size?


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    @xuchuanyin Then the number of rows will depend on the number of characters in the long string columns, right?


---

[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2383#discussion_r196631555
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException {
         this.pageSize = Integer.parseInt(CarbonProperties.getInstance()
             .getProperty(CarbonCommonConstants.BLOCKLET_SIZE,
                 CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL));
    +    // support less than 32000 rows in one page, because we support super long string,
    +    // if it is long enough, a column page with 32000 rows will exceed 2GB
         if (version == ColumnarFormatVersion.V3) {
    -      this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
    +      this.pageSize =
    --- End diff --
   
    In V3, it is 32000 by default. Here we use min(32000, user_specified).
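The rule described above, as a minimal standalone sketch. In the diff, the user value comes from `CarbonCommonConstants.BLOCKLET_SIZE` via `CarbonProperties`. Here it is simply passed in as a nullable string, and `V3PageSize` and `resolvePageSize` are hypothetical names. Falling back to 32000 when the value is unset, and rejecting non-positive values, are assumptions added for the sketch.

```java
// Hypothetical sketch of "min(32000, user_specified)" for V3 pages.
final class V3PageSize {
    // V3 default number of rows per column page, as stated in this thread.
    static final int V3_DEFAULT_ROWS_PER_PAGE = 32000;

    // userValue stands in for the configured property value (null if unset).
    static int resolvePageSize(String userValue) {
        if (userValue == null || userValue.trim().isEmpty()) {
            return V3_DEFAULT_ROWS_PER_PAGE;
        }
        int userSpecified = Integer.parseInt(userValue.trim());
        if (userSpecified <= 0) {
            throw new IllegalArgumentException("page size must be positive: " + userSpecified);
        }
        // Never exceed the V3 default; a smaller value shrinks the page.
        return Math.min(V3_DEFAULT_ROWS_PER_PAGE, userSpecified);
    }
}
```

Because of the `min`, a user can only shrink the page, which is the direction needed for very long strings. Configuring a larger value has no effect.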


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    @kumarvishal09 If the string is too long, the user has to adjust the page size manually. We cannot do it dynamically for now.
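For illustration only, here is one possible shape of the automatic approach, the kind of change CARBONDATA-2613 is meant to track. A page is closed when it reaches the row limit or when the next value would push it past a byte budget. This is a hypothetical sketch, not what this PR implements. `DynamicPageSplitter` and its parameters are assumed names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a sequence of values into pages, closing a page
// when it reaches maxRows or when the next value would exceed maxBytes.
final class DynamicPageSplitter {
    // Returns the number of rows in each page for the given value sizes.
    static List<Integer> splitRowCounts(int[] valueBytes, int maxRows, long maxBytes) {
        if (maxRows <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<Integer> pages = new ArrayList<>();
        int rows = 0;
        long bytes = 0;
        for (int size : valueBytes) {
            if (size > maxBytes) {
                throw new IllegalArgumentException("single value larger than page budget: " + size);
            }
            // Close the current page before it overflows either limit.
            if (rows == maxRows || bytes + size > maxBytes) {
                pages.add(rows);
                rows = 0;
                bytes = 0;
            }
            rows++;
            bytes += size;
        }
        if (rows > 0) {
            pages.add(rows); // last, possibly partial, page
        }
        return pages;
    }
}
```

With this design, short strings still produce full-size pages, and only pages holding very long values are cut early, so the user would not need to guess a page size up front.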


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6399/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5233/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5344/



---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    Created a JIRA, CARBONDATA-2613, to do this automatically.


---

[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2383
 
    @xuchuanyin Please rebase


---