[GitHub] carbondata pull request #2252: WIP: Support string longer than 32000 charact...

classic Classic list List threaded Threaded
80 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    @ravipesala @kumarvishal09 @jackylk
    Based on @ravipesala 's advice, I changed the name of internal datatype from `text` to `varchar`
   
    Later I'll raise another PR to make it possible for user to directly use `varchar(size)` in DDL statement.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6259/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5097/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5229/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2252#discussion_r194350793
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java ---
    @@ -30,21 +30,19 @@
     
       /**
        * Constructor for this class
    -   * @param dataChunks
    -   * @param invertedIndex
    -   * @param invertedIndexReverse
    -   * @param numberOfRows
        */
       public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex,
    -      int[] invertedIndexReverse, int numberOfRows) {
    +      int[] invertedIndexReverse, int numberOfRows, boolean isVarcharEncoded) {
    --- End diff --
   
    Its better to pass store type it self from the caller instead for boolean


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2252#discussion_r194479546
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
    @@ -56,7 +57,8 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu
        * @param invertedIndexReverse inverted index reverse to be stored
        * @param data                 data to be stored
        */
    -  @Override public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
    +  @Override
    +  public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
    --- End diff --
   
    @xuchuanyin In Case of varchar column I think we will not be able to store complete data in single byte array as it may overflow, in case of short it is fine as maximum it will be of  32000(max number of bytes in each value) * 32000(number of records in page) + 64000(length for each value in a page), so it will be 1024064000 value and as maximum u can create an array of integer max value so it is fine, but in varchar case length of each value can be maximum of Integer max it self so we will not be able to store in single byte array. This handling is required in both data loading and query.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2252#discussion_r194597446
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java ---
    @@ -30,21 +30,19 @@
     
       /**
        * Constructor for this class
    -   * @param dataChunks
    -   * @param invertedIndex
    -   * @param invertedIndexReverse
    -   * @param numberOfRows
        */
       public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex,
    -      int[] invertedIndexReverse, int numberOfRows) {
    +      int[] invertedIndexReverse, int numberOfRows, boolean isVarcharEncoded) {
    --- End diff --
   
    OK~


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2252#discussion_r194603907
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
    @@ -56,7 +57,8 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu
        * @param invertedIndexReverse inverted index reverse to be stored
        * @param data                 data to be stored
        */
    -  @Override public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
    +  @Override
    +  public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
    --- End diff --
   
    @kumarvishal09 yeah, you are right. Considering that scenario will make the code much more complex.
    As we all know, a maximum of string/bytearray is about 2GB. I think it is enough for current scenarios. 2GB can support average 67104 bytes per column (2GB/32000) in one column page.
   
    Having discussed with @jackylk , we have the following conclusions:
    1. Add an error message to indicate that the maximum size of one column page is 2GB.
    2. We can reduce the row size in the page to support longer characters, for example if the content is 67104*2, user can reduce the row size from 32000 to 16000.
   
    1 will be included in this PR and 2 will be implemented in another PR.
    Is that OK? @kumarvishal09


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5265/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6306/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5144/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    retest it please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6325/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5163/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6338/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5176/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5288/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2252: [CARBONDATA-2420][32K] Support string longer ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/2252


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2252
 
    I deleted the origin branch unexpectedly,so I raised another PR #2379


---
1234