Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2252: WIP: Support string longer than 32000 charact...

Classic

List

80 messages Options

Options

1234

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2252

@ravipesala @kumarvishal09 @jackylk
Based on @ravipesala 's advice, I changed the name of internal datatype from `text` to `varchar`

Later I'll raise another PR to make it possible for user to directly use `varchar(size)` in DDL statement.

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6259/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5097/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420] Support string longer than 32000 c...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2252

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5229/

---

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2252#discussion_r194350793

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java ---
@@ -30,21 +30,19 @@

/**
* Constructor for this class
- * @param dataChunks
- * @param invertedIndex
- * @param invertedIndexReverse
- * @param numberOfRows
*/
public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex,
- int[] invertedIndexReverse, int numberOfRows) {
+ int[] invertedIndexReverse, int numberOfRows, boolean isVarcharEncoded) {
--- End diff --

Its better to pass store type it self from the caller instead for boolean

---

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

In reply to this post by qiuchenjian-2

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2252#discussion_r194479546

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
@@ -56,7 +57,8 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu
* @param invertedIndexReverse inverted index reverse to be stored
* @param data data to be stored
*/
- @Override public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
+ @Override
+ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
--- End diff --

@xuchuanyin In Case of varchar column I think we will not be able to store complete data in single byte array as it may overflow, in case of short it is fine as maximum it will be of 32000(max number of bytes in each value) * 32000(number of records in page) + 64000(length for each value in a page), so it will be 1024064000 value and as maximum u can create an array of integer max value so it is fine, but in varchar case length of each value can be maximum of Integer max it self so we will not be able to store in single byte array. This handling is required in both data loading and query.

---

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2252#discussion_r194597446

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java ---
@@ -30,21 +30,19 @@

/**
* Constructor for this class
- * @param dataChunks
- * @param invertedIndex
- * @param invertedIndexReverse
- * @param numberOfRows
*/
public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex,
- int[] invertedIndexReverse, int numberOfRows) {
+ int[] invertedIndexReverse, int numberOfRows, boolean isVarcharEncoded) {
--- End diff --

OK~

---

[GitHub] carbondata pull request #2252: [CARBONDATA-2420] Support string longer than ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2252#discussion_r194603907

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
@@ -56,7 +57,8 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu
* @param invertedIndexReverse inverted index reverse to be stored
* @param data data to be stored
*/
- @Override public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
+ @Override
+ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse,
--- End diff --

@kumarvishal09 yeah, you are right. Considering that scenario will make the code much more complex.
As we all know, a maximum of string/bytearray is about 2GB. I think it is enough for current scenarios. 2GB can support average 67104 bytes per column (2GB/32000) in one column page.

Having discussed with @jackylk , we have the following conclusions:
1. Add an error message to indicate that the maximum size of one column page is 2GB.
2. We can reduce the row size in the page to support longer characters, for example if the content is 67104*2, user can reduce the row size from 32000 to 16000.

1 will be included in this PR and 2 will be implemented in another PR.
Is that OK? @kumarvishal09

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2252

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5265/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6306/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5144/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2252

retest it please

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2252

retest this please

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6325/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5163/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6338/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2252

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5176/

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2252

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5288/

---

[GitHub] carbondata pull request #2252: [CARBONDATA-2420][32K] Support string longer ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin closed the pull request at:

https://github.com/apache/carbondata/pull/2252

---

[GitHub] carbondata issue #2252: [CARBONDATA-2420][32K] Support string longer than 32...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2252

I deleted the origin branch unexpectedly,so I raised another PR #2379

---

1234