[GitHub] carbondata pull request #2654: [WIP] Adaptive Encoding for Primitive data ty...

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/482/

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/330/

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/507/

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8577/

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2654

LGTM

---

qiuchenjian-2

[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2654

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2654

Seriously? Have you checked this PR against a legacy store? @kevinjmh tested it locally days ago and raised this problem but didn't get any feedback.

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2654

I ran a test on a table with a bloom datamap created before applying this PR and queried it after this PR was merged, but the answer is not correct. Can you check it?

Procedure to reproduce:

- switch to the master code from before this PR was merged
- create a table with a no-dict measure column (set the measure column as a sort column)
- create a bloom datamap on the measure column
- load some data into the table
- query on the measure column and get a result
- switch to the code after this PR was merged
- run the same query and compare the results

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2654

> @dhatchayani What about the legacy store?
> For example, for a non-dict primitive column, the BloomFilter datamap in the old store stores bytes, and during a query we convert the value to bytes; in the new store, during a query we convert it to a primitive object, which will cause a mismatch.

In the legacy store it is stored as bytes; in the new store it is stored as a primitive object. When it is retrieved by a query, however, the result is unified to bytes.
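
The mismatch raised in the quoted question can be illustrated with a minimal sketch. It is not CarbonData code: the two encodings (`legacyEncode`, `primitiveEncode`) are hypothetical and chosen only to differ, and a plain `HashSet` stands in for the Bloom filter. It shows that an index built with one byte encoding cannot be probed with another, which is why both sides need to be unified to the same form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: these encodings are assumptions chosen to show a
// mismatch, not CarbonData's actual legacy or new store formats.
class EncodingMismatchSketch {

  // Hypothetical "legacy" encoding: the decimal text of the value.
  static byte[] legacyEncode(int value) {
    return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
  }

  // Hypothetical "new" encoding: the 4-byte big-endian form of the value.
  static byte[] primitiveEncode(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }

  // Stand-in for a Bloom filter: an exact set keyed on the encoded bytes.
  // ByteBuffer.wrap gives content-based equals/hashCode for byte arrays.
  static Set<ByteBuffer> buildIndex(int[] values, boolean legacy) {
    Set<ByteBuffer> index = new HashSet<>();
    for (int v : values) {
      index.add(ByteBuffer.wrap(legacy ? legacyEncode(v) : primitiveEncode(v)));
    }
    return index;
  }

  static boolean mightContain(Set<ByteBuffer> index, byte[] key) {
    return index.contains(ByteBuffer.wrap(key));
  }

  public static void main(String[] args) {
    Set<ByteBuffer> legacyIndex = buildIndex(new int[] {7, 42}, true);
    // Same encoding on both sides: the value is found.
    System.out.println(mightContain(legacyIndex, legacyEncode(42)));
    // Index built with one encoding, probed with the other: the value is missed.
    System.out.println(mightContain(legacyIndex, primitiveEncode(42)));
  }
}
```

With a real Bloom filter the effect is the same: a probe key with different bytes is simply a different key, so blocks that contain the value may be pruned and the query returns fewer rows than expected.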

---

qiuchenjian-2

[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r218669311

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
@@ -264,8 +264,17 @@ class RawBytesReadSupport(segmentProperties: SegmentProperties, indexColumns: Ar
         rtn(i) = if (indexCol2IdxInDictArray.contains(col.getColName)) {
           surrogatKeys(indexCol2IdxInDictArray(col.getColName)).toInt.asInstanceOf[Integer]
         } else if (indexCol2IdxInNoDictArray.contains(col.getColName)) {
-          data(0).asInstanceOf[ByteArrayWrapper].getNoDictionaryKeyByIndex(
+          val bytes = data(0).asInstanceOf[ByteArrayWrapper].getNoDictionaryKeyByIndex(
             indexCol2IdxInNoDictArray(col.getColName))
+          // no dictionary primitive columns are expected to be in original data while loading,
+          // so convert it to original data
+          if (DataTypeUtil.isPrimitiveColumn(col.getDataType)) {
+            val dataFromBytes = DataTypeUtil
+              .getDataBasedOnDataTypeForNoDictionaryColumn(bytes, col.getDataType)
+            dataFromBytes
--- End diff --

I think measure null and no-dictionary null values are different. Can you please give me a scenario that falls into the no-dictionary null case?
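
The diff quoted in this message turns the stored bytes of a no-dictionary primitive column back into the original typed value. A minimal sketch of that kind of conversion follows. The `PrimitiveType` enum and `fromBytes` helper are hypothetical names, and fixed-width big-endian layout is an assumption; this is not the behavior of `DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn`.

```java
import java.nio.ByteBuffer;

// Illustrative only: a hypothetical decoder, not CarbonData's DataTypeUtil.
class NoDictDecodeSketch {

  // Hypothetical type tag; CarbonData has its own data type classes.
  enum PrimitiveType { INT, LONG, DOUBLE }

  // Decode fixed-width big-endian bytes (an assumed layout) into a boxed value.
  static Object fromBytes(byte[] bytes, PrimitiveType type) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    switch (type) {
      case INT:
        return buf.getInt();
      case LONG:
        return buf.getLong();
      case DOUBLE:
        return buf.getDouble();
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    byte[] stored = ByteBuffer.allocate(Integer.BYTES).putInt(42).array();
    // The decoded value is an Integer equal to the one that was encoded.
    System.out.println(fromBytes(stored, PrimitiveType.INT));
  }
}
```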

---

qiuchenjian-2

[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r218669857

--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -331,8 +332,18 @@ private BloomQueryModel buildQueryModelInternal(CarbonColumn carbonColumn,
       // for dictionary/date columns, convert the surrogate key to bytes
       internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, convertedValue);
     } else {
-      // for non dictionary dimensions, is already bytes,
-      internalFilterValue = (byte[]) convertedValue;
+      // for non dictionary dimensions, numeric columns will be of original data,
+      // so convert the data to bytes
+      if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) {
+        if (convertedValue == null) {
+          convertedValue = DataConvertUtil.getNullValueForMeasure(carbonColumn.getDataType(),
+              carbonColumn.getColumnSchema().getScale());
+        }
+        internalFilterValue =
+            CarbonUtil.getValueAsBytes(carbonColumn.getDataType(), convertedValue);
--- End diff --

> I ran a test on a table with a bloom datamap created before applying this PR and queried it after this PR was merged, but the answer is not correct. Can you check it?
>
> Procedure to reproduce:
>
> * switch to the master code from before this PR was merged
> * create a table with a no-dict measure column (set the measure column as a sort column)
> * create a bloom datamap on the measure column
> * load some data into the table
> * query on the measure column and get a result
> * switch to the code after this PR was merged
> * run the same query and compare the results

I will check this issue and update ASAP.
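
The diff quoted in this message converts a typed filter value to bytes before probing the Bloom filter, substituting a placeholder when the value is null. A minimal sketch of that pattern follows. The `NULL_SENTINEL` constant and `toFilterBytes` helper are hypothetical; the value CarbonData actually uses for a null measure is not stated in this thread, so `Long.MIN_VALUE` here is purely an assumption.

```java
import java.nio.ByteBuffer;

// Illustrative only: a hypothetical encoder, not CarbonUtil.getValueAsBytes.
class FilterValueEncodeSketch {

  // Assumed placeholder for a null value; the real null representation
  // used by CarbonData is not given in this thread.
  static final long NULL_SENTINEL = Long.MIN_VALUE;

  // Replace null with the sentinel, then emit 8 big-endian bytes.
  static byte[] toFilterBytes(Long value) {
    long v = (value == null) ? NULL_SENTINEL : value;
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  public static void main(String[] args) {
    // Both calls return 8 bytes; a null input never throws.
    System.out.println(toFilterBytes(10L).length);
    System.out.println(toFilterBytes(null).length);
  }
}
```

Whatever placeholder is chosen, the index-building side and the query side have to agree on it, otherwise a filter on null values hits the same encoding mismatch discussed earlier in the thread.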

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2654

> I ran a test on a table with a bloom datamap created before applying this PR and queried it after this PR was merged, but the answer is not correct. Can you check it?
>
> Procedure to reproduce:
>
> * switch to the master code from before this PR was merged
> * create a table with a no-dict measure column (set the measure column as a sort column)
> * create a bloom datamap on the measure column
> * load some data into the table
> * query on the measure column and get a result
> * switch to the code after this PR was merged
> * run the same query and compare the results

@kevinjmh The issue is reproduced. It is a compatibility issue, because the data written in the new store is in a different format. I will correct it in the next PR.

---

qiuchenjian-2

[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2654

OK

---