[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2784

[CARBONDATA-2987] Data mismatch after compaction with measure sort columns

Problem: Data mismatch after compaction with measure sort columns.

Root cause: In the compaction flow (DictionaryBasedResultCollector), ColumnPageWrapper does not handle the inverted index mapping. As a result the row ID was wrong, and rows of no-dictionary dimension columns got data from other rows, hence the data mismatch.

Solution: Handle the inverted index mapping for the DictionaryBasedResultCollector flow in ColumnPageWrapper.
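
To illustrate the root cause described above, here is a minimal sketch of how reading a sorted page without applying the inverted index mapping returns another row's value. It is not CarbonData code: the class, the method names, and the direction of the mapping arrays are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from CarbonData.
final class InvertedIndexSketch {

  // invertedIndex[storedPos] = logical row id of the value kept at storedPos.
  static int[] buildInvertedIndex(int[] logicalValues) {
    Integer[] order = new Integer[logicalValues.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort row ids by the value they point to (stable for equal values).
    Arrays.sort(order, (a, b) -> Integer.compare(logicalValues[a], logicalValues[b]));
    int[] inverted = new int[order.length];
    for (int pos = 0; pos < order.length; pos++) {
      inverted[pos] = order[pos];
    }
    return inverted;
  }

  // reverse[logicalRow] = position of that row's value in the sorted page.
  static int[] reverse(int[] inverted) {
    int[] rev = new int[inverted.length];
    for (int pos = 0; pos < inverted.length; pos++) {
      rev[inverted[pos]] = pos;
    }
    return rev;
  }

  // Lay the values out in sorted order, as a sorted column page would.
  static int[] storeSorted(int[] logicalValues, int[] inverted) {
    int[] stored = new int[logicalValues.length];
    for (int pos = 0; pos < stored.length; pos++) {
      stored[pos] = logicalValues[inverted[pos]];
    }
    return stored;
  }

  // Buggy read: treats the logical row id as a storage position.
  static int readUnmapped(int[] stored, int logicalRow) {
    return stored[logicalRow];
  }

  // Correct read: translates the logical row id before touching the page.
  static int readMapped(int[] stored, int[] rev, int logicalRow) {
    return stored[rev[logicalRow]];
  }

  public static void main(String[] args) {
    int[] logical = {30, 10, 20};
    int[] inverted = buildInvertedIndex(logical);
    int[] stored = storeSorted(logical, inverted);
    int[] rev = reverse(inverted);
    System.out.println("unmapped row 0: " + readUnmapped(stored, 0));
    System.out.println("mapped row 0:   " + readMapped(stored, rev, 0));
  }
}
```

In this sketch the unmapped read of row 0 returns the value that belongs to row 1, which is the same kind of mismatch the PR describes; the mapped read returns the row's own value.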

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed? No

- [ ] Any backward compatibility impacted? No

- [ ] Document update required? NA

- [ ] Testing done
Done; updated the UT.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2784.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2784

----
commit 63888753f72d7c6b4d993b4e31f3c5a8b7d449f8
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-09-28T10:57:55Z

[CARBONDATA-2987] Data mismatch after compaction with measure sort columns

----

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2784

Does this issue also exist in older versions like 1.3 and 1.4?

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2784

@jackylk : No, this issue was introduced only in this version (1.5), after the adaptive encoding changes for primitive types.

Previously the column page wrapper was used only for complex columns; in this version it is also used for no-dictionary columns. In the vector reader case (DictionaryBasedVectorResultCollector), the inverted index was handled for no-dictionary columns, but for DictionaryBasedResultCollector it was not. Hence the issue.

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/630/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/822/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8891/

---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2784#discussion_r221258211

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -302,8 +311,19 @@ public boolean isExplicitSorted() {

@Override
public int compareTo(int rowId, byte[] compareValue) {
- byte[] chunkData = this.getChunkData((int) rowId);
- return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, compareValue);
+ // rowId is the inverted index, but the null bitset is based on actual data
+ int nullBitSetRowId = rowId;
+ if (isExplicitSorted()) {
+ nullBitSetRowId = getInvertedReverseIndex(rowId);
+ }
+ byte[] nullBitSet = getNullBitSet(nullBitSetRowId, columnPage.getColumnSpec().getColumnType());
--- End diff --

Why do we need to handle the inverted index for the null bitset? I think it is not required.

---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2784#discussion_r221261712

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -302,8 +311,19 @@ public boolean isExplicitSorted() {

@Override
public int compareTo(int rowId, byte[] compareValue) {
- byte[] chunkData = this.getChunkData((int) rowId);
- return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, compareValue);
+ // rowId is the inverted index, but the null bitset is based on actual data
+ int nullBitSetRowId = rowId;
+ if (isExplicitSorted()) {
+ nullBitSetRowId = getInvertedReverseIndex(rowId);
+ }
+ byte[] nullBitSet = getNullBitSet(nullBitSetRowId, columnPage.getColumnSpec().getColumnType());
--- End diff --

Because this rowId is not the original rowId, the mapping is required.
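
The point of this reply can be sketched as follows: when a null bitset is recorded in one row-id space and the caller passes an id from another, the lookup has to translate the id first. This is only an illustration under stated assumptions. The class, the method names, and the `idMapping` array are hypothetical, and the direction of the mapping is chosen for the example; it is not meant to mirror `getInvertedReverseIndex` or `getNullBitSet` exactly.

```java
import java.util.BitSet;

// Hypothetical sketch; not the ColumnPageWrapper implementation.
final class NullBitSetMappingSketch {

  // Buggy lookup: assumes the incoming id is already in the bitset's id space.
  static boolean isNullUnmapped(BitSet nullBits, int incomingId) {
    return nullBits.get(incomingId);
  }

  // Correct lookup: translate the incoming id before consulting the bitset.
  static boolean isNullMapped(BitSet nullBits, int[] idMapping, int incomingId) {
    return nullBits.get(idMapping[incomingId]);
  }

  public static void main(String[] args) {
    // Assumed layout: the bitset marks id 1 as null in its own id space,
    // and idMapping[incomingId] gives the matching id in that space.
    BitSet nullBits = new BitSet();
    nullBits.set(1);
    int[] idMapping = {1, 2, 0};

    for (int id = 0; id < idMapping.length; id++) {
      System.out.println("id " + id
          + " unmapped=" + isNullUnmapped(nullBits, id)
          + " mapped=" + isNullMapped(nullBits, idMapping, id));
    }
  }
}
```

With these assumed inputs, the unmapped lookup reports the wrong entry as null, while the mapped lookup finds the entry that is actually null.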

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8899/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/640/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/832/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2784

@ravipesala : The PR is ready, please check.

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2784

retest this please

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/673/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/869/

---

[GitHub] carbondata issue #2784: [CARBONDATA-2987] Data mismatch after compaction wit...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2784

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8937/

---

[GitHub] carbondata pull request #2784: [CARBONDATA-2987] Data mismatch after compact...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2784

---