[GitHub] carbondata pull request #952: [CARBONDATA-1094] Wrong results returned by th...

qiuchenjian-2

GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/952

[CARBONDATA-1094] Wrong results returned by the query in case inverted index is not created on a column

Problem: The query returns wrong results when an inverted index is not created on a column.

Fix: When an inverted index does not exist for a column, or the column is not a sort column:
1. Blocks and blocklets cannot be pruned, because the data for that column is not sorted.
2. The filter must be applied with a linear search instead of a binary search, because binary search is valid only on sorted data.
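
To illustrate point 2, here is a minimal sketch of why a binary search gives wrong answers on an unsorted column. It is not the CarbonData implementation: the class name, the method names, and the sample city values are all hypothetical, and plain strings stand in for the encoded column data.

```java
import java.util.BitSet;

// Hypothetical illustration only; not CarbonData code.
public class UnsortedFilterSketch {

  // Linear scan: checks every row, so it is correct whatever the row order.
  static BitSet linearScan(String[] column, String lowerBound) {
    BitSet matches = new BitSet(column.length);
    for (int row = 0; row < column.length; row++) {
      if (column[row].compareTo(lowerBound) >= 0) {
        matches.set(row);
      }
    }
    return matches;
  }

  // Binary search for the first row >= lowerBound, then mark every row after it.
  // This is only valid when the column is sorted in ascending order.
  static BitSet binarySearchScan(String[] column, String lowerBound) {
    int low = 0;
    int high = column.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (column[mid].compareTo(lowerBound) < 0) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    BitSet matches = new BitSet(column.length);
    matches.set(low, column.length);
    return matches;
  }

  public static void main(String[] args) {
    // Made-up, deliberately unsorted sample data.
    String[] city = {"Washington", "Beijing", "Tokyo", "Austin", "Shanghai"};
    System.out.println("linear: " + linearScan(city, "Shanghai"));       // {0, 2, 4}
    System.out.println("binary: " + binarySearchScan(city, "Shanghai")); // {2, 3, 4}
  }
}
```

On this sample the binary search wrongly includes "Austin" (row 3) and misses "Washington" (row 0), while the linear scan returns exactly the rows with city >= 'Shanghai'.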

Verified result
------------------
SELECT * FROM index1 WHERE city >= 'Shanghai'
+---+-----+----------+
| id| name|      city|
+---+-----+----------+
| 11|James|Washington|
| 20|Kevin| Singapore|
|  9| Mary|     Tokyo|
| 16| Paul|  Shanghai|
|  4| Sara|     Tokyo|
+---+-----+----------+

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata inverted_index_filter_issue_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #952

----
commit 87d7bb2bd74569905f5c24e9cb91735df8ac27da
Author: manishgupta88 <[hidden email]>
Date: 2017-05-25T13:14:30Z

Problem: The query returns wrong results when an inverted index is not created on a column.

Fix: When an inverted index does not exist for a column, or the column is not a sort column:
1. Blocks and blocklets cannot be pruned, because the data for that column is not sorted.
2. The filter must be applied with a linear search instead of a binary search, because binary search is valid only on sorted data.

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

qiuchenjian-2

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/952#discussion_r118632591

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -137,19 +150,31 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
       int numerOfRows) {
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
+      int startIndex = 0;
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      if (filterValues.length > 1) {
--- End diff --

Is there any reason for removing this code? It's an optimization added for unsorted chunk data: since the filter values are sorted, we can do the comparison in reverse and look up each row value in the filter values.
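
One possible reading of this "reverse comparison" is sketched below. It assumes the optimization means binary-searching each row value in the sorted filter keys, rather than searching the unsorted chunk for each filter key. The class, comparator, and method names are hypothetical and the removed code itself is not shown in the diff, so this is an illustration rather than the actual CarbonData logic.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

// Hypothetical illustration only; not CarbonData code.
public class ReverseComparisonSketch {

  // Unsigned lexicographic byte[] comparison (assumed ordering of the filter keys).
  static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  };

  // The rows are unsorted, so every row is visited once (linear in the rows),
  // but each membership test is a binary search over the sorted filter keys.
  static BitSet includeFilter(byte[][] rows, byte[][] sortedFilterKeys) {
    BitSet matches = new BitSet(rows.length);
    for (int row = 0; row < rows.length; row++) {
      if (Arrays.binarySearch(sortedFilterKeys, rows[row], UNSIGNED_BYTES) >= 0) {
        matches.set(row);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    byte[][] rows = {{5}, {1}, {9}, {3}, {5}}; // unsorted chunk data (made up)
    byte[][] filterKeys = {{3}, {5}};          // filter keys, already sorted
    System.out.println(includeFilter(rows, filterKeys)); // {0, 3, 4}
  }
}
```

With n rows and k filter keys this costs roughly n log k comparisons instead of n * k, which is where the saving would come from when there is more than one filter value.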


qiuchenjian-2