[jira] [Updated] (CARBONDATA-748) "between and" filter query is very slow

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarck updated CARBONDATA-748:
-----------------------------

Thanks for your quick response and suggestion.

But if we don't change setFilterdIndexToBitSet,
what about a filter on the first dimension column?
Currently that case runs the method setFilterdIndexToBitSet, which makes the query very slow.

Maybe we also need to change the logic for a filter on the first dimension column
so that it runs the method setFilterdIndexToBitSetWithColumnIndex. Would that be OK?
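
For illustration, the binary search proposed in the issue description could look roughly like the sketch below for a column whose rows are stored in sorted order, which is assumed here to hold for the first dimension column. This is not CarbonData code: the class SortedChunkFilter and its methods are hypothetical names, the chunk is modelled as a flat byte array of fixed-width values, and bytes are compared as unsigned.

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; not part of CarbonData.
final class SortedChunkFilter {

  // Compares the fixed-width value at row `row` of `data` with `key`, byte by byte, unsigned.
  static int compareRow(byte[] data, int row, byte[] key) {
    int offset = row * key.length;
    for (int i = 0; i < key.length; i++) {
      int a = data[offset + i] & 0xFF;
      int b = key[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }

  // Returns the first row whose value is >= key (strict == false) or > key (strict == true).
  // Assumes the rows in `data` are sorted in ascending order.
  static int bound(byte[] data, int numberOfRows, byte[] key, boolean strict) {
    int low = 0;
    int high = numberOfRows;
    while (low < high) {
      int mid = (low + high) >>> 1;
      int cmp = compareRow(data, mid, key);
      if (cmp < 0 || (strict && cmp == 0)) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Sets the bits of all rows equal to any filter value: two binary searches per filter value.
  static BitSet filterSorted(byte[] data, int numberOfRows, byte[][] filterValues) {
    BitSet bitSet = new BitSet(numberOfRows);
    for (byte[] key : filterValues) {
      int first = bound(data, numberOfRows, key, false);
      int end = bound(data, numberOfRows, key, true);
      bitSet.set(first, end); // no-op when the key is absent (first == end)
    }
    return bitSet;
  }
}
```

With m filter values and n rows, this performs O(m log n) comparisons plus the cost of setting the matching bits, instead of n*m comparisons. It returns wrong results if the rows are not actually sorted, so it only applies where that assumption holds.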

> "between and" filter query is very slow
> ---------------------------------------
>
> Key: CARBONDATA-748
> URL: https://issues.apache.org/jira/browse/CARBONDATA-748
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jarck
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Hi,
> Currently, in the include and exclude filter cases, when a dimension column does not
> have an inverted index, the filter does a linear search. We can add a binary search
> when the data for that column is sorted. To get this information, we can check
> in the carbon table whether the user selected no inverted index for that column.
> If the user selected no inverted index while creating the column, this
> code is fine; if the user did not, the data will be sorted, so we can
> add a binary search, which will improve performance.
> Please raise a Jira for this improvement.
> -Regards
> Kumar Vishal
> On Fri, Mar 3, 2017 at 7:42 PM, 马云 <[hidden email]> wrote:
> Hi Dev,
> I used CarbonData version 0.2 on my local machine and found that the
> "between and" filter query is very slow.
> The root cause is the code below, in IncludeFilterExecuterImpl.java.
> It takes about 20s in my test.
> The code's time complexity is O(n*m), where n is the number of rows and m is
> the number of filter values. I think it needs to be optimized;
> please confirm. Thanks.
> private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
>     int numerOfRows) {
>   BitSet bitSet = new BitSet(numerOfRows);
>   if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
>     FixedLengthDimensionDataChunk fixedDimensionChunk =
>         (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
>     byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
>     long start = System.currentTimeMillis();
>     // Every filter value is compared against every row.
>     for (int k = 0; k < filterValues.length; k++) {
>       for (int j = 0; j < numerOfRows; j++) {
>         if (ByteUtil.UnsafeComparer.INSTANCE
>             .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
>                 filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
>           bitSet.set(j);
>         }
>       }
>     }
>     System.out.println("loop time: " + (System.currentTimeMillis() - start));
>   }
>   return bitSet;
> }
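
When the rows are not sorted, one possible way to avoid the nested loop is a hash lookup. Neither the reporter nor the reviewer proposes this in the thread; it is only a sketch of an alternative technique. The class HashedChunkFilter is a hypothetical name, the chunk is again modelled as a flat byte array, and all filter values are assumed to have the same width as the stored values.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of CarbonData.
final class HashedChunkFilter {

  // Marks every row whose fixed-width value equals one of the filter values.
  static BitSet filterUnsorted(byte[] data, int numberOfRows, byte[][] filterValues) {
    BitSet bitSet = new BitSet(numberOfRows);
    if (filterValues.length == 0) {
      return bitSet;
    }
    // Assumption: every filter value has the same width as the stored values.
    int width = filterValues[0].length;

    // ByteBuffer implements content-based equals/hashCode, so it can serve as a set key.
    Set<ByteBuffer> keys = new HashSet<>();
    for (byte[] value : filterValues) {
      keys.add(ByteBuffer.wrap(value));
    }

    // One pass over the rows, one hash lookup per row.
    for (int row = 0; row < numberOfRows; row++) {
      if (keys.contains(ByteBuffer.wrap(data, row * width, width))) {
        bitSet.set(row);
      }
    }
    return bitSet;
  }
}
```

This does roughly n + m hash operations instead of n*m comparisons, at the cost of allocating a small wrapper object per row. Whether that trade-off pays off inside the real scanner would need to be measured.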

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)