[jira] [Resolved] (CARBONDATA-748) "between and" filter query is very slow

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-748.
----------------------------------------
Resolution: Fixed
Assignee: Jarck
Fix Version/s: 1.0.1-incubating

> "between and" filter query is very slow
> ---------------------------------------
>
> Key: CARBONDATA-748
> URL: https://issues.apache.org/jira/browse/CARBONDATA-748
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jarck
> Assignee: Jarck
> Fix For: 1.0.1-incubating
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Hi,
> Currently, in the include and exclude filter cases, when a dimension column
> does not have an inverted index, the filter does a linear search. We can add
> a binary search when the data for that column is sorted. To get this
> information, we can check in the carbon table whether the user has selected
> no inverted index for that column. If the user selected no inverted index
> when creating the column, the current code is fine; if not, the data will be
> sorted, so we can add a binary search, which will improve performance.
> Please raise a Jira for this improvement.
> -Regards
> Kumar Vishal
> On Fri, Mar 3, 2017 at 7:42 PM, 马云 <[hidden email]> wrote:
> Hi Dev,
> I used CarbonData version 0.2 on my local machine and found that the
> "between and" filter query is very slow.
> The root cause is the code below in IncludeFilterExecuterImpl.java.
> It takes about 20s in my test.
> The code's time complexity is O(n*m), with n rows and m filter values.
> I think it needs to be optimized; please confirm. Thanks.
> private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
>     int numerOfRows) {
>   BitSet bitSet = new BitSet(numerOfRows);
>   if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
>     FixedLengthDimensionDataChunk fixedDimensionChunk =
>         (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
>     byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
>     long start = System.currentTimeMillis();
>     for (int k = 0; k < filterValues.length; k++) {
>       for (int j = 0; j < numerOfRows; j++) {
>         if (ByteUtil.UnsafeComparer.INSTANCE
>             .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
>                 filterValues[k].length, filterValues[k], 0,
>                 filterValues[k].length) == 0) {
>           bitSet.set(j);
>         }
>       }
>     }
>     System.out.println("loop time: " + (System.currentTimeMillis() - start));
>   }
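
For illustration, the binary search suggested in the reply above could look roughly like the sketch below. It is not the CarbonData patch: the class and method names (SortedChunkFilter, compareAt, lowerBound, filter) are hypothetical, and it assumes the chunk is a flat byte array of fixed-width rows that are physically stored in ascending unsigned byte order, with every filter key having the same width as a row. Any mapping from sorted position back to an original row id is ignored here.

```java
import java.util.BitSet;

// Hypothetical helper, not part of CarbonData: matches filter keys against a
// sorted, fixed-width dimension chunk using binary search per key.
final class SortedChunkFilter {

  // Compares the row at index `row` with `key` as unsigned bytes.
  public static int compareAt(byte[] chunk, int row, byte[] key) {
    int offset = row * key.length;
    for (int i = 0; i < key.length; i++) {
      int a = chunk[offset + i] & 0xFF;
      int b = key[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }

  // Returns the index of the first row that is >= key (numberOfRows if none).
  public static int lowerBound(byte[] chunk, int numberOfRows, byte[] key) {
    int low = 0;
    int high = numberOfRows;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (compareAt(chunk, mid, key) < 0) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Sets one bit per row that equals any filter key. Assumes sorted rows.
  public static BitSet filter(byte[] chunk, int numberOfRows, byte[][] filterKeys) {
    BitSet bitSet = new BitSet(numberOfRows);
    for (byte[] key : filterKeys) {
      // Jump to the first candidate row, then walk over the run of equal rows.
      int row = lowerBound(chunk, numberOfRows, key);
      while (row < numberOfRows && compareAt(chunk, row, key) == 0) {
        bitSet.set(row);
        row++;
      }
    }
    return bitSet;
  }
}
```

Under these assumptions each filter key costs O(log n) comparisons plus the number of rows it matches, instead of a full pass over all n rows. For columns created without an inverted index the data is not sorted, so the original linear scan would still be needed there.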

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)