"between and" filter query is very slow


simafengyun
Hi Dev,


I used CarbonData version 0.2 on my local machine and found that the "between and" filter query is very slow.
The root cause is the code below in IncludeFilterExecuterImpl.java; it takes about 20 seconds in my test.
The code's time complexity is O(n*m), so I think it needs to be optimized. Please confirm, thanks.

  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
      int numerOfRows) {
    BitSet bitSet = new BitSet(numerOfRows);
    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
      FixedLengthDimensionDataChunk fixedDimensionChunk =
          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
      long start = System.currentTimeMillis();
      // O(m * n): for each of the m filter keys, linearly scan all n rows
      for (int k = 0; k < filterValues.length; k++) {
        for (int j = 0; j < numerOfRows; j++) {
          if (ByteUtil.UnsafeComparer.INSTANCE
              .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
                  filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
            bitSet.set(j);
          }
        }
      }
      System.out.println("loop time: " + (System.currentTimeMillis() - start));
    }
    return bitSet;
  }

Re: "between and" filter query is very slow

kumarvishal09
Hi,

Currently, in the include and exclude filter cases, when the dimension column does not
have an inverted index we do a linear search. We can add a binary search when the data
for that column is sorted; to know whether it is, we can check in the carbon table
whether the user selected "no inverted index" for that column. If the user selected no
inverted index while creating the column, this code is fine; if not, the data will be
sorted, so we can add a binary search, which will improve performance. See the sketch
below.
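
A minimal sketch of the idea, assuming a fixed-length chunk whose rows are stored
back to back in unsigned lexicographic byte order (the names SortedChunkSearch,
chunkData and rowLength are illustrative, not the actual CarbonData API):

final class SortedChunkSearch {

  // Hypothetical sketch: binary search for one filter key over a sorted,
  // fixed-length column chunk. chunkData holds numberOfRows rows of
  // rowLength bytes each, sorted in unsigned byte order.
  static int binarySearchRow(byte[] chunkData, int rowLength, int numberOfRows,
      byte[] filterKey) {
    int low = 0;
    int high = numberOfRows - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareRow(chunkData, mid * rowLength, filterKey);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid; // a matching row; neighbouring rows may hold duplicates
      }
    }
    return -1; // key not present in this chunk
  }

  // Unsigned byte-by-byte comparison of one fixed-length row against the key.
  static int compareRow(byte[] data, int offset, byte[] key) {
    for (int i = 0; i < key.length; i++) {
      int a = data[offset + i] & 0xFF;
      int b = key[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }
}

This turns each key lookup from O(n) into O(log n), so the whole filter drops from
O(m * n) to O(m log n). And since a "between and" filter expands to a contiguous
range of sorted keys, one could even binary-search only the first and last key and
set the bits for every row in between.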

Please raise a Jira for this improvement

-Regards
Kumar Vishal



Re: Re: "between and" filter query is very slow

simafengyun


Hi Dev,


I created JIRA CARBONDATA-748 a few days ago.
Today I fixed it for version 0.2 and created a new pull request.
Please help confirm. Thanks.


question about dimColumnExecuterInfo.getFilterKeys()

simafengyun
Hi Dev,


When doing a filter query, I can see a filter byte array (below).
Is filterValues always ordered by the dictionary value?
If not, in which case is it unordered? Thanks.



 byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();





please help with OutOfMemoryError issue in Eclipse

simafengyun
Hi dev,


Today I started setting up CarbonData 1.0 in my local Eclipse.
I used "-X -DskipTests -Pspark-1.6 -Dspark.version=1.6.2 clean package" to do the Maven build in Eclipse successfully,
but when I run CarbonExample in Eclipse it fails with the issue below (refer to the log).
Even when I configure -Xmx10g -Xms10g, it still shows the issue.


Can anyone help me? thanks



INFO  08-03 16:50:59,037 - Running Spark version 1.6.2

WARN  08-03 16:51:01,624 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

INFO  08-03 16:51:01,752 - Changing view acls to: mayun

INFO  08-03 16:51:01,753 - Changing modify acls to: mayun

INFO  08-03 16:51:01,754 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mayun); users with modify permissions: Set(mayun)

INFO  08-03 16:51:02,274 - Successfully started service 'sparkDriver' on port 51080.

INFO  08-03 16:51:02,609 - Slf4jLogger started

INFO  08-03 16:51:02,649 - Starting remoting

INFO  08-03 16:51:02,808 - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.100.56.61:51081]

INFO  08-03 16:51:02,814 - Successfully started service 'sparkDriverActorSystem' on port 51081.

INFO  08-03 16:51:02,824 - Registering MapOutputTracker

INFO  08-03 16:51:02,844 - Registering BlockManagerMaster

INFO  08-03 16:51:02,857 - Created local directory at /private/var/folders/qg/b6zvdz3n1cggqx66yzc6m_s40000gn/T/blockmgr-85a89cab-9e48-4708-be89-cde6951285fe

INFO  08-03 16:51:02,870 - MemoryStore started with capacity 12.7 GB

INFO  08-03 16:51:02,926 - Registering OutputCommitCoordinator

INFO  08-03 16:51:03,072 - jetty-8.y.z-SNAPSHOT

INFO  08-03 16:51:03,118 - Started SelectChannelConnector@0.0.0.0:4040

INFO  08-03 16:51:03,118 - Successfully started service 'SparkUI' on port 4040.

INFO  08-03 16:51:03,121 - Started SparkUI at http://10.100.56.61:4040

INFO  08-03 16:51:03,212 - Starting executor ID driver on host localhost

INFO  08-03 16:51:03,228 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51082.

INFO  08-03 16:51:03,229 - Server created on 51082

INFO  08-03 16:51:03,230 - Trying to register BlockManager

INFO  08-03 16:51:03,233 - Registering block manager localhost:51082 with 12.7 GB RAM, BlockManagerId(driver, localhost, 51082)

INFO  08-03 16:51:03,234 - Registered BlockManager

Starting CarbonExample using spark version 1.6.2

Exception in thread "main"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"


Re: please help with OutOfMemoryError issue in Eclipse

simafengyun
Please ignore my issue.
I changed the JDK from 1.8 to 1.7 and added the options below; it runs successfully now.


-Xmx3550m -Xms3550m -XX:MaxPermSize=512m
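
A note on these flags: PermGen was removed in JDK 8, so -XX:MaxPermSize is simply
ignored there. If one wanted to stay on JDK 8, a rough (untested here) equivalent
of these settings would be:

-Xmx3550m -Xms3550m -XX:MaxMetaspaceSize=512m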


Re: question about dimColumnExecuterInfo.getFilterKeys()

ravipesala
Hi,

The filter values that we get from the query are converted to their respective
surrogates and sorted on the surrogate values before we start applying the filter.
A small illustration follows below.
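
For illustration only (this is not CarbonData's actual code path, and the 3-byte
key width is an assumption): dictionary surrogates encoded as fixed-length
big-endian byte arrays compare correctly byte by byte as unsigned values, so
sorting the encoded keys orders them by surrogate value:

import java.util.Arrays;
import java.util.Comparator;

public final class SurrogateSortExample {

  // Encode a dictionary surrogate as a fixed-length (3-byte) big-endian key.
  static byte[] encode(int surrogate) {
    return new byte[] {
        (byte) (surrogate >>> 16), (byte) (surrogate >>> 8), (byte) surrogate };
  }

  // Unsigned lexicographic comparison, matching how the keys are laid out.
  static final Comparator<byte[]> UNSIGNED = new Comparator<byte[]>() {
    public int compare(byte[] a, byte[] b) {
      for (int i = 0; i < a.length && i < b.length; i++) {
        int x = a[i] & 0xFF;
        int y = b[i] & 0xFF;
        if (x != y) {
          return x - y;
        }
      }
      return a.length - b.length;
    }
  };

  public static void main(String[] args) {
    byte[][] filterValues = { encode(300), encode(5), encode(42) };
    Arrays.sort(filterValues, UNSIGNED);
    // Prints the keys in surrogate order 5, 42, 300; this ordering is what
    // makes a binary search over filterValues valid.
    for (byte[] v : filterValues) {
      System.out.println(Arrays.toString(v));
    }
  }
}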


Regards,
Ravindra



--
Thanks & Regards,
Ravi

Re: Re: question about dimColumnExecuterInfo.getFilterKeys()

simafengyun


thanks