Hi,
Performance depends on the query plan. When you submit a query like
[Select attributeA , count(*) from tableB group by attributeA] through
Spark, Spark asks Carbon for only the attributeA column. Carbon therefore
reads only the attributeA column from all files and sends the result to
Spark, which aggregates the data.
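To illustrate the division of work described above, here is a minimal toy sketch in plain Python. It is not CarbonData or Spark code. The in-memory "files", the attributeB column, and the helper names scan and group_by_count are all made up for illustration. The storage side returns only the requested column, and the engine side does the group-by count.

```python
from collections import Counter

# Toy columnar "files": each column is stored as its own list.
# attributeB and all values are hypothetical, for illustration only.
files = [
    {"attributeA": ["x", "y", "x"], "attributeB": [1, 2, 3]},
    {"attributeA": ["y", "y", "z"], "attributeB": [4, 5, 6]},
]

def scan(files, columns):
    """Storage side (stand-in for Carbon): yield only the requested columns."""
    for f in files:
        yield {name: f[name] for name in columns}

def group_by_count(batches, key):
    """Engine side (stand-in for Spark): count rows per distinct key value."""
    counts = Counter()
    for batch in batches:
        counts.update(batch[key])
    return dict(counts)

# Equivalent in spirit to:
#   Select attributeA, count(*) from tableB group by attributeA
result = group_by_count(scan(files, ["attributeA"]), "attributeA")
print(result)
```

Because scan never touches attributeB, the cost of the read grows with the size of the one projected column rather than with the full row width.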
In a test on my 4-core laptop, Spark2 with Carbon on a store of
100 million records returned the result for a query like the one above in
11 seconds. On better machines this should be considerably faster.
Regards,
Ravindra.
On 8 February 2017 at 11:49, ffpeng90 <[hidden email]> wrote:
> Hi all,
> Recently I created two tables, one as ORC and one as Carbondata. Each
> contains one hundred million records.
> Then I submitted aggregate queries to presto like: [Select count(*) from
> tableB where attributeA = 'xxx'],
> and carbon performs better than orc.
>
> However, when I submit queries like: [Select attributeA , count(*) from
> tableB group by attributeA], the performance of carbon is bad. Obviously
> this query results in a full scan, so QueryModel needs to rebuild all
> records with the related columns. This step takes a lot of time.
>
> So I want to know: are there any optimization techniques for this kind
> of problem in spark?
>
>
>
> --
> View this message in context:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Aggregate-performace-tp7440.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>
--
Thanks & Regards,
Ravi