Aggregate performance

Posted by ffpeng90 on Feb 08, 2017; 6:19am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Aggregate-performace-tp7440.html

Hi all,
   Recently I created two tables, one stored as ORC and one as CarbonData. Each contains one hundred million records.
When I submit aggregate queries to Presto such as [SELECT count(*) FROM tableB WHERE attributeA = 'xxx'],
CarbonData performs better than ORC.

However, when I submit queries such as [SELECT attributeA, count(*) FROM tableB GROUP BY attributeA], CarbonData performs poorly. Obviously this query results in a full scan, so the QueryModel needs to rebuild every record with the related columns. This step takes a lot of time.
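To make the difference between the two queries concrete, here is a minimal Python sketch. It is a toy model that assumes data is split into blocks carrying min/max metadata; the function names and block layout are hypothetical and are not CarbonData's or ORC's actual implementation. It only shows why an equality filter can skip blocks while a group-by has to read all of them.

```python
from collections import Counter

# Toy model (an assumption for illustration, not CarbonData's real layout):
# values are sorted and split into blocks that record their min and max.
def make_blocks(values, block_size):
    ordered = sorted(values)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append({"min": chunk[0], "max": chunk[-1], "rows": chunk})
    return blocks

def count_where_equals(blocks, target):
    """Filter query: skip any block whose min/max range excludes the target."""
    matches = 0
    scanned = 0
    for block in blocks:
        if block["min"] <= target <= block["max"]:
            scanned += 1
            matches += sum(1 for v in block["rows"] if v == target)
    return matches, scanned

def count_group_by(blocks):
    """Group-by query: every block must be read, so nothing can be pruned."""
    counts = Counter()
    scanned = 0
    for block in blocks:
        scanned += 1
        counts.update(block["rows"])
    return counts, scanned

values = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
blocks = make_blocks(values, block_size=4)

print(count_where_equals(blocks, "b"))  # (4, 1): one block scanned out of three
print(count_group_by(blocks))           # all three blocks scanned
```

In this sketch the filter query reads one block out of three, while the group-by reads all three, which matches the behaviour I am seeing at a much larger scale.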

So I would like to know: are there any optimization techniques for this kind of problem in Spark?