Discussion about getting the execution duration of a query when using spark-shell + CarbonData

Discussion about getting the execution duration of a query when using spark-shell + CarbonData

李寅威 (Yinwei Li)
Hi all,


  When we use spark-shell + CarbonData to run a query, how can we get the execution duration? A few points to discuss:


  1. One query can produce one or more jobs, and some of the jobs may have DAG dependencies, so we can't get the execution duration by summing up all the jobs' durations, nor even roughly by taking the maximum job duration.


  2. In the spark-shell console or the Spark application web UI we can get each job's duration, but we can't get the duration of the CarbonData query itself directly; perhaps CarbonData could improve this in the near future.


  3. Maybe we can use the following command to get an approximate result (a fuller sketch follows below):


    scala> import java.util.Date
    scala> val begin = new Date(); cc.sql("$SQL_COMMAND").show; val end = new Date()
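
  A minimal sketch of that idea as a reusable helper in spark-shell (assuming the CarbonContext is named cc, as above, and an illustrative table name; note that .show also includes the time to collect and print rows on the driver):

    import java.util.concurrent.TimeUnit

    // Hypothetical helper: time one SQL statement with a monotonic clock,
    // which is safer for measuring durations than wall-clock java.util.Date.
    def timeQuery(sql: String): Unit = {
      val start = System.nanoTime()
      cc.sql(sql).show()
      val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
      println(s"Query took $elapsedMs ms")
    }

    timeQuery("SELECT count(*) FROM test_table")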


  Any other opinions?

Re: Discussion about getting the execution duration of a query when using spark-shell + CarbonData

Liang Chen
Administrator
Hi

I used the method below in spark-shell for a demo, for your reference:

import org.apache.spark.sql.catalyst.util._

benchmark {
  carbondf.filter($"name" === "Allen" and $"gender" === "Male"
    and $"province" === "NB" and $"singler" === "false").count
}


Regards

Liang


Re: Discussion about getting the execution duration of a query when using spark-shell + CarbonData

范范欣欣
Hi

Now I can use CarbonData 1.0.0 with spark-shell (Spark 2.1) as:

./bin/spark-shell --jars <carbondata assembly jar path>

but it's inconvenient to get the query time, so I tried to use
./bin/spark-sql --jars <carbondata assembly jar path>, but I got an
error when creating a table:

spark-sql> create table if not exists test_table(id string, name string,
city string, age int) stored by 'carbondata';
Error in query:
Operation not allowed:STORED BY(line 1, pos 87)

It seems that the carbondata jar was not loaded successfully. How can I use
./bin/spark-sql?

Regards

Libis




Re: Discussion about getting the execution duration of a query when using spark-shell + CarbonData

ravipesala
Hi Libis,

The spark-sql CLI is not supported by CarbonData.
Why don't you use the Carbon thrift server with beeline? It works much the
same as the spark-sql CLI, and it gives the execution time for each query.

Script to start the CarbonData thrift server:

bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  <carbondata jar file> <store-location>

Script to connect with beeline:

bin/beeline -u jdbc:hive2://localhost:10000
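
Once connected, beeline reports the time taken after every statement, so no extra instrumentation is needed. A sketch of what a session might look like (the table name, result, and timing below are illustrative only, and the exact output format can vary by version):

0: jdbc:hive2://localhost:10000> SELECT count(*) FROM test_table;
+-----------+
| count(1)  |
+-----------+
| 1000      |
+-----------+
1 row selected (0.532 seconds)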

Regards,
Ravindra
