[DISCUSSION] support user specified segment reading for query


rahul_kumar
Hi All,

Please find the design doc for segment reading attached.

Proposed Solution:
  1. A new property will be introduced to set the segment numbers.
  2. The user will set the property carbon.input.segments.<database_name>.<table_name> to specify the segment numbers.
  3. During CarbonScan, data will be read from the specified segments only.
  4. If the property is not set, all segments will be considered, which is the default behavior.
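
To make the proposed behaviour concrete, here is a minimal Scala sketch of how a value of carbon.input.segments.<database_name>.<table_name> could be resolved into the segments to scan. The object SegmentProperty and its methods are hypothetical names used only for illustration and are not CarbonData's implementation; keeping segment ids as strings is also an assumption.

```scala
// Hypothetical model of the proposed property; not CarbonData's actual code.
object SegmentProperty {
  // Key layout taken from the proposal:
  // carbon.input.segments.<database_name>.<table_name>
  def key(database: String, table: String): String =
    s"carbon.input.segments.$database.$table"

  // None means "read every segment" (property unset, or set to "*").
  // Some(ids) restricts the scan to the listed segment ids.
  def parse(value: Option[String]): Option[Set[String]] =
    value.map(_.trim) match {
      case None | Some("*") => None
      case Some(v) => Some(v.split(",").map(_.trim).filter(_.nonEmpty).toSet)
    }

  // Keep only the segments the scan is allowed to read.
  def select(all: Seq[String], value: Option[String]): Seq[String] =
    parse(value) match {
      case None => all
      case Some(ids) => all.filter(ids.contains)
    }
}
```

With this sketch, SegmentProperty.select(Seq("0", "1", "2", "3"), Some("1,3,5")) keeps only segments 1 and 3, because segment 5 is not in the list of existing segments, while an unset property or "*" returns the list unchanged.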

Thanks and Regards,
Rahul Kumar

Attachment: segmentReading.odt (28K)

Re: [DISCUSSION] support user specified segment reading for query

Jacky Li
I have 2 doubts:
1. If a user runs the following commands in two different beeline sessions, will there be a problem due to multithreading?
SET carbon.input.segments.default.carbontable=1,3,5;
select * from carbontable;
SET carbon.input.segments.default.carbontable=*;


2. The RESET command is not clear. Why is it needed? It seems SET carbon.input.segments.default.carbontable=* is enough, right? And what parameters does it take?

Regards,
Jacky

> On Oct 4, 2017, at 12:42 AM, Rahul Kumar <[hidden email]> wrote:
>
> <segmentReading.odt>


Re: [DISCUSSION] support user specified segment reading for query

rahul_kumar
@Jacky, please find the replies to your doubts below:

1. If a user runs the following commands in two different beeline sessions, will there be a problem due to multithreading?
   SET carbon.input.segments.default.carbontable=1,3,5;
   select * from carbontable;
   SET carbon.input.segments.default.carbontable=*;

Ans: Yes, in the multithreaded case there will be a problem, so threadSet() can be used to set the same property in multithreaded mode.

The following syntax can be used to set segment ids in multithreaded mode:

   Syntax: CarbonSession.threadSet("carbon.input.segments.<database_name>.<table_name>", "<list of segment ids>")

   e.g.:
   Future {
     CarbonSession.threadSet("carbon.input.segments.default.carbontable", "1,3,5")
     sparkSession.sql("select * from carbontable").show
     CarbonSession.threadSet("carbon.input.segments.default.carbontable", "*")
   }

The above overrides the property at thread level, so the property is set separately for each thread.
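
The idea of a thread-level override on top of a session-level value can be illustrated with a small self-contained model. This is only a sketch built on a plain ThreadLocal; ThreadProps, sessionSet, withThreadProp and get are hypothetical names, and the real CarbonSession.threadSet may work differently.

```scala
// Hypothetical model of session-level vs thread-level properties.
object ThreadProps {
  // Shared by every thread, like a value set with the SET command.
  private val sessionProps =
    new java.util.concurrent.ConcurrentHashMap[String, String]()

  // Visible only to the thread that wrote it.
  private val threadProps = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  def sessionSet(key: String, value: String): Unit =
    sessionProps.put(key, value)

  // Apply a thread-level override while body runs, then restore the
  // previous thread-level state even if body throws.
  def withThreadProp[T](key: String, value: String)(body: => T): T = {
    val saved = threadProps.get()
    threadProps.set(saved + (key -> value))
    try body finally threadProps.set(saved)
  }

  // The thread-level value wins; otherwise fall back to the session value.
  def get(key: String): Option[String] =
    threadProps.get().get(key).orElse(Option(sessionProps.get(key)))
}
```

Restoring the previous value in a finally block matters when threads come from a pool, because a reused thread would otherwise carry the override into the next query that runs on it. That is the same reason the example above sets the property back to "*" after the query.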


2. The RESET command is not clear. Why is it needed? It seems SET carbon.input.segments.default.carbontable=* is enough, right? And what parameters does it take?

Ans: The RESET command doesn't take any parameters. RESET is already implemented and resets all properties to their default values, so it will likewise reset the above property to its default value.

Thanks and Regards,
Rahul Kumar



In reply to Jacky Li's message of Wed, Oct 4, 2017 at 7:21 PM.

Re: [DISCUSSION] support user specified segment reading for query

ravipesala
Hi,

Instead of using the SET command for segments, why don't you use a query hint? With a query hint we can specify the segments inside the query itself.

For example: SELECT /*+SEGMENTS(1,3,5) */ * FROM t1

With the above custom hint we can query only the selected segments. This concept is also supported in Spark, and it will be helpful for any of our future optimizations.
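
As a rough illustration of how a hint carries the segment list inside the query text, the following Scala sketch pulls the ids out of a /*+SEGMENTS(...) */ comment with a regular expression. SegmentHint is a hypothetical helper for illustration only; an actual implementation would presumably hook into Spark's hint handling rather than parse SQL strings.

```scala
// Hypothetical helper; it only demonstrates the shape of the hint.
object SegmentHint {
  // Matches /*+ SEGMENTS(1,3,5) */ with optional whitespace, ignoring case.
  private val Hint = """(?i)/\*\+\s*SEGMENTS\s*\(([^)]*)\)\s*\*/""".r

  // Some(ids) when the query carries a SEGMENTS hint, None otherwise.
  def segments(sql: String): Option[Seq[String]] =
    Hint.findFirstMatchIn(sql).map { m =>
      m.group(1).split(",").map(_.trim).filter(_.nonEmpty).toSeq
    }
}
```

Because the segment list travels with the statement, nothing has to be set or reset around the query, which is the main difference from the SET approach.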

Regards,
Ravindra.

In reply to Rahul Kumar's message of 5 October 2017 at 12:22.



--
Thanks & Regards,
Ravi

Re: [DISCUSSION] support user specified segment reading for query

rahul_kumar
@Ravindra Thanks. Using a query hint may be a better approach.

I suggest the following syntax:

select * from t1 [in SEGMENTS(1,3,5)];

Thanks and Regards,
Rahul Kumar



In reply to Ravindra Pesala's message of Thu, Oct 5, 2017 at 1:04 PM.

Re: [DISCUSSION] support user specified segment reading for query

jarray888
I share @Ravindra's opinion: a query hint is the better approach, and the syntax is easy to use.






In reply to Rahul Kumar's message of 10/05/2017 17:10.

Re: [DISCUSSION] support user specified segment reading for query

sraghunandan
Query hint is a better option.
Some downsides of this approach:
1. Users will have to modify all their queries to accommodate it.
2. The selection will be query-scoped rather than session-scoped.

In reply to jarray's message of Mon, 9 Oct 2017 at 2:36 AM.

Re: [DISCUSSION] support user specified segment reading for query

lionel061201
Hi Rahul,
I agree that a query hint is better. The syntax from Ravi is the same as Oracle's, which I think is more popular and acceptable.
The Show Segments statement could drop 'For table' and be just 'show segments [dbname].<tableName>'. It's simple and in the same style as the show partition statement.

Thanks,
Lionel Cao

In reply to Raghunandan S's message of Mon, Oct 9, 2017 at 8:54 AM.

Re: [DISCUSSION] support user specified segment reading for query

rahul_kumar
Thanks @Raghunandan for updating the use case.

As per my analysis, UnresolvedHint is supported from Spark 2.2, so we can support query hints once the Spark 2.2 integration is completed. A Jira (CARBONDATA-1546 <https://issues.apache.org/jira/browse/CARBONDATA-1546>) has already been raised for query hints.

@dev What if we keep both the SET query and the query hint implementations for segment reading?

For now, since we have some use cases at session level and thread level, we can implement the SET query as part of CARBONDATA-1398 <https://issues.apache.org/jira/browse/CARBONDATA-1398>.

@Lionel 'show segments' is already implemented. Changing it may cause incompatibility for existing CarbonData users. I suggest we make the 'For Table' part optional in a separate Jira.

Thanks and Regards,
Rahul Kumar



In reply to Lu Cao's message of Mon, Oct 9, 2017 at 8:21 AM.

Re: [DISCUSSION] support user specified segment reading for query

Liang Chen
Administrator
Hi Rahul

I suggest implementing only the query hint.

Please finalize the query syntax:
select * from t1 [in SEGMENTS(1,3,5)]
or
SELECT /*+SEGMENTS(1,3,5) */ * FROM t1

Regards
Liang





--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] support user specified segment reading for query

sraghunandan
@Liang Query hints are supported only from Spark 2.2.x.
We need to support Spark 2.1.x as well.
Hence, for now we need to go ahead with the SET command.
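
The version constraint could be expressed as a simple check. This sketch assumes the Spark version is available as a string such as "2.1.0" (for example from sparkSession.version) with a numeric major and minor part; SegmentSelectionMode and its members are hypothetical names, not part of CarbonData.

```scala
// Hypothetical switch between the two mechanisms discussed in this thread.
object SegmentSelectionMode {
  sealed trait Mode
  case object SetCommand extends Mode // works on Spark 2.1.x
  case object QueryHint extends Mode  // needs hint support, Spark 2.2 onwards

  // Reads "major.minor" from the version string and compares it with 2.2.
  def forSpark(version: String): Mode = {
    val parts = version.split("\\.").take(2).map(_.toInt)
    val (major, minor) = (parts(0), parts(1))
    if (major > 2 || (major == 2 && minor >= 2)) QueryHint else SetCommand
  }
}
```

Under these assumptions, a 2.1.x deployment falls back to the SET command while 2.2 and later could use the hint.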

In reply to Liang Chen's message of Wed, 11 Oct 2017 at 12:49 PM.

Re: [DISCUSSION] support user specified segment reading for query

Jin Zhou
In reply to this post by Liang Chen
@Liang

A query hint seems like a nice way to support this feature.
But if query hints are only supported in Spark 2.2, then the SET command design may be an alternative for Spark 2.1, because some Hadoop distributions (such as Huawei FI) ship Spark 2.1.




