CarbonWriterBuild issue

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

CarbonWriterBuild issue

Jacky Li
Hi Dev,

I observed that we have added many buildXXXWriter in CarbonWriterBuilder in SDK, but all of them is just different parameter combination. I think it is better to add those parameter by withXXX function, which complies to Builder pattern. Otherwise it is hard to maintain if we keep adding these parameter and new build functions.
Any suggestion on this?

public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration)

public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads, Configuration configuration)

public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema, Configuration configuration)

public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema, short numOfThreads, Configuration configuration)

public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration)

public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads, Configuration configuration)


Regards,
Jacky

Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

xuchuanyin
Yeah, it actually belongs to 'Builder Pattern'. We should simplify this
before they are widely used.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

ravipesala
Yes, should not have this many methods, we will remove those methods and
add those to builder.

Regards,
Ravindra.
On Thu, 13 Sep 2018 at 9:14 AM, xuchuanyin <[hidden email]> wrote:

> Yeah, it actually belongs to 'Builder Pattern'. We should simplify this
> before they are widely used.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

Ajantha Bhat
yes,

finally, we can keep 3 API with below signatures.



*buildWriterForCSVInput(carbonSchema)buildWriterForAvroInput(avroSchema)buildWriterForJsonInput(carbonSchema)*

and create new methods for

*a) withThreadSafe(NumOfThreads) b) withHadoopConf(configuration)*

Along with these change, I want to simplify some more API.

1) Below 5 table properties methods can be made private.
so that end user uses only withTableProperties API instead of each separate
API.






*public CarbonWriterBuilder sortBy(String[] sortColumns)public
CarbonWriterBuilder withBlockSize(int blockSize)public CarbonWriterBuilder
localDictionaryThreshold(int localDictionaryThreshold)public
CarbonWriterBuilder enableLocalDictionary(boolean
enableLocalDictionary)public CarbonWriterBuilder withBlockletSize(int
blockletSize)*
this is the exposed API for the above configurations.
*public CarbonWriterBuilder withTableProperties(Map<String, String>
options)*

2) Also at SDK side, name transactional and nontransactional doesn't make
sense.
As long as we don't write the metadata folder, it is always
nonTransactional table at write side.

*public CarbonWriterBuilder isTransactionalTable(boolean
isTransactionalTable)*

can we rename above API to isFlatFolderPath ?
Also, at SDK reader side also we can rename this.
and by default keep flat folder as true.

let me know if any suggestions.

So, that I can go ahead with the changes

Thanks,
AB

On Thu, Sep 13, 2018 at 11:15 AM Ravindra Pesala <[hidden email]>
wrote:

> Yes, should not have this many methods, we will remove those methods and
> add those to builder.
>
> Regards,
> Ravindra.
> On Thu, 13 Sep 2018 at 9:14 AM, xuchuanyin <[hidden email]> wrote:
>
> > Yeah, it actually belongs to 'Builder Pattern'. We should simplify this
> > before they are widely used.
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

xuchuanyin
In reply to this post by ravipesala
can you provide the full signature of these 3 API?

I’d like to call the method like this :

WriterBuilder
  .csv(CarbonSchma)
  .withXXX(…)
  .build

WriterBuilder
  .avro(AvroSchma)
  .withXXX(…)
  .build

WriterBuilder
  .json(CarbonSchma)
  .withXXX(…)
  .build
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

kanaka
+1 for the proposal to clear SDK APIs.
Thanks Ajantha for initiating the code changes.

For schema input for  writer creation, I also feel we should unify to all
writer creation methods to Builder. API looks cleaner if we provide just
build() without out any more arguments.


"withTableProperties(Map<String, String> options) vs
sortBy(..),withBlockSize(...),etc"
----- I think both of these methods can serve for different purposes.
withTableProperties(Map<String, String> options) can be used by customer
apps which takes property input directly by end users who is familiar with
carbon create table syntax.
Individual methods can be used by customers app code to avoid problems like
wrong spells or wrong names.

"public CarbonWriterBuilder isTransactionalTable(boolean
isTransactionalTable)"
-- I think we can remove if we are not clear on the usecase at this moment
and to avoid confusions






--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
-
Kanaka
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

Ajantha Bhat
@xuchuanyin:

yes, method signatures will be like you specified.

@kanaka: I still think we should keep only table properties Map as we
validate "wrong_spells and names". More options will create more confusion.
So, just keeping table properties Map can simplify configurations. End user
can form a map and pass. Just like existing withLoadOptions map

Any other suggestions are welcome

Thanks,
AB
.

On Thu, Sep 20, 2018 at 10:55 PM kanaka <[hidden email]> wrote:

> +1 for the proposal to clear SDK APIs.
> Thanks Ajantha for initiating the code changes.
>
> For schema input for  writer creation, I also feel we should unify to all
> writer creation methods to Builder. API looks cleaner if we provide just
> build() without out any more arguments.
>
>
> "withTableProperties(Map<String, String> options) vs
> sortBy(..),withBlockSize(...),etc"
> ----- I think both of these methods can serve for different purposes.
> withTableProperties(Map<String, String> options) can be used by customer
> apps which takes property input directly by end users who is familiar with
> carbon create table syntax.
> Individual methods can be used by customers app code to avoid problems like
> wrong spells or wrong names.
>
> "public CarbonWriterBuilder isTransactionalTable(boolean
> isTransactionalTable)"
> -- I think we can remove if we are not clear on the usecase at this moment
> and to avoid confusions
>
>
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

Ajantha Bhat
Also now we that we support Hadoop conf,

we don't require below API. we can remove them from CarbonWriterBuilder.







*setAccessKeysetAccessKeysetSecretKeysetSecretKeysetEndPointsetEndPoint*
Thanks,
AB


On Thu, Sep 20, 2018 at 11:16 PM Ajantha Bhat <[hidden email]> wrote:

>
> @xuchuanyin:
>
> yes, method signatures will be like you specified.
>
> @kanaka: I still think we should keep only table properties Map as we
> validate "wrong_spells and names". More options will create more confusion.
> So, just keeping table properties Map can simplify configurations. End
> user can form a map and pass. Just like existing withLoadOptions map
>
> Any other suggestions are welcome
>
> Thanks,
> AB
> .
>
> On Thu, Sep 20, 2018 at 10:55 PM kanaka <[hidden email]>
> wrote:
>
>> +1 for the proposal to clear SDK APIs.
>> Thanks Ajantha for initiating the code changes.
>>
>> For schema input for  writer creation, I also feel we should unify to all
>> writer creation methods to Builder. API looks cleaner if we provide just
>> build() without out any more arguments.
>>
>>
>> "withTableProperties(Map<String, String> options) vs
>> sortBy(..),withBlockSize(...),etc"
>> ----- I think both of these methods can serve for different purposes.
>> withTableProperties(Map<String, String> options) can be used by customer
>> apps which takes property input directly by end users who is familiar with
>> carbon create table syntax.
>> Individual methods can be used by customers app code to avoid problems
>> like
>> wrong spells or wrong names.
>>
>> "public CarbonWriterBuilder isTransactionalTable(boolean
>> isTransactionalTable)"
>> -- I think we can remove if we are not clear on the usecase at this moment
>> and to avoid confusions
>>
>>
>>
>>
>>
>>
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: CarbonWriterBuild issue

ravipesala
 +1

I feel it is better to remove the transactional flag from sdk  API as it is
redundant currently. we better support it in a better way in future.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/