Apache CarbonData Dev Mailing List archive

[DISCUSSION] Regarding to redundancy code and some issues.

David CaiQiang

Nov 04, 2017; 11:38am

Hi all,
I have listed the following points for improving the code.

Redundant code:
1. CarbonLoadModel.isDirectLoad
It is always true, so it would be better to remove the related code.
CarbonData no longer pre-partitions the input data by machine node,
so this flag is not required.

2. isTableSplitPartition
In CarbonDataRDDFactory and NewCarbonDataLoadRDD it is always false, so it
would be better to remove the related code as well.

Refactoring:
1. CarbonDataRDDFactory.loadCarbonData
This method is hard to read. It is too large because it has to support
loading data from an input file or a select query, support load, insert and
update, support partitioning, and so on. It would be better to decouple the
code by function.

2. Unit test cases
There are about 400 CSV files in the unit test cases.
I suggest unifying the input scenarios to reduce the number of CSV files and
improve the coverage of the unit tests.
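
As a rough illustration of refactoring item 1 above, one way to decouple a large load method is to put each input source behind a small interface and keep each concern in its own short method. Every name below (LoadSource, CsvFileSource, SelectQuerySource, LoadKind, LoadDispatcher) is hypothetical and is not an existing CarbonData class; this is only a sketch of the idea, not a proposed design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these are CarbonData classes.
interface LoadSource {
    // Describes where the rows come from; a real implementation would return rows.
    String describe();
}

final class CsvFileSource implements LoadSource {
    private final String path;
    CsvFileSource(String path) { this.path = path; }
    @Override public String describe() { return "csv:" + path; }
}

final class SelectQuerySource implements LoadSource {
    private final String query;
    SelectQuerySource(String query) { this.query = query; }
    @Override public String describe() { return "query:" + query; }
}

enum LoadKind { LOAD, INSERT, UPDATE }

final class LoadDispatcher {
    // Each concern is a small step instead of a branch inside one large method.
    static List<String> plan(LoadSource source, LoadKind kind, boolean partitioned) {
        List<String> steps = new ArrayList<>();
        steps.add("read " + source.describe());
        if (partitioned) {
            steps.add("partition rows");
        }
        steps.add(writeStep(kind));
        return steps;
    }

    private static String writeStep(LoadKind kind) {
        switch (kind) {
            case INSERT: return "write as insert";
            case UPDATE: return "write as update";
            default:     return "write as load";
        }
    }
}
```

With this shape, supporting a new input source means adding one class rather than another branch in the load method.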

Issues:
1. During data loading, sort_columns should support all data types.

2. During a query, the end key defaults to a single byte "0xFF", which is
not correct.
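
One possible way to see the problem in issue 2, assuming keys are compared as unsigned bytes in lexicographic order (an assumption made for this sketch, not something confirmed from the CarbonData code): a single 0xFF byte is not an upper bound for every key, because any longer key that starts with 0xFF sorts after it. The class and method names below are hypothetical.

```java
// Hypothetical demo class; not CarbonData code.
final class EndKeyDemo {
    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] endKey = {(byte) 0xFF};
        byte[] ordinaryKey = {(byte) 0x7F, (byte) 0x10};
        byte[] largeKey = {(byte) 0xFF, (byte) 0x01};

        // Does the ordinary key fall before the single-byte end key?
        System.out.println(compareUnsigned(ordinaryKey, endKey) < 0);
        // Does the key starting with 0xFF sort after the end key?
        System.out.println(compareUnsigned(largeKey, endKey) > 0);
    }
}
```

Under that assumption, a scan bounded by the single-byte end key would exclude keys such as largeKey.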

Any questions? Any suggestions?

-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Liang Chen

Nov 04, 2017; 2:32pm

Re: [DISCUSSION] Regarding to redundancy code and some issues.

+1, all are good proposals.

Regards
Liang
