[DISCUSSION] Regarding redundant code and some issues.


David CaiQiang
Hi All,
   Here are some points I have listed for improving the code.

Redundancy:
1. CarbonLoadModel.isDirectLoad
It is always true, so it is better to remove the related code.
CarbonData no longer pre-partitions the input data by machine node, so this
flag is not required.

2. isTableSplitPartition
In CarbonDataRDDFactory and NewCarbonDataLoadRDD it is always false, so it is
better to remove the related code as well.

Refactoring:
1. CarbonDataRDDFactory.loadCarbonData
This method is not readable. It is too large: it supports loading data from
an input file or a select query, supports load, insert and update, supports
partitioning, and so on. It would be better to decouple the code by function.

2. Unit Test Case
There are about 400 CSV files in the unit test cases.
I suggest unifying the input scenarios to reduce the number of CSV files and
improve the coverage of the unit tests.
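
To illustrate the "decouple by function" idea for loadCarbonData, here is a
minimal sketch. All names in it (LoadSketch, RowSource, WriteAction, run) are
hypothetical and do not exist in CarbonData. It only shows one possible shape:
the data source and the write action become separate small pieces, and the
entry point just wires them together.

```java
import java.util.List;

// Hypothetical sketch: none of these types are part of CarbonData.
public class LoadSketch {
    // Where the rows come from: an input file or a select query.
    interface RowSource {
        List<String> rows();
    }

    // What is done with the rows: load, insert, or update.
    interface WriteAction {
        String apply(List<String> rows);
    }

    static RowSource fromFile(String path) {
        return () -> List.of("row from file " + path);
    }

    static RowSource fromQuery(String sql) {
        return () -> List.of("row from query " + sql);
    }

    static WriteAction load() {
        return rows -> "LOAD " + rows.size() + " row(s)";
    }

    static WriteAction update() {
        return rows -> "UPDATE " + rows.size() + " row(s)";
    }

    // The entry point stays small: it only combines a source with an action.
    static String run(RowSource source, WriteAction action) {
        return action.apply(source.rows());
    }

    public static void main(String[] args) {
        System.out.println(run(fromFile("data.csv"), load()));
        System.out.println(run(fromQuery("select * from t"), update()));
    }
}
```

With this shape, each source and each action could be read and tested on its
own, and a new combination would not need another branch in one large method.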

Issues:
1. During data loading, sort_columns should support all data types.

2. During a query, the end key uses a single byte "0xFF" by default, which is
not correct.
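
A small sketch of why a one-byte end key can be a problem. It assumes that
keys are compared as unsigned byte arrays in lexicographic order, with a
shorter prefix sorting first; the thread does not state the exact comparison
CarbonData uses, so treat this as an illustration only. The class and method
names are hypothetical.

```java
// Hypothetical sketch: assumes unsigned lexicographic key comparison.
public class EndKeySketch {
    // Compare two keys byte by byte as unsigned values; on a shared prefix,
    // the shorter key sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // A key is inside the scan range if it is not greater than the end key.
    static boolean inRange(byte[] key, byte[] endKey) {
        return compareUnsigned(key, endKey) <= 0;
    }

    public static void main(String[] args) {
        byte[] oneByteEnd = {(byte) 0xFF};
        byte[] smallKey = {0x10, 0x20};
        byte[] largeKey = {(byte) 0xFF, 0x01};

        // The small key is covered by the one-byte end key.
        System.out.println(inRange(smallKey, oneByteEnd));
        // A multi-byte key starting with 0xFF sorts after the end key,
        // so a scan bounded by it would skip this key.
        System.out.println(inRange(largeKey, oneByteEnd));
        // An end key as long as the key itself, filled with 0xFF, covers it.
        System.out.println(inRange(largeKey, new byte[] {(byte) 0xFF, (byte) 0xFF}));
    }
}
```

Under these assumptions, any key that starts with 0xFF and has more than one
byte falls outside the range bounded by the single-byte end key.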

   Any questions? Any suggestions?




-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Regarding redundant code and some issues.

Liang Chen
Administrator
+1, all are good proposals.

Regards
Liang

