Hi All,
Here are some points for improving the code.
Redundancy:
1. CarbonLoadModel.isDirectLoad
It is always true, so the related code should be removed.
CarbonData no longer pre-partitions the input data by machine node, so this
flag is not needed.
2. isTableSplitPartition
In CarbonDataRDDFactory and NewCarbonDataLoadRDD it is always false, so the
related code should also be removed.
Refactoring:
1. CarbonDataRDDFactory.loadCarbonData
This method is hard to read because it does too much: it loads data from
either an input file or a select query, handles load, insert and update,
supports partitioning, and so on. It would be better to split the code by
function.
2. Unit test cases
There are about 400 CSV files in the unit test cases.
I suggest unifying the input scenarios to reduce the number of CSV files and
to improve UT coverage.
Issues:
1. During data loading, sort_columns should support all data types.
2. During a query, the end key uses a single byte 0xFF by default, which is
not correct.
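To illustrate one way a single-byte 0xFF end key can go wrong, here is a minimal sketch. It assumes keys are byte arrays compared lexicographically as unsigned bytes, with a shorter prefix sorting first; the class and method names (EndKeyDemo, compareUnsigned, keysUpTo) are hypothetical and are not CarbonData's API. Under that assumption, any key that starts with 0xFF and has more bytes compares greater than {0xFF}, so a scan bounded by that end key silently drops it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of CarbonData.
class EndKeyDemo {

    // Unsigned lexicographic comparison; a proper prefix sorts before the longer array.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Returns the keys that are <= endKey, i.e. what a scan bounded by endKey would keep.
    static List<byte[]> keysUpTo(List<byte[]> sortedKeys, byte[] endKey) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] key : sortedKeys) {
            if (compareUnsigned(key, endKey) <= 0) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> keys = List.of(
                new byte[] {0x01, 0x02},
                new byte[] {(byte) 0xFF},
                new byte[] {(byte) 0xFF, 0x10});
        byte[] defaultEndKey = {(byte) 0xFF};
        // Prints 2: {0xFF, 0x10} sorts after {0xFF} and is dropped.
        System.out.println(keysUpTo(keys, defaultEndKey).size());
    }
}
```

All three keys should be in range, but only two survive the bound. On Java 9+, java.util.Arrays.compareUnsigned(byte[], byte[]) gives the same ordering as the helper above.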
Any questions or suggestions?
-----
Best Regards
David Cai
--
Sent from:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/