Apache CarbonData Dev Mailing List archive - [Discussion]read latest schema in case of external table and file format

Posted by akashrn5 on Feb 04, 2019; 10:04am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-read-latest-schema-in-case-of-external-table-and-file-format-tp74986.html

Hi dev,

Currently we have a validation that fails the query if there are two
carbondata files in a location with different schemas. I think there is no
need to fail. If you look at the parquet behavior, we can understand the same.

Here I think failing is not good. We can read the latest schema from the
latest carbondata file in the given location, read all the files based on
that schema, and give the query output. For columns which are not present in
some data files, the output will have null values for those columns.

But here basically we do not merge schemas. We can keep that behavior as it
is now; the only change is that we take the latest schema.

For example:
1. One data file has columns a, b and c. A 2nd file has columns
a, b, c, d and e. Then we can read and create the table with 5 columns or
3 columns, whichever is latest (this applies when the user does not specify
a schema). If the user specifies a schema, the table will be created with
the specified schema.
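
To make the proposal concrete, here is a minimal sketch of the behavior in plain Python. It is not CarbonData code: the DataFile class, the resolve_schema and read_all helpers, and the use of a modification timestamp to decide which file is "latest" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DataFile:
    """Hypothetical stand-in for one carbondata file in the location."""
    name: str
    modified: int                  # assumed: larger value means newer file
    columns: List[str]             # schema stored in this file
    rows: List[Dict[str, Any]]     # row data keyed by column name


def resolve_schema(files: List[DataFile],
                   user_schema: Optional[List[str]] = None) -> List[str]:
    """Return the user-specified schema if given, else the latest file's schema."""
    if user_schema is not None:
        return list(user_schema)
    latest = max(files, key=lambda f: f.modified)
    return list(latest.columns)    # no merging, just take the latest schema


def read_all(files: List[DataFile], schema: List[str]) -> List[Dict[str, Any]]:
    """Read every file against one schema; columns a file lacks become None."""
    result = []
    for f in sorted(files, key=lambda f: f.modified):
        for row in f.rows:
            result.append({col: row.get(col) for col in schema})
    return result


# The example from this mail: an older 3-column file and a newer 5-column file.
old_file = DataFile("part-0", 1, ["a", "b", "c"], [{"a": 1, "b": 2, "c": 3}])
new_file = DataFile("part-1", 2, ["a", "b", "c", "d", "e"],
                    [{"a": 4, "b": 5, "c": 6, "d": 7, "e": 8}])

schema = resolve_schema([old_file, new_file])
rows = read_all([old_file, new_file], schema)
```

With these two files the resolved schema has 5 columns, and the row from the older file gets null (None) for d and e. If the 3-column file were the newer one, the same logic would give a 3-column table and simply not project d and e.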

I have created a JIRA for this:
https://issues.apache.org/jira/browse/CARBONDATA-3287
If you have any input, please let me know.

Regards,
Akash