[jira] [Commented] (CARBONDATA-3287) Remove the validation of same schema data files in location for external table and file format


Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759739#comment-16759739 ]

Akash R Nilugal commented on CARBONDATA-3287:
---------------------------------------------

The discussion can continue on the dev mailing list: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-read-latest-schema-in-case-of-external-table-and-file-format-tt74986.html

> Remove the validation of same schema data files in location for external table and file format
> ----------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3287
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3287
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Akash R Nilugal
>            Assignee: Akash R Nilugal
>            Priority: Major
>
> Currently, if a location contains two carbondata files with different schemas, the validation fails the query. I think there is no need to fail; the Parquet behavior in the same situation illustrates this.
>
> Failing here is not good. Instead, we can read the latest schema from the latest carbondata file in the given location, read all the files based on that schema, and return the query output. For data files that do not contain a newer column, that column will have null values.
>
> Note that we do not merge schemas. We can keep that behavior; the only change is that we take the latest schema.
>
> For example:
> 1. One data file has columns a, b and c; a second file has columns a, b, c, d and e. When the user does not specify a schema, we can read the files and create the table with 5 columns or 3 columns, whichever schema is the latest. If the user specifies a schema, the table is created with the specified schema.
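
The proposal in the description can be sketched in a few lines of Python. This is only an illustration of the idea, not CarbonData code: `DataFile`, `latest_schema` and `read_all` are hypothetical names, and picking the "latest" file by modification time is an assumption, since the description does not say how the latest file is determined.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class DataFile:
    """Hypothetical stand-in for one carbondata file in the location."""
    mtime: int                # assumed: "latest" means highest modification time
    columns: List[str]        # schema stored in this file
    rows: List[List[Any]]     # row values, in the order of `columns`


def latest_schema(files: List[DataFile]) -> List[str]:
    # Take the schema of the most recently written file; schemas are not merged.
    return max(files, key=lambda f: f.mtime).columns


def read_all(files: List[DataFile],
             user_schema: Optional[List[str]] = None
             ) -> Tuple[List[str], List[List[Any]]]:
    # A user-specified schema wins; otherwise fall back to the latest schema.
    schema = user_schema if user_schema is not None else latest_schema(files)
    result = []
    for f in files:
        for row in f.rows:
            by_name = dict(zip(f.columns, row))
            # Columns absent from this file come back as None (null).
            result.append([by_name.get(col) for col in schema])
    return schema, result


old = DataFile(mtime=1, columns=["a", "b", "c"], rows=[[1, 2, 3]])
new = DataFile(mtime=2, columns=["a", "b", "c", "d", "e"], rows=[[4, 5, 6, 7, 8]])
schema, rows = read_all([old, new])
```

With the two files above, the query is not failed: the 5-column schema is used and the row from the older file carries nulls for d and e. If the 3-column file were the latest instead, the table would have 3 columns and the values for d and e would simply not be read.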



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)