Hi Community,
Recently we made a change to reduce the size of the table status file. The file now stores short forms of the field names, which reduces its size. For backward compatibility, we added the @SerializedName annotation provided by gson (version 2.4), whose alternate attribute lets us list alternate names, so the old names in existing table status files are still mapped correctly.
We depend on hadoop-common, which in turn depends on gson 2.2.4. Version 2.2.4 does not support alternate names as described above. Spark also uses hadoop-common, so the gson-2.2.4 jar ships with the Spark jars package. If 2.2.4 gets loaded first in the cluster, the load metadata details come back as null after reading the table status file for all the old stores, and we cannot run any query or some other operations.
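
To confirm which gson actually wins on a given node, a small check like the one below could be run with the same classpath as the Spark job. This is only a sketch: the class name GsonVersionCheck and its helper methods are made up for illustration and are not part of carbondata. It uses reflection only, so it compiles without gson on the classpath, and it assumes that the presence of an alternate() member on SerializedName is a good enough signal for gson 2.4 or later.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of carbondata.
class GsonVersionCheck {

    // Returns where the class was loaded from, or null if it is not on the classpath.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    // True if the loaded SerializedName annotation declares alternate(),
    // which is taken here as the sign of gson 2.4 or later.
    static boolean supportsAlternate() {
        try {
            Class<?> ann = Class.forName("com.google.gson.annotations.SerializedName");
            ann.getMethod("alternate");
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String location = locate("com.google.gson.Gson");
        if (location == null) {
            System.out.println("gson is not on the classpath");
        } else {
            System.out.println("gson loaded from: " + location);
            System.out.println("alternate names supported: " + supportsAlternate());
        }
    }
}
```

If the printed location points to a gson-2.2.4 jar and alternate names are reported as unsupported, that node would hit the null load metadata problem described above.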
I have raised a PR that excludes gson 2.2.4 from being pulled in through hadoop-common when we build carbondata.
https://github.com/apache/carbondata/pull/3501
But how can we make sure that the Spark and Hadoop packages do not contain these jars? For now I have removed the 2.2.4 jars from the cluster to make it work.
Any suggestions are welcome.
Regards,
Akash