GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2273

[CARBONDATA-2442] Reading two SDK writer outputs with different schemas should prompt an exception

**Problem:** when the output of two SDK writers with different schemas is placed in the same folder for reading, the result is not as expected and contains many null values.

**Root cause:** when multiple carbondata and index files are placed in the same folder, the table schema is inferred from the first file; there is no validation comparing that inferred schema against the schema of every other index file.

**Solution:** compare the table schema with the schema of all other index files, and throw an exception if there is a mismatch.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? NA
- [ ] Testing done? Added a test case in TestNonTransactionalCarbonTable.scala
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/ajantha-bhat/carbondata schema_mismatch
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/carbondata/pull/2273.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #2273

----
commit 03ab5f30909fe971e6221fdab83bdeb2f81e385c
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-05-05T11:29:44Z

    [CARBONDATA-2442] Reading two SDK writer outputs with different schemas should prompt an exception

    Problem: when the output of two SDK writers with different schemas is placed in the same folder for reading, the result is not as expected and contains many null values.
    Root cause: when multiple carbondata and index files are placed in the same folder, the table schema is inferred from the first file; there is no validation comparing it against the schema of every other index file.
    Solution: compare the table schema with the schema of all other index files, and throw an exception if there is a mismatch.

----
--- |
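For readers unfamiliar with the scenario, the sketch below shows roughly how two SDK writers with different schemas can end up writing into the same folder. It is illustrative only: the output path and data values are made up, and the builder calls (`withSchema`, `outputPath`, `buildWriterForCSVInput`) assume the 1.4-era CarbonData SDK API, which has changed in later releases.

```java
// Illustrative sketch (not from the PR): two SDK writers, two different schemas,
// one shared output folder.
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class TwoSchemasOneFolder {
  public static void main(String[] args) throws Exception {
    String path = "/tmp/sdk_output";   // hypothetical shared folder

    // First writer: schema (name STRING, age INT)
    Schema schema1 = new Schema(new Field[] {
        new Field("name", DataTypes.STRING),
        new Field("age", DataTypes.INT)});
    CarbonWriter writer1 = CarbonWriter.builder()
        .withSchema(schema1).outputPath(path).buildWriterForCSVInput();
    writer1.write(new String[] {"robot", "1"});
    writer1.close();

    // Second writer: a different schema (id INT, salary DOUBLE) into the SAME folder
    Schema schema2 = new Schema(new Field[] {
        new Field("id", DataTypes.INT),
        new Field("salary", DataTypes.DOUBLE)});
    CarbonWriter writer2 = CarbonWriter.builder()
        .withSchema(schema2).outputPath(path).buildWriterForCSVInput();
    writer2.write(new String[] {"2", "100.5"});
    writer2.close();

    // Before this patch, reading the folder as a non-transactional table inferred the
    // schema from one file and returned nulls for the mismatched columns of the other;
    // with the patch the read is expected to fail fast with a schema-mismatch error.
  }
}
```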
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2273 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4736/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4499/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5659/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2273 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5665/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4505/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2273 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4518/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5678/ --- |
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r186696437

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
@@ -342,6 +342,30 @@ public void setParentColumnTableRelations(
     return true;
   }

+  /**
+   * method to compare columnSchema,
+   * other parameters along with just column name and column data type
+   * @param obj
+   * @return
+   */
+  public boolean equalsWithStrictCheck(Object obj) {
+    if (!this.equals(obj)) {
+      return false;
+    }
+    ColumnSchema other = (ColumnSchema) obj;
+    if (!columnUniqueId.equals(other.columnUniqueId) ||
+        (isDimensionColumn != other.isDimensionColumn) ||
+        (scale != other.scale) ||
+        (precision != other.precision) ||
+        (isSortColumn != other.isSortColumn)) {
+      return false;
+    }
+    if (encodingList.size() != other.encodingList.size()) {
--- End diff --

Better to check the encoding values as well... this is a generic method and can be useful for other scenarios.
--- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r186725643

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
@@ -342,6 +342,30 @@ public void setParentColumnTableRelations(
     return true;
   }

+  /**
+   * method to compare columnSchema,
+   * other parameters along with just column name and column data type
+   * @param obj
+   * @return
+   */
+  public boolean equalsWithStrictCheck(Object obj) {
+    if (!this.equals(obj)) {
+      return false;
+    }
+    ColumnSchema other = (ColumnSchema) obj;
+    if (!columnUniqueId.equals(other.columnUniqueId) ||
+        (isDimensionColumn != other.isDimensionColumn) ||
+        (scale != other.scale) ||
+        (precision != other.precision) ||
+        (isSortColumn != other.isSortColumn)) {
+      return false;
+    }
+    if (encodingList.size() != other.encodingList.size()) {
--- End diff --

done. added
--- |
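For context, the stricter check the reviewer asked for presumably compares the encoding values themselves rather than only the sizes of the two encoding lists. A minimal hypothetical helper illustrating that idea (not the actual patch; `Encoding` is CarbonData's encoding enum):

```java
import java.util.List;
import org.apache.carbondata.core.metadata.encoder.Encoding;

final class EncodingCheck {
  // Hypothetical sketch: the lists match only if they have the same length and
  // the same encoding value at every position.
  static boolean sameEncodings(List<Encoding> a, List<Encoding> b) {
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      if (a.get(i) != b.get(i)) {   // enum values, so reference comparison is safe
        return false;
      }
    }
    return true;
  }
}
```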
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2273 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5741/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5745/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2273 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4805/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2273 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4585/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2273 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4808/ --- |
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r186992068

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
     SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
         .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
+
+    // For NonTransactional table, compare the schema of all index files with inferred schema.
+    // If there is a mismatch throw exception. As all files must be of same schema.
+    if (!carbonTable.getTableInfo().isTransactionalTable()) {
+      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+      for (Segment segment : segments.getValidSegments()) {
+        Map<String, String> indexFiles = segment.getCommittedIndexFile();
+        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+          Path indexFile = new Path(indexFileEntry.getKey());
+          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+              indexFile.toString(), carbonTable.getTableName());
+          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+              tableInfo, identifier.getDatabaseName(),
+              identifier.getTableName(),
+              identifier.getTablePath());
+          List<ColumnSchema> indexFileColumnList =
+              wrapperTableInfo.getFactTable().getListOfColumns();
+          List<ColumnSchema> tableColumnList =
+              carbonTable.getTableInfo().getFactTable().getListOfColumns();
+          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+            throw new IOException("All the files schema doesn't match. "
--- End diff --

I don't think throwing an exception is the correct approach. Either we should not let the user write data with a different schema, or we should infer the schema from the first index file that was written and skip any index file with a different schema while preparing splits.
--- |
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2273#discussion_r186992244

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
     SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
         .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
+
+    // For NonTransactional table, compare the schema of all index files with inferred schema.
+    // If there is a mismatch throw exception. As all files must be of same schema.
+    if (!carbonTable.getTableInfo().isTransactionalTable()) {
+      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+      for (Segment segment : segments.getValidSegments()) {
+        Map<String, String> indexFiles = segment.getCommittedIndexFile();
+        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+          Path indexFile = new Path(indexFileEntry.getKey());
+          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+              indexFile.toString(), carbonTable.getTableName());
+          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+              tableInfo, identifier.getDatabaseName(),
+              identifier.getTableName(),
+              identifier.getTablePath());
+          List<ColumnSchema> indexFileColumnList =
+              wrapperTableInfo.getFactTable().getListOfColumns();
+          List<ColumnSchema> tableColumnList =
+              carbonTable.getTableInfo().getFactTable().getListOfColumns();
+          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+            throw new IOException("All the files schema doesn't match. "
--- End diff --

I don't think throwing an exception is the correct approach. Either we should not let the user write data with a different schema, or we should infer the schema from the first index file that was written and skip any index file with a different schema while preparing splits. @sounakr @kumarvishal09 what do you think?
--- |
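To make the trade-off concrete, below is a rough sketch of the skip-instead-of-fail alternative kunal642 describes. It is hypothetical, not part of the actual change, and reuses the variables and helpers from the quoted diff (`indexFiles`, `schemaConverter`, `carbonTable`, `identifier`, `compareColumnSchemaList`) plus `java.util.ArrayList`:

```java
// Hypothetical alternative: keep only index files whose schema matches the inferred
// table schema and prepare splits from those, silently skipping the rest.
List<String> matchingIndexFiles = new ArrayList<>();
for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
  org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
      new Path(indexFileEntry.getKey()).toString(), carbonTable.getTableName());
  TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
      tableInfo, identifier.getDatabaseName(), identifier.getTableName(),
      identifier.getTablePath());
  if (compareColumnSchemaList(
      wrapperTableInfo.getFactTable().getListOfColumns(),
      carbonTable.getTableInfo().getFactTable().getListOfColumns())) {
    matchingIndexFiles.add(indexFileEntry.getKey());   // eligible for split preparation
  }
  // else: skip the index file that was written with a different schema
}
```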