GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2299

[CARBONDATA-2472] Fixed: NonTransactional table performance degradation issue induced by PR #2273

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2299

----
commit 0846033a4ea91bf6e97a8a32dcbf853c2d98fdb0
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-05-11T08:58:46Z

    [CARBONDATA-2472] Fixed: NonTransactional table performance degradation issue induced by PR #2273
----

--- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2299 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5835/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4679/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2299 retest this please --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2299 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4883/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4682/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5838/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4685/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5841/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2299 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4888/ --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187762940

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
         dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     tableIndices.add(dataMap);
-    allDataMaps.put(tableUniqueName, tableIndices);
+    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

Is this a bug? It seems unrelated to this PR.

--- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768276

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
--- End diff --

It seems unnecessary to create a new Path object.

--- |
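A minimal sketch of the simplification the reviewer is hinting at; the helper below is hypothetical, not from the PR. The map key is already the index file path as a String, and the only later uses of the Path in the quoted diff are toString() and getName(), so the file name could be derived without allocating a Path per entry.

    // Hypothetical helper (assumes '/'-separated paths, as carbon index file
    // paths are): derives the file name without an org.apache.hadoop.fs.Path.
    final class IndexFileNames {
      static String fileNameOf(String pathString) {
        int idx = pathString.lastIndexOf('/');
        return idx < 0 ? pathString : pathString.substring(idx + 1);
      }
    }

--- |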
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768121

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
         dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     tableIndices.add(dataMap);
-    allDataMaps.put(tableUniqueName, tableIndices);
+    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

The name is already lowercased here, so there is no need to convert it again. If it is not, there may be other bugs causing that.

--- |
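For context, here is a minimal, self-contained illustration (class and values invented, not taken from CarbonData) of the failure mode both reviewers are probing: a HashMap keyed by table name is case-sensitive, so a mixed-case key is invisible to a lowercased lookup unless both the put and the get normalize the key.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: shows why an unnormalized map key misses lookups.
    public class CaseSensitiveKeyDemo {
      public static void main(String[] args) {
        Map<String, String> allDataMaps = new HashMap<>();
        allDataMaps.put("Default_MyTable", "indices");           // stored mixed-case
        System.out.println(allDataMaps.get("default_mytable"));  // null: lookup misses
        allDataMaps.put("Default_MyTable".toLowerCase(), "indices");
        System.out.println(allDataMaps.get("default_mytable"));  // "indices": lookup hits
      }
    }

--- |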
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768333

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
--- End diff --

Please improve the method name, e.g. `isColumnSchemaSame`.

--- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768312

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
+      List<ColumnSchema> tableColumnList) {
+    if (indexFileColumnList.size() != tableColumnList.size()) {
+      return false;
--- End diff --

Can you add a log here and at line 170 to explain the reason?

--- |
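One plausible shape of the requested logging, adopting the `isColumnSchemaSame` rename suggested above. This is a sketch, not the PR's final code: it assumes the class's existing LOG field and that `ColumnSchema` exposes `getColumnName()` and a meaningful `equals()`.

    // Sketch only: each early return now logs why the schemas were judged different.
    private boolean isColumnSchemaSame(List<ColumnSchema> indexFileColumnList,
        List<ColumnSchema> tableColumnList) {
      if (indexFileColumnList.size() != tableColumnList.size()) {
        LOG.error("Index file has " + indexFileColumnList.size()
            + " columns but the table schema has " + tableColumnList.size());
        return false;
      }
      for (int i = 0; i < indexFileColumnList.size(); i++) {
        if (!indexFileColumnList.get(i).equals(tableColumnList.get(i))) {
          LOG.error("Column mismatch at position " + i + ": index file has "
              + indexFileColumnList.get(i).getColumnName() + ", table has "
              + tableColumnList.get(i).getColumnName());
          return false;
        }
      }
      return true;
    }

--- |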
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187842996

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
--- End diff --

1) Add the validation logic at the place where the read happens; otherwise the files will be read two times.
2) All the blocklet index loading distribution logic will also work fine if it is handled while reading.

--- |
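To make the double-read concern concrete, here is a self-contained toy illustration (all names invented, nothing from CarbonData): a separate validation pass opens every index file once for validation and again for index loading, while folding validation into the loading loop opens each file only once.

    import java.util.Arrays;
    import java.util.List;

    // Toy demo only: counts "file opens" under two-pass vs. single-pass handling.
    public class SinglePassDemo {
      static int opens = 0;

      static String open(String path) { opens++; return "schema-of-" + path; }

      public static void main(String[] args) {
        List<String> files = Arrays.asList("a.carbonindex", "b.carbonindex");

        opens = 0;
        for (String f : files) open(f);  // pass 1: schema validation
        for (String f : files) open(f);  // pass 2: index loading
        System.out.println("two-pass opens: " + opens);    // 4

        opens = 0;
        for (String f : files) {
          String schema = open(f);       // one read serves both purposes
          if (!schema.startsWith("schema-of-")) throw new IllegalStateException(f);
        }
        System.out.println("single-pass opens: " + opens); // 2
      }
    }

--- |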
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187843922

--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -401,6 +401,11 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
     } else if (actualDataType == DataTypes.LONG) {
       return ByteUtil.toBytes((Long) dimensionValue);
     } else if (actualDataType == DataTypes.TIMESTAMP) {
+      if (dimensionValue instanceof String) {
--- End diff --

Avro supports timestamp as a logical type, so there is no need to support this here; Avro already provides the specific data type.

--- |
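For reference, a minimal sketch of the Avro feature the reviewer is pointing to (assumes the org.apache.avro library on the classpath): a timestamp field is declared as a long carrying the "timestamp-millis" logical type, so readers receive a typed value rather than a raw String that needs parsing.

    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    // Sketch: parse a record schema whose "ts" field carries the
    // timestamp-millis logical type, then read the logical type back.
    public class AvroTimestampDemo {
      public static void main(String[] args) {
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"ts\",\"type\":{\"type\":\"long\","
          + "\"logicalType\":\"timestamp-millis\"}}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        System.out.println(
            LogicalTypes.fromSchema(schema.getField("ts").schema()).getName());
        // prints: timestamp-millis
      }
    }

--- |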
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4740/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5894/ --- |