GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2299

[CARBONDATA-2472] Fixed: NonTransactional table performance degradation issue induced by PR #2273

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2299

----
commit 0846033a4ea91bf6e97a8a32dcbf853c2d98fdb0
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-05-11T08:58:46Z

    [CARBONDATA-2472] Fixed: NonTransactional table performance degradation issue induced by PR #2273
----

--- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2299 retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5835/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4679/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2299 retest this please --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2299 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4883/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4682/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5838/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4685/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5841/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2299 SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4888/ --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187762940

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
         dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     tableIndices.add(dataMap);
-    allDataMaps.put(tableUniqueName, tableIndices);
+    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

Is this a bug? It seems unrelated to this PR.

--- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768276

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
--- End diff --

It seems unnecessary to create a new Path object.

--- |
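A minimal sketch of the simplification the reviewer is hinting at; the helper below is hypothetical, not from the PR. The map key is already the index file path as a String, and the only later uses of the Path in the quoted diff are toString() and getName(), so the file name could be derived without allocating a Path per entry.

    // Hypothetical helper (assumes '/'-separated paths, as carbon index file
    // paths are): derives the file name without an org.apache.hadoop.fs.Path.
    final class IndexFileNames {
      static String fileNameOf(String pathString) {
        int idx = pathString.lastIndexOf('/');
        return idx < 0 ? pathString : pathString.substring(idx + 1);
      }
    }

--- |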
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768121

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
         dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     tableIndices.add(dataMap);
-    allDataMaps.put(tableUniqueName, tableIndices);
+    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

The name is already lowercased here, so there is no need to convert it again. If it is not, there may be other bugs causing that.

--- |
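For context, here is a minimal, self-contained illustration (class and values invented, not taken from CarbonData) of the failure mode both reviewers are probing: a HashMap keyed by table name is case-sensitive, so a mixed-case key is invisible to a lowercased lookup unless both the put and the get normalize the key.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: shows why an unnormalized map key misses lookups.
    public class CaseSensitiveKeyDemo {
      public static void main(String[] args) {
        Map<String, String> allDataMaps = new HashMap<>();
        allDataMaps.put("Default_MyTable", "indices");           // stored mixed-case
        System.out.println(allDataMaps.get("default_mytable"));  // null: lookup misses
        allDataMaps.put("Default_MyTable".toLowerCase(), "indices");
        System.out.println(allDataMaps.get("default_mytable"));  // "indices": lookup hits
      }
    }

--- |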
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768333

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
--- End diff --

Please improve the method name, e.g. `isColumnSchemaSame`.

--- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187768312

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
+      List<ColumnSchema> tableColumnList) {
+    if (indexFileColumnList.size() != tableColumnList.size()) {
+      return false;
--- End diff --

Can you add a log here and at line 170 to explain the reason?

--- |
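One plausible shape of the requested logging, adopting the `isColumnSchemaSame` rename suggested above. This is a sketch, not the PR's final code: it assumes the class's existing LOG field and that `ColumnSchema` exposes `getColumnName()` and a meaningful `equals()`.

    // Sketch only: each early return now logs why the schemas were judged different.
    private boolean isColumnSchemaSame(List<ColumnSchema> indexFileColumnList,
        List<ColumnSchema> tableColumnList) {
      if (indexFileColumnList.size() != tableColumnList.size()) {
        LOG.error("Index file has " + indexFileColumnList.size()
            + " columns but the table schema has " + tableColumnList.size());
        return false;
      }
      for (int i = 0; i < indexFileColumnList.size(); i++) {
        if (!indexFileColumnList.get(i).equals(tableColumnList.get(i))) {
          LOG.error("Column mismatch at position " + i + ": index file has "
              + indexFileColumnList.get(i).getColumnName() + ", table has "
              + tableColumnList.get(i).getColumnName());
          return false;
        }
      }
      return true;
    }

--- |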
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187842996

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
         segmentMap.get(segment.getSegmentNo());
     if (tableBlockIndexUniqueIdentifiers == null) {
+      CarbonTable carbonTable = this.getCarbonTable();
+      if (!carbonTable.getTableInfo().isTransactionalTable()) {
+        // For NonTransactional table, compare the schema of all index files with inferred schema.
+        // If there is a mismatch throw exception. As all files must be of same schema.
+        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+      }
       tableBlockIndexUniqueIdentifiers =
           BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
       segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
     }
     return tableBlockIndexUniqueIdentifiers;
   }

+  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
+      throws IOException {
+    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+      Path indexFile = new Path(indexFileEntry.getKey());
+      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
+          indexFile.toString(), carbonTable.getTableName());
+      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
+          tableInfo, identifier.getDatabaseName(),
+          identifier.getTableName(),
+          identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+        LOG.error("Schema of " + indexFile.getName()
+            + " doesn't match with the table's schema");
+        throw new IOException("All the files doesn't have same schema. "
+            + "Unsupported operation on nonTransactional table. Check logs.");
+      }
+    }
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
--- End diff --

1) Add the validation logic at the place where the read happens; otherwise the files will be read two times.
2) All the blocklet index loading distribution logic will also work fine if it is handled while reading.

--- |
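To make the double-read concern concrete, here is a self-contained toy illustration (all names invented, nothing from CarbonData): a separate validation pass opens every index file once for validation and again for index loading, while folding validation into the loading loop opens each file only once.

    import java.util.Arrays;
    import java.util.List;

    // Toy demo only: counts "file opens" under two-pass vs. single-pass handling.
    public class SinglePassDemo {
      static int opens = 0;

      static String open(String path) { opens++; return "schema-of-" + path; }

      public static void main(String[] args) {
        List<String> files = Arrays.asList("a.carbonindex", "b.carbonindex");

        opens = 0;
        for (String f : files) open(f);  // pass 1: schema validation
        for (String f : files) open(f);  // pass 2: index loading
        System.out.println("two-pass opens: " + opens);    // 4

        opens = 0;
        for (String f : files) {
          String schema = open(f);       // one read serves both purposes
          if (!schema.startsWith("schema-of-")) throw new IllegalStateException(f);
        }
        System.out.println("single-pass opens: " + opens); // 2
      }
    }

--- |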
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2299#discussion_r187843922

--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -401,6 +401,11 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
     } else if (actualDataType == DataTypes.LONG) {
       return ByteUtil.toBytes((Long) dimensionValue);
     } else if (actualDataType == DataTypes.TIMESTAMP) {
+      if (dimensionValue instanceof String) {
--- End diff --

Avro supports timestamp as a logical type, so there is no need to support this here; Avro already provides the specific data type.

--- |
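For reference, a minimal sketch of the Avro feature the reviewer is pointing to (assumes the org.apache.avro library on the classpath): a timestamp field is declared as a long carrying the "timestamp-millis" logical type, so readers receive a typed value rather than a raw String that needs parsing.

    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    // Sketch: parse a record schema whose "ts" field carries the
    // timestamp-millis logical type, then read the logical type back.
    public class AvroTimestampDemo {
      public static void main(String[] args) {
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"ts\",\"type\":{\"type\":\"long\","
          + "\"logicalType\":\"timestamp-millis\"}}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        System.out.println(
            LogicalTypes.fromSchema(schema.getField("ts").schema()).getName());
        // prints: timestamp-millis
      }
    }

--- |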
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4740/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2299 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5894/ --- |