[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2299

    [CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue induced by PR #2273

   
   
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2299
   
----
commit 0846033a4ea91bf6e97a8a32dcbf853c2d98fdb0
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-05-11T08:58:46Z

    [CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue induced by PR #2273

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5835/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4679/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4883/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4682/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5838/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4685/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5841/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:NonTransactional table perfo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4888/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187762940
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
             dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     
         tableIndices.add(dataMap);
    -    allDataMaps.put(tableUniqueName, tableIndices);
    +    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
    --- End diff --
   
    Is this a bug? It seems not related to this PR


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187768276
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
         Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
    +      CarbonTable carbonTable = this.getCarbonTable();
    +      if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +        // For NonTransactional table, compare the schema of all index files with inferred schema.
    +        // If there is a mismatch throw exception. As all files must be of same schema.
    +        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
    +      }
           tableBlockIndexUniqueIdentifiers =
               BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
           segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
         }
         return tableBlockIndexUniqueIdentifiers;
       }
     
    +  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
    +      throws IOException {
    +    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +    Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +      Path indexFile = new Path(indexFileEntry.getKey());
    --- End diff --
   
    It seems it's unnecessary to new a path object.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187768121
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
             dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher);
     
         tableIndices.add(dataMap);
    -    allDataMaps.put(tableUniqueName, tableIndices);
    +    allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
    --- End diff --
   
    The name is already lowercased here, no need to convert it here. If it is not, there maybe other bugs that cause it.



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187768333
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
         Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
    +      CarbonTable carbonTable = this.getCarbonTable();
    +      if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +        // For NonTransactional table, compare the schema of all index files with inferred schema.
    +        // If there is a mismatch throw exception. As all files must be of same schema.
    +        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
    +      }
           tableBlockIndexUniqueIdentifiers =
               BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
           segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
         }
         return tableBlockIndexUniqueIdentifiers;
       }
     
    +  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
    +      throws IOException {
    +    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +    Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +      Path indexFile = new Path(indexFileEntry.getKey());
    +      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +          indexFile.toString(), carbonTable.getTableName());
    +      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +          tableInfo, identifier.getDatabaseName(),
    +          identifier.getTableName(),
    +          identifier.getTablePath());
    +      List<ColumnSchema> indexFileColumnList =
    +          wrapperTableInfo.getFactTable().getListOfColumns();
    +      List<ColumnSchema> tableColumnList =
    +          carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +        LOG.error("Schema of " + indexFile.getName()
    +            + " doesn't match with the table's schema");
    +        throw new IOException("All the files doesn't have same schema. "
    +            + "Unsupported operation on nonTransactional table. Check logs.");
    +      }
    +    }
    +  }
    +
    +  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
    --- End diff --
   
    please optimize the method name, such as `isColumnSchemaSame`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187768312
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
         Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
    +      CarbonTable carbonTable = this.getCarbonTable();
    +      if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +        // For NonTransactional table, compare the schema of all index files with inferred schema.
    +        // If there is a mismatch throw exception. As all files must be of same schema.
    +        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
    +      }
           tableBlockIndexUniqueIdentifiers =
               BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
           segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
         }
         return tableBlockIndexUniqueIdentifiers;
       }
     
    +  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
    +      throws IOException {
    +    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +    Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +      Path indexFile = new Path(indexFileEntry.getKey());
    +      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +          indexFile.toString(), carbonTable.getTableName());
    +      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +          tableInfo, identifier.getDatabaseName(),
    +          identifier.getTableName(),
    +          identifier.getTablePath());
    +      List<ColumnSchema> indexFileColumnList =
    +          wrapperTableInfo.getFactTable().getListOfColumns();
    +      List<ColumnSchema> tableColumnList =
    +          carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +        LOG.error("Schema of " + indexFile.getName()
    +            + " doesn't match with the table's schema");
    +        throw new IOException("All the files doesn't have same schema. "
    +            + "Unsupported operation on nonTransactional table. Check logs.");
    +      }
    +    }
    +  }
    +
    +  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
    +      List<ColumnSchema> tableColumnList) {
    +    if (indexFileColumnList.size() != tableColumnList.size()) {
    +      return false;
    --- End diff --
   
    Can you add a log here and line170 to tell the reason.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187842996
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
         Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
    +      CarbonTable carbonTable = this.getCarbonTable();
    +      if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +        // For NonTransactional table, compare the schema of all index files with inferred schema.
    +        // If there is a mismatch throw exception. As all files must be of same schema.
    +        validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
    +      }
           tableBlockIndexUniqueIdentifiers =
               BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
           segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers);
         }
         return tableBlockIndexUniqueIdentifiers;
       }
     
    +  private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable)
    +      throws IOException {
    +    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +    Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +      Path indexFile = new Path(indexFileEntry.getKey());
    +      org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +          indexFile.toString(), carbonTable.getTableName());
    +      TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +          tableInfo, identifier.getDatabaseName(),
    +          identifier.getTableName(),
    +          identifier.getTablePath());
    +      List<ColumnSchema> indexFileColumnList =
    +          wrapperTableInfo.getFactTable().getListOfColumns();
    +      List<ColumnSchema> tableColumnList =
    +          carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +      if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +        LOG.error("Schema of " + indexFile.getName()
    +            + " doesn't match with the table's schema");
    +        throw new IOException("All the files doesn't have same schema. "
    +            + "Unsupported operation on nonTransactional table. Check logs.");
    +      }
    +    }
    +  }
    +
    +  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
    --- End diff --
   
    1) Add validation logic the place where read is happening. otherwise files will be read 2 two times.
    2) All the blocklet index loading distribution logics also will work fine if handled while reading.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2299#discussion_r187843922
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
    @@ -401,6 +401,11 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
         } else if (actualDataType == DataTypes.LONG) {
           return ByteUtil.toBytes((Long) dimensionValue);
         } else if (actualDataType == DataTypes.TIMESTAMP) {
    +      if (dimensionValue instanceof String) {
    --- End diff --
   
    Avro supports timestamp as logical type. So need not support this, as avro supports specific data type


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:Refactor NonTransactional ta...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4740/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2299: [CARBONDATA-2472] Fixed:Refactor NonTransactional ta...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2299
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5894/



---
12