[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2982

    [CARBONDATA-3158] support presto-carbon to read sdk cabron files

    **problem:**
    Currently, carbon SDK files output (files without metadata folder and its
    contents) are read by spark using an external table with carbon session.
    But presto carbon integration doesn't support that. It can currently read
    only the transactional table output files.
   
    **solution:**
    Hence we can enhance presto to read SDK output files. This will increase
    the use cases for presto-carbon integration.
   
    The above scenario can be achieved by inferring schema if metadata folder
    not exists and
    setting read committed scope to LatestFilesReadCommittedScope, if
    non-transctional table output files are present.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted? NA
     
     - [ ] Document update required? NA
   
     - [ ] Testing done.  
            yes, added UT      
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata presto

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2982.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2982
   
----
commit e54253eff4546945a8c5a2543ac5ed5aa12df85b
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-12-07T13:07:10Z

    presto with sdk

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1681/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1893/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9941/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r240229604
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        if (FileFactory.getCarbonFile(metadataPath)
    --- End diff --
   
    The operator of getFileType is called twice in getCarbonFile and if(...), it's better to getting it ahead
    FileType fileType = FileFactory.getFileType(metadataPath);
    if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath,fileType)) {...}


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r240233846
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        if (FileFactory.getCarbonFile(metadataPath)
    +            .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) {
    +          // If metadata folder exists, it is a transactional table
    +          isTransactionalTable = true;
             }
    -      };
    -      ThriftReader thriftReader =
    -          new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
    -      thriftReader.open();
    -      org.apache.carbondata.format.TableInfo tableInfo =
    -          (org.apache.carbondata.format.TableInfo) thriftReader.read();
    -      thriftReader.close();
    -
    +      } catch (IOException e) {
    +        throw new RuntimeException(e);
    +      }
    +      org.apache.carbondata.format.TableInfo tableInfo;
    +      if (isTransactionalTable) {
    +        //Step 2: read the metadata (tableInfo) of the table.
    +        ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    +          // TBase is used to read and write thrift objects.
    +          // TableInfo is a kind of TBase used to read and write table information.
    +          // TableInfo is generated by thrift,
    +          // see schema.thrift under format/src/main/thrift for details.
    +          public TBase create() {
    +            return new org.apache.carbondata.format.TableInfo();
    +          }
    +        };
    +        ThriftReader thriftReader =
    +            new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
    +        thriftReader.open();
    +        tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read();
    +        thriftReader.close();
    +      } else {
    +        tableInfo =
    +            CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());
    --- End diff --
   
    Is this code (tableInfo = CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());)  tested on hdfs,  FileSystem may be not created by "new Configuration()"


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r240475287
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        if (FileFactory.getCarbonFile(metadataPath)
    +            .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) {
    +          // If metadata folder exists, it is a transactional table
    +          isTransactionalTable = true;
             }
    -      };
    -      ThriftReader thriftReader =
    -          new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
    -      thriftReader.open();
    -      org.apache.carbondata.format.TableInfo tableInfo =
    -          (org.apache.carbondata.format.TableInfo) thriftReader.read();
    -      thriftReader.close();
    -
    +      } catch (IOException e) {
    +        throw new RuntimeException(e);
    +      }
    +      org.apache.carbondata.format.TableInfo tableInfo;
    +      if (isTransactionalTable) {
    +        //Step 2: read the metadata (tableInfo) of the table.
    +        ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    +          // TBase is used to read and write thrift objects.
    +          // TableInfo is a kind of TBase used to read and write table information.
    +          // TableInfo is generated by thrift,
    +          // see schema.thrift under format/src/main/thrift for details.
    +          public TBase create() {
    +            return new org.apache.carbondata.format.TableInfo();
    +          }
    +        };
    +        ThriftReader thriftReader =
    +            new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
    +        thriftReader.open();
    +        tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read();
    +        thriftReader.close();
    +      } else {
    +        tableInfo =
    +            CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());
    --- End diff --
   
    I have tested. It works.
   
    but better to use
    FileFactory.getConfiguration(). I will change to it,


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r240477368
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        if (FileFactory.getCarbonFile(metadataPath)
    --- End diff --
   
    hmm. ok.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1697/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1907/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    @ravipesala , @jackylk : PR is ready. please review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9957/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r241282108
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        FileFactory.FileType fileType = FileFactory.getFileType(metadataPath);
    +        if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) {
    --- End diff --
   
    No need to pass filetype here


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2982
 
    LGTM , just a minor comment.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2982#discussion_r241286067
 
    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
    @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
           String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
               + carbonTableIdentifier.getTableName();
     
    -      //Step 2: read the metadata (tableInfo) of the table.
    -      ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
    -        // TBase is used to read and write thrift objects.
    -        // TableInfo is a kind of TBase used to read and write table information.
    -        // TableInfo is generated by thrift,
    -        // see schema.thrift under format/src/main/thrift for details.
    -        public TBase create() {
    -          return new org.apache.carbondata.format.TableInfo();
    +      String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
    +      boolean isTransactionalTable = false;
    +      try {
    +        FileFactory.FileType fileType = FileFactory.getFileType(metadataPath);
    +        if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) {
    --- End diff --
   
    currently, this reduces one function call, If I don't pass file type implicitly again they call this method. Let it be there now.
   
    when we remove filetype from file factory in the future, this also will be optimized.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2982


---