GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2982 [CARBONDATA-3158] support presto-carbon to read sdk cabron files **problem:** Currently, carbon SDK files output (files without metadata folder and its contents) are read by spark using an external table with carbon session. But presto carbon integration doesn't support that. It can currently read only the transactional table output files. **solution:** Hence we can enhance presto to read SDK output files. This will increase the use cases for presto-carbon integration. The above scenario can be achieved by inferring schema if metadata folder not exists and setting read committed scope to LatestFilesReadCommittedScope, if non-transctional table output files are present. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done. yes, added UT - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata presto Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2982.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2982 ---- commit e54253eff4546945a8c5a2543ac5ed5aa12df85b Author: ajantha-bhat <ajanthabhat@...> Date: 2018-12-07T13:07:10Z presto with sdk ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1681/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1893/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9941/ --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r240229604 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + if (FileFactory.getCarbonFile(metadataPath) --- End diff -- The operator of getFileType is called twice in getCarbonFile and if(...), it's better to getting it ahead FileType fileType = FileFactory.getFileType(metadataPath); if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath,fileType)) {...} --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r240233846 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + if (FileFactory.getCarbonFile(metadataPath) + .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) { + // If metadata folder exists, it is a transactional table + isTransactionalTable = true; } - }; - ThriftReader thriftReader = - new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase); - thriftReader.open(); - org.apache.carbondata.format.TableInfo tableInfo = - (org.apache.carbondata.format.TableInfo) thriftReader.read(); - thriftReader.close(); - + } catch (IOException e) { + throw new RuntimeException(e); + } + org.apache.carbondata.format.TableInfo tableInfo; + if (isTransactionalTable) { + //Step 2: read the metadata (tableInfo) of the table. + ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { + // TBase is used to read and write thrift objects. + // TableInfo is a kind of TBase used to read and write table information. + // TableInfo is generated by thrift, + // see schema.thrift under format/src/main/thrift for details. + public TBase create() { + return new org.apache.carbondata.format.TableInfo(); + } + }; + ThriftReader thriftReader = + new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase); + thriftReader.open(); + tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read(); + thriftReader.close(); + } else { + tableInfo = + CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration()); --- End diff -- Is this code (tableInfo = CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());) tested on hdfs, FileSystem may be not created by "new Configuration()" --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r240475287 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + if (FileFactory.getCarbonFile(metadataPath) + .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) { + // If metadata folder exists, it is a transactional table + isTransactionalTable = true; } - }; - ThriftReader thriftReader = - new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase); - thriftReader.open(); - org.apache.carbondata.format.TableInfo tableInfo = - (org.apache.carbondata.format.TableInfo) thriftReader.read(); - thriftReader.close(); - + } catch (IOException e) { + throw new RuntimeException(e); + } + org.apache.carbondata.format.TableInfo tableInfo; + if (isTransactionalTable) { + //Step 2: read the metadata (tableInfo) of the table. + ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { + // TBase is used to read and write thrift objects. + // TableInfo is a kind of TBase used to read and write table information. + // TableInfo is generated by thrift, + // see schema.thrift under format/src/main/thrift for details. + public TBase create() { + return new org.apache.carbondata.format.TableInfo(); + } + }; + ThriftReader thriftReader = + new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase); + thriftReader.open(); + tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read(); + thriftReader.close(); + } else { + tableInfo = + CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration()); --- End diff -- I have tested. It works. but better to use FileFactory.getConfiguration(). I will change to it, --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r240477368 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + if (FileFactory.getCarbonFile(metadataPath) --- End diff -- hmm. ok. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1697/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1907/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2982 @ravipesala , @jackylk : PR is ready. please review --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2982 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9957/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r241282108 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + FileFactory.FileType fileType = FileFactory.getFileType(metadataPath); + if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) { --- End diff -- No need to pass filetype here --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2982 LGTM , just a minor comment. --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2982#discussion_r241286067 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) { String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/" + carbonTableIdentifier.getTableName(); - //Step 2: read the metadata (tableInfo) of the table. - ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() { - // TBase is used to read and write thrift objects. - // TableInfo is a kind of TBase used to read and write table information. - // TableInfo is generated by thrift, - // see schema.thrift under format/src/main/thrift for details. - public TBase create() { - return new org.apache.carbondata.format.TableInfo(); + String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath); + boolean isTransactionalTable = false; + try { + FileFactory.FileType fileType = FileFactory.getFileType(metadataPath); + if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) { --- End diff -- currently, this reduces one function call, If I don't pass file type implicitly again they call this method. Let it be there now. when we remove filetype from file factory in the future, this also will be optimized. --- |
In reply to this post by qiuchenjian-2
|
Free forum by Nabble | Edit this page |