Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

Classic

List

16 messages Options

Options

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2982

[CARBONDATA-3158] support presto-carbon to read sdk cabron files

**problem:**
Currently, carbon SDK files output (files without metadata folder and its
contents) are read by spark using an external table with carbon session.
But presto carbon integration doesn't support that. It can currently read
only the transactional table output files.

**solution:**
Hence we can enhance presto to read SDK output files. This will increase
the use cases for presto-carbon integration.

The above scenario can be achieved by inferring schema if metadata folder
not exists and
setting read committed scope to LatestFilesReadCommittedScope, if
non-transctional table output files are present.

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed? NA

- [ ] Any backward compatibility impacted? NA

- [ ] Document update required? NA

- [ ] Testing done.
yes, added UT
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata presto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2982.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2982

----
commit e54253eff4546945a8c5a2543ac5ed5aa12df85b
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-12-07T13:07:10Z

presto with sdk

----

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1681/

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1893/

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9941/

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r240229604

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ if (FileFactory.getCarbonFile(metadataPath)
--- End diff --

The operator of getFileType is called twice in getCarbonFile and if(...), it's better to getting it ahead
FileType fileType = FileFactory.getFileType(metadataPath);
if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath,fileType)) {...}

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r240233846

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ if (FileFactory.getCarbonFile(metadataPath)
+ .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) {
+ // If metadata folder exists, it is a transactional table
+ isTransactionalTable = true;
}
- };
- ThriftReader thriftReader =
- new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
- thriftReader.open();
- org.apache.carbondata.format.TableInfo tableInfo =
- (org.apache.carbondata.format.TableInfo) thriftReader.read();
- thriftReader.close();
-
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ org.apache.carbondata.format.TableInfo tableInfo;
+ if (isTransactionalTable) {
+ //Step 2: read the metadata (tableInfo) of the table.
+ ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
+ // TBase is used to read and write thrift objects.
+ // TableInfo is a kind of TBase used to read and write table information.
+ // TableInfo is generated by thrift,
+ // see schema.thrift under format/src/main/thrift for details.
+ public TBase create() {
+ return new org.apache.carbondata.format.TableInfo();
+ }
+ };
+ ThriftReader thriftReader =
+ new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
+ thriftReader.open();
+ tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read();
+ thriftReader.close();
+ } else {
+ tableInfo =
+ CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());
--- End diff --

Is this code (tableInfo = CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());) tested on hdfs, FileSystem may be not created by "new Configuration()"

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r240475287

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ if (FileFactory.getCarbonFile(metadataPath)
+ .isFileExist(metadataPath, FileFactory.getFileType(metadataPath))) {
+ // If metadata folder exists, it is a transactional table
+ isTransactionalTable = true;
}
- };
- ThriftReader thriftReader =
- new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
- thriftReader.open();
- org.apache.carbondata.format.TableInfo tableInfo =
- (org.apache.carbondata.format.TableInfo) thriftReader.read();
- thriftReader.close();
-
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ org.apache.carbondata.format.TableInfo tableInfo;
+ if (isTransactionalTable) {
+ //Step 2: read the metadata (tableInfo) of the table.
+ ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
+ // TBase is used to read and write thrift objects.
+ // TableInfo is a kind of TBase used to read and write table information.
+ // TableInfo is generated by thrift,
+ // see schema.thrift under format/src/main/thrift for details.
+ public TBase create() {
+ return new org.apache.carbondata.format.TableInfo();
+ }
+ };
+ ThriftReader thriftReader =
+ new ThriftReader(CarbonTablePath.getSchemaFilePath(tablePath), createTBase);
+ thriftReader.open();
+ tableInfo = (org.apache.carbondata.format.TableInfo) thriftReader.read();
+ thriftReader.close();
+ } else {
+ tableInfo =
+ CarbonUtil.inferSchema(tablePath, table.getTableName(), false, new Configuration());
--- End diff --

I have tested. It works.

but better to use
FileFactory.getConfiguration(). I will change to it,

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r240477368

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ if (FileFactory.getCarbonFile(metadataPath)
--- End diff --

hmm. ok.

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1697/

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1907/

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2982

@ravipesala , @jackylk : PR is ready. please review

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2982

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9957/

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r241282108

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ FileFactory.FileType fileType = FileFactory.getFileType(metadataPath);
+ if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) {
--- End diff --

No need to pass filetype here

---

[GitHub] carbondata issue #2982: [CARBONDATA-3158] support presto-carbon to read sdk ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2982

LGTM , just a minor comment.

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2982#discussion_r241286067

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java ---
@@ -364,23 +355,38 @@ private CarbonTable parseCarbonMetadata(SchemaTableName table) {
String tablePath = storePath + "/" + carbonTableIdentifier.getDatabaseName() + "/"
+ carbonTableIdentifier.getTableName();

- //Step 2: read the metadata (tableInfo) of the table.
- ThriftReader.TBaseCreator createTBase = new ThriftReader.TBaseCreator() {
- // TBase is used to read and write thrift objects.
- // TableInfo is a kind of TBase used to read and write table information.
- // TableInfo is generated by thrift,
- // see schema.thrift under format/src/main/thrift for details.
- public TBase create() {
- return new org.apache.carbondata.format.TableInfo();
+ String metadataPath = CarbonTablePath.getSchemaFilePath(tablePath);
+ boolean isTransactionalTable = false;
+ try {
+ FileFactory.FileType fileType = FileFactory.getFileType(metadataPath);
+ if (FileFactory.getCarbonFile(metadataPath, fileType).isFileExist(metadataPath, fileType)) {
--- End diff --

currently, this reduces one function call, If I don't pass file type implicitly again they call this method. Let it be there now.

when we remove filetype from file factory in the future, this also will be optimized.

---

[GitHub] carbondata pull request #2982: [CARBONDATA-3158] support presto-carbon to re...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2982

---