Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

GitHub user allwefantasy opened a pull request:

https://github.com/apache/incubator-carbondata/pull/368

[CARBONDATA-465] Spark streaming dataframe support

* `mvn clean verify` has passed locally.
* No new unit test cases were added.
* Tested in the streamingpro project.
* Removed Kettle cleanly.
* Fixed an NPE when loading data with the new flow: the Spark executor code sometimes calls TaskContext.get while running in another thread, where it returns null.
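The failure in the last item (a NullPointerException) typically comes from thread-local state: Spark keeps the current TaskContext in a ThreadLocal, so TaskContext.get returns null on a thread that the task starts itself. The following is a minimal sketch of the problem and one common fix (capture the value on the task thread, re-install it in the worker, remove it afterwards). It assumes a hypothetical `ThreadLocal<String>` named `CONTEXT` in place of Spark's TaskContext and is not the code from this PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskContextPropagation {
    // Hypothetical stand-in for a per-task context kept in a ThreadLocal,
    // similar in spirit to Spark's TaskContext.get().
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs the work on a fresh single-thread pool and waits for its result.
    static <T> T runOnWorker(Callable<T> work) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(work).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Naive: the worker thread has its own, empty ThreadLocal slot, so it sees null.
    static String readWithoutPropagation() {
        return runOnWorker(CONTEXT::get);
    }

    // Fix: capture on the calling (task) thread, set it in the worker, clean up after.
    static String readWithPropagation() {
        final String captured = CONTEXT.get();
        return runOnWorker(() -> {
            CONTEXT.set(captured);
            try {
                return CONTEXT.get();
            } finally {
                CONTEXT.remove();
            }
        });
    }

    public static void main(String[] args) {
        CONTEXT.set("task-42");
        System.out.println(readWithoutPropagation()); // null
        System.out.println(readWithPropagation());    // task-42
        CONTEXT.remove();
    }
}
```

Removing the value in the worker's `finally` matters when the worker comes from a reused pool, so a stale context does not leak into the next task.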

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #368

----

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90194407

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -763,10 +763,9 @@ case class LoadTable(

val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
- val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)

// TODO It will be removed after kettle is removed.
- val useKettle = options.get("use_kettle") match {
+ val useKettle = options.get("useKettle") match {
--- End diff --

Why change the string? Better to avoid changing it; it is used in other places as well.

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90195041

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
@@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
FileFactory.mkdirs(schemaMetadataPath, fileType)
}
+
+ /**
+ * schemaFilePath starts with file:// will not create meta files successfully
+ * while thriftWriter will have no complains.
+ * This will cause some weired error eg. No table found.
+ */
--- End diff --

Better not to keep a comment like this. Please raise a JIRA issue and provide steps to reproduce it.
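For reference when reproducing this, one possible guard is to normalize a `file://` URI into a plain local path before handing it to the file layer. This is a hypothetical sketch (`LocalPathUtil.stripFileScheme` is not CarbonData API) that assumes the problem lies only in the scheme prefix; it is not what the PR did.

```java
import java.net.URI;

public class LocalPathUtil {
    /**
     * Hypothetical helper: turns a "file:" URI string into a plain local path and
     * leaves everything else (plain paths, hdfs:// URIs, ...) untouched.
     * Note: URI.create throws IllegalArgumentException for strings that are not
     * valid URIs, e.g. paths containing unescaped spaces.
     */
    static String stripFileScheme(String path) {
        if (path == null || !path.startsWith("file:")) {
            return path;
        }
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        System.out.println(stripFileScheme("file:///tmp/carbon/store/Metadata/schema")); // /tmp/carbon/store/Metadata/schema
        System.out.println(stripFileScheme("hdfs://nn:8020/carbon/store"));              // unchanged
    }
}
```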

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90197023

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
}
return dateformatsHashMap;
}
+
--- End diff --

This reflection code is not required. Please check how loading through a DataFrame works in CarbonDataLoadRDD; I guess you can reuse the same code.
And your iterator is supposed to extend `InputIterator` from PR33.
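A minimal sketch of what a streaming-side row iterator could look like if it followed that advice, assuming each micro-batch arrives as a `List<Object[]>` of rows. The `InputIterator` interface below is a hypothetical stand-in with assumed `initialize`/`close` hooks; the real type introduced by that PR may have a different API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchRowIterator {
    /** Hypothetical stand-in for InputIterator; the real API may differ. */
    interface InputIterator<E> extends Iterator<E> {
        default void initialize() {}
        default void close() {}
    }

    /** Flattens a sequence of micro-batches (each a list of rows) into one row iterator. */
    static final class RowIterator implements InputIterator<Object[]> {
        private final Iterator<List<Object[]>> batches;
        private Iterator<Object[]> current = Collections.emptyIterator();

        RowIterator(Iterator<List<Object[]>> batches) {
            this.batches = batches;
        }

        @Override
        public boolean hasNext() {
            // Skip empty batches until a row is available or the input is exhausted.
            while (!current.hasNext() && batches.hasNext()) {
                current = batches.next().iterator();
            }
            return current.hasNext();
        }

        @Override
        public Object[] next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    public static void main(String[] args) {
        List<List<Object[]>> batches = Arrays.asList(
                Arrays.asList(new Object[] {1, "a"}, new Object[] {2, "b"}),
                Collections.<Object[]>emptyList(),
                Collections.singletonList(new Object[] {3, "c"}));
        RowIterator rows = new RowIterator(batches.iterator());
        rows.initialize();
        while (rows.hasNext()) {
            System.out.println(Arrays.toString(rows.next()));
        }
        rows.close();
    }
}
```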

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228003

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -763,10 +763,9 @@ case class LoadTable(

val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
- val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)

// TODO It will be removed after kettle is removed.
- val useKettle = options.get("use_kettle") match {
+ val useKettle = options.get("useKettle") match {
--- End diff --

DataFrameWriter accepts the parameter `useKettle`, e.g. `writer.option("useKettle", "false").option("tempCSV", "false")`. Maybe we should keep them consistent.
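Here LoadTable reads `use_kettle`, while the DataFrameWriter path passes `useKettle`. One way to reconcile the two spellings without renaming the key used elsewhere is to read the canonical key and fall back to an alias. A minimal sketch; `LoadOptions` and `getWithAlias` are hypothetical names, not CarbonData API, and the default is left to the caller because the thread does not state it.

```java
import java.util.HashMap;
import java.util.Map;

public class LoadOptions {
    /**
     * Hypothetical helper: returns the value under the canonical key if present,
     * otherwise the value under the alias, otherwise the caller's default.
     */
    static String getWithAlias(Map<String, String> options, String key, String alias,
                               String defaultValue) {
        if (options.containsKey(key)) {
            return options.get(key);
        }
        return options.getOrDefault(alias, defaultValue);
    }

    public static void main(String[] args) {
        Map<String, String> fromDataFrameWriter = new HashMap<>();
        fromDataFrameWriter.put("useKettle", "false");
        // The canonical "use_kettle" key is absent, so the alias is used.
        boolean useKettle = Boolean.parseBoolean(
                getWithAlias(fromDataFrameWriter, "use_kettle", "useKettle", "true"));
        System.out.println(useKettle); // false
    }
}
```

Keeping `use_kettle` as the canonical key follows the reviewer's point that the existing string is used in other places.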

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228110

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
@@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
FileFactory.mkdirs(schemaMetadataPath, fileType)
}
+
+ /**
+ * schemaFilePath starts with file:// will not create meta files successfully
+ * while thriftWriter will have no complains.
+ * This will cause some weired error eg. No table found.
+ */
--- End diff --

OK.

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90229415

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
}
return dateformatsHashMap;
}
+
--- End diff --

The carbon-processing module does not depend on Spark or any other computing engine. However, some of its classes use multiple threads to load data; the load runs as a task of the computing engine and needs to get the TaskContext, which is held in a ThreadLocal.
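The reviewer calls this reflection unnecessary, but for context, the general technique of reaching an engine's context from a module with no compile-time dependency on that engine can be sketched as a reflective call to a static no-arg method. `EngineContextLookup.invokeStatic` is a hypothetical name, and this is not the PR's actual code; on a Spark executor it might be called as `invokeStatic("org.apache.spark.TaskContext", "get")`, relying on the static forwarder Scala generates for the companion object.

```java
import java.lang.reflect.Method;

public class EngineContextLookup {
    /**
     * Hypothetical helper: invokes a public static no-arg method by class and method
     * name. Returns null if the class or method is not available, so callers work
     * whether or not the engine is on the classpath.
     */
    static Object invokeStatic(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getMethod(methodName);
            return method.invoke(null); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // JDK classes keep the demo runnable without Spark on the classpath.
        System.out.println(invokeStatic("java.lang.Thread", "currentThread") != null); // true
        System.out.println(invokeStatic("com.example.DoesNotExist", "get"));           // null
    }
}
```

Note that a ThreadLocal-backed context still returns null when this lookup runs on a thread the task started itself, so reflection alone does not fix the multi-thread case.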

Yes, my first PR includes the changes from your PR333, but PR333 has not been merged into master yet.

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/368#discussion_r90605596

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
}
return dateformatsHashMap;
}
+
--- End diff --

PR333 is merged, please rebase

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

@jackylk I fixed the conflict. Can you review it?

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

@allwefantasy It seems there is a problem with merging/rebasing the PR; other commits have come in here. Please fix it.

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

It's weird. In my local branch, git log shows:

```
commit acdf78a8cba4f7c18cbaaf0fcc1a9e9dc3189068
Merge: 8a21cb7 5ca7218
Author: WilliamZhu <[hidden email]>
Date: Mon Dec 5 11:30:35 2016 +0800

Merge branch 'spark-streaming-dataframe-support2' of github.com:allwefantasy/incubator-carbondata into spark-streaming-dataframe-support2

commit 8a21cb715eac50c04b859530ab459ae9b6f226a3
Author: WilliamZhu <[hidden email]>
Date: Wed Nov 30 21:24:33 2016 +0800

remove comments on createTableFromThrift and rais jira later

commit 06bc4239a2762a6f27da99982b47e880d6a1be4c
Author: WilliamZhu <[hidden email]>
Date: Wed Nov 30 00:12:24 2016 +0800

reset maven-source-plugin

commit 0f042797f54143bd473296bc33650e84d071dd15
Author: WilliamZhu <[hidden email]>
Date: Tue Nov 29 23:46:42 2016 +0800

spark streaming dataframe support

commit 70ae82045e461c740cf2ae80c2058160bc9855a9
Merge: e7958b6 fc3f6b3
Author: ravipesala <[hidden email]>
```

I will try to figure it out.

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

The commit log shows "allwefantasy and others added some commits 5 days ago". I guess there is no problem, @ravipesala.

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

Yes, it seems it is not OK. I will try to figure out how to resolve this.

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

Can one of the admins verify this patch?

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

Github user allwefantasy closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/368

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

In reply to this post by qiuchenjian-2

GitHub user allwefantasy reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/368

[CARBONDATA-465] Spark streaming dataframe support

* `mvn clean verify` has passed locally.
* No new unit test cases were added.
* Tested in the streamingpro project.
* Removed Kettle cleanly.
* Fixed an NPE when loading data with the new flow: the Spark executor code sometimes calls TaskContext.get while running in another thread, where it returns null.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #368

----
commit 44dd84ff48f023ec60a1e49ae5a6e70df387738b
Author: WilliamZhu <[hidden email]>
Date: 2016-12-06T04:39:22Z

merge from master

commit 9cf2442e2a7579547fd6a9147721c66499bf13d5
Author: WilliamZhu <[hidden email]>
Date: 2016-11-29T15:46:42Z

spark streaming dataframe support

commit 92f41ca41fc449528df9716dceec6f3c54e9535f
Author: WilliamZhu <[hidden email]>
Date: 2016-11-29T16:12:24Z

reset maven-source-plugin

commit b64c4a6fe87e638e935be1360e3b55956b01bee8
Author: WilliamZhu <[hidden email]>
Date: 2016-11-29T16:12:24Z

reset maven-source-plugin

commit 85adeb6b79460e873f4340dbf1c80e262e235bbd
Author: WilliamZhu <[hidden email]>
Date: 2016-11-30T13:24:33Z

remove comments on createTableFromThrift and rais jira later

----

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user allwefantasy commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

Fixed the commit log issue and updated to the latest master.

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

Can one of the admins verify this patch?

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

add to whitelist

[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/368

Build Success, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/35/
