GitHub user allwefantasy opened a pull request:
https://github.com/apache/incubator-carbondata/pull/368

[CARBONDATA-465] Spark streaming dataframe support

* `mvn clean verify` has already passed locally.
* No new unit test cases are added.
* Tested in the streamingpro project.
* Remove kettle clearly.
* Fix NPE when loading data with the new flow: the Spark executor sometimes calls TaskContext.get while the code runs in another thread.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #368

----
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90194407

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -763,10 +763,9 @@ case class LoadTable(
     val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
-    val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)
     // TODO It will be removed after kettle is removed.
-    val useKettle = options.get("use_kettle") match {
+    val useKettle = options.get("useKettle") match {
--- End diff --

Why was the string changed? Better to avoid changing it; it is used in other places as well.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90195041

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
@@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
     if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
       FileFactory.mkdirs(schemaMetadataPath, fileType)
     }
+
+    /**
+     * schemaFilePath starts with file:// will not create meta files successfully
+     * while thriftWriter will have no complains.
+     * This will cause some weired error eg. No table found.
+     */
--- End diff --

Better not to keep a comment like this. Please raise a JIRA and provide steps to reproduce it.
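For anyone preparing the reproduction steps: the thread does not establish why a `file://` prefix breaks meta file creation. One plausible contributor, shown here only as a plain-Java sketch and not as CarbonData code, is that `java.io.File` does not parse URI schemes, so a `file://...` string is treated as a relative path name. `LocalPathSketch` and `toLocalPath` are hypothetical names introduced for illustration, and the sample path is made up.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper, not CarbonData code: turns a "file:" URI string
// into a plain local path and leaves every other string untouched.
class LocalPathSketch {

    static String toLocalPath(String path) {
        if (path.startsWith("file:")) {
            // URI.getPath() drops the scheme and the (empty) authority.
            return URI.create(path).getPath();
        }
        return path;
    }

    public static void main(String[] args) {
        // Made-up sample path, used only for the demonstration.
        String raw = "file:///tmp/carbon/store/schema";
        // java.io.File does not interpret the scheme, so the raw string
        // is handled as a relative path name rather than an absolute one.
        System.out.println("raw is absolute: " + new File(raw).isAbsolute());
        System.out.println("normalized:      " + toLocalPath(raw));
    }
}
```

Whether this is what actually happens inside `FileFactory` or the thrift writer would still need to be confirmed in the JIRA.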
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90197023

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
     }
     return dateformatsHashMap;
   }
+
--- End diff --

This reflection code is not required. Please check how the load happens through a dataframe in CarbonDataLoadRDD; I guess you can make use of the same code. Also, your iterator is supposed to extend `InputIterator` of PR333.
Github user allwefantasy commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228003

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -763,10 +763,9 @@ case class LoadTable(
     val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
-    val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)
     // TODO It will be removed after kettle is removed.
-    val useKettle = options.get("use_kettle") match {
+    val useKettle = options.get("useKettle") match {
--- End diff --

DataFrameWriter accepts the parameter `useKettle`, e.g. writer.option("useKettle", "false").option("tempCSV", "false"). Maybe we should keep it consistent.
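One way to reconcile the two positions above (the old `use_kettle` key is used elsewhere, while the DataFrameWriter path passes `useKettle`) would be to accept both spellings during a transition. The following is only a sketch under that assumption: `KettleOptionSketch` and its method are hypothetical and do not exist in CarbonData, and the precedence order is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of CarbonData: resolves the kettle flag
// from load options while accepting both spellings seen in this thread.
class KettleOptionSketch {

    static boolean useKettle(Map<String, String> options, boolean defaultValue) {
        // Prefer the camel-case key passed through DataFrameWriter.option(...).
        String value = options.get("useKettle");
        if (value == null) {
            // Fall back to the snake-case key that other code paths use.
            value = options.get("use_kettle");
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("use_kettle", "false");
        // Only the older key is present, so the fallback branch is taken.
        System.out.println("useKettle = " + useKettle(options, true));
    }
}
```

Accepting both keys keeps existing callers working, at the cost of having to decide which key wins when both are set.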
Github user allwefantasy commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228110

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
@@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
     if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
       FileFactory.mkdirs(schemaMetadataPath, fileType)
     }
+
+    /**
+     * schemaFilePath starts with file:// will not create meta files successfully
+     * while thriftWriter will have no complains.
+     * This will cause some weired error eg. No table found.
+     */
--- End diff --

OK.
Github user allwefantasy commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90229415

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
     }
     return dateformatsHashMap;
   }
+
--- End diff --

The carbon-processing module does not depend on Spark or any other computing engine. However, some of its classes load data with multiple threads while running as a task of the computing engine, and those threads need to get the TaskContext, which is stored in a ThreadLocal. Yes, my first PR was merged from your PR333, but it has not been merged to master yet.
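The underlying problem can be illustrated without Spark. A value stored in a plain `ThreadLocal` is visible only to the thread that set it, so a worker thread started by the loader sees `null`, which is consistent with the NPE mentioned in the PR description. The sketch below uses only the JDK; `TASK_CONTEXT` is a stand-in for an engine's thread-local task context, and all names here are hypothetical rather than Spark or CarbonData APIs. It also shows one common alternative to reflection: capture the value on the task thread and hand it to the worker explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration; TASK_CONTEXT stands in for an engine's
// thread-local task context and is not a Spark or CarbonData API.
class ThreadLocalHandoffSketch {

    static final ThreadLocal<String> TASK_CONTEXT = new ThreadLocal<>();

    // Reads the thread-local from inside a worker thread. The worker
    // never set it, so the lookup yields null there.
    static String readInWorker(ExecutorService pool) {
        Callable<String> task = () -> TASK_CONTEXT.get();
        return await(pool.submit(task));
    }

    // Captures the value on the calling (task) thread and passes the
    // captured reference to the worker instead of the thread-local.
    static String readCaptured(ExecutorService pool) {
        final String captured = TASK_CONTEXT.get();
        Callable<String> task = () -> captured;
        return await(pool.submit(task));
    }

    private static String await(Future<String> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TASK_CONTEXT.set("task-42");
            System.out.println("worker sees:   " + readInWorker(pool));
            System.out.println("captured sees: " + readCaptured(pool));
        } finally {
            TASK_CONTEXT.remove();
            pool.shutdown();
        }
    }
}
```

Passing the captured value keeps the processing module free of any engine dependency, since the worker only receives an ordinary object reference.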
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/368#discussion_r90605596

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
     }
     return dateformatsHashMap;
   }
+
--- End diff --

PR333 is merged, please rebase.
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

@jackylk I fixed the conflict. Can you review it?
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

@allwefantasy it seems there is a problem with merging/rebasing the PR; other commits have come in here. Please fix it.
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

It's weird. In my local branch, git log shows:

```
commit acdf78a8cba4f7c18cbaaf0fcc1a9e9dc3189068
Merge: 8a21cb7 5ca7218
Author: WilliamZhu <[hidden email]>
Date:   Mon Dec 5 11:30:35 2016 +0800

    Merge branch 'spark-streaming-dataframe-support2' of github.com:allwefantasy/incubator-carbondata into spark-streaming-dataframe-support2

commit 8a21cb715eac50c04b859530ab459ae9b6f226a3
Author: WilliamZhu <[hidden email]>
Date:   Wed Nov 30 21:24:33 2016 +0800

    remove comments on createTableFromThrift and rais jira later

commit 06bc4239a2762a6f27da99982b47e880d6a1be4c
Author: WilliamZhu <[hidden email]>
Date:   Wed Nov 30 00:12:24 2016 +0800

    reset maven-source-plugin

commit 0f042797f54143bd473296bc33650e84d071dd15
Author: WilliamZhu <[hidden email]>
Date:   Tue Nov 29 23:46:42 2016 +0800

    spark streaming dataframe support

commit 70ae82045e461c740cf2ae80c2058160bc9855a9
Merge: e7958b6 fc3f6b3
Author: ravipesala <[hidden email]>
```

I will try to figure it out.
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

The commit log shows "allwefantasy and others added some commits 5 days ago" under Changes. I guess there is no problem, @ravipesala.
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

Yes, it does not look OK. I will try to figure out how to resolve this.
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

Can one of the admins verify this patch?
Github user allwefantasy closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/368
GitHub user allwefantasy reopened a pull request:
https://github.com/apache/incubator-carbondata/pull/368

[CARBONDATA-465] Spark streaming dataframe support

* `mvn clean verify` has already passed locally.
* No new unit test cases are added.
* Tested in the streamingpro project.
* Remove kettle clearly.
* Fix NPE when loading data with the new flow: the Spark executor sometimes calls TaskContext.get while the code runs in another thread.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #368

----
commit 44dd84ff48f023ec60a1e49ae5a6e70df387738b
Author: WilliamZhu <[hidden email]>
Date:   2016-12-06T04:39:22Z

    merge from master

commit 9cf2442e2a7579547fd6a9147721c66499bf13d5
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T15:46:42Z

    spark streaming dataframe support

commit 92f41ca41fc449528df9716dceec6f3c54e9535f
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T16:12:24Z

    reset maven-source-plugin

commit b64c4a6fe87e638e935be1360e3b55956b01bee8
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T16:12:24Z

    reset maven-source-plugin

commit 85adeb6b79460e873f4340dbf1c80e262e235bbd
Author: WilliamZhu <[hidden email]>
Date:   2016-11-30T13:24:33Z

    remove comments on createTableFromThrift and rais jira later

----
Github user allwefantasy commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

Fixed the commit log issue and updated to latest.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

Can one of the admins verify this patch?
Github user chenliang613 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

add to whitelist
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/368

Build success. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/35/