[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

qiuchenjian-2
GitHub user allwefantasy opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/368

    [CARBONDATA-465] Spark streaming dataframe support

    * mvn clean verify has already passed locally.
    * No new unit test cases are added.
    * Tested in the streamingpro project.
    * Remove kettle cleanly.
    * Fix NPE when loading data with the new flow: the Spark executor sometimes calls TaskContext.get, but the code runs in another thread.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #368
   

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90194407
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -763,10 +763,9 @@ case class LoadTable(
     
     
           val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
    -      val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)
     
           // TODO It will be removed after kettle is removed.
    -      val useKettle = options.get("use_kettle") match {
    +      val useKettle = options.get("useKettle") match {
    --- End diff --
   
    Why change the string? Better to avoid changing it; it is used in other places as well.



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90195041
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
    @@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
         if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
           FileFactory.mkdirs(schemaMetadataPath, fileType)
         }
    +
    +    /**
    +    * schemaFilePath starts with file:// will not create meta files successfully
    +    * while thriftWriter will have no complains.
    +    * This will cause some weired error eg. No table found.
    +    */
    --- End diff --
   
    Better not to keep a comment like this. Please raise a JIRA and provide steps to reproduce it.



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90197023
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
         }
         return dateformatsHashMap;
       }
    +
    --- End diff --
   
    This reflection code is not required. Please check how the load happens through a dataframe in CarbonDataLoadRDD. I guess you can make use of the same code.
    Also, your iterator is supposed to extend `InputIterator` from PR333.



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user allwefantasy commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228003
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -763,10 +763,9 @@ case class LoadTable(
     
     
           val columinar = sqlContext.getConf("carbon.is.columnar.storage", "true").toBoolean
    -      val kettleHomePath = CarbonScalaUtil.getKettleHome(sqlContext)
     
           // TODO It will be removed after kettle is removed.
    -      val useKettle = options.get("use_kettle") match {
    +      val useKettle = options.get("useKettle") match {
    --- End diff --
   
    DataFrameWriter accepts the parameter `useKettle`, e.g. writer.option("useKettle", "false").option("tempCSV","false"). Maybe we should keep it consistent.
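
One way to keep both spellings working while the two code paths are reconciled is a lookup that accepts either key. The Java sketch below is only an illustration under that assumption: `KettleOption` and `resolveUseKettle` are hypothetical names and not part of CarbonData, the precedence of the camelCase key is an arbitrary choice, and only the key strings `use_kettle` and `useKettle` come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class KettleOption {
  /**
   * Hypothetical helper: reads the kettle flag from load options,
   * accepting both "useKettle" and "use_kettle".
   * The camelCase key wins if both are present (an assumption).
   */
  static boolean resolveUseKettle(Map<String, String> options, boolean defaultValue) {
    String value = options.get("useKettle");
    if (value == null) {
      value = options.get("use_kettle");
    }
    // Fall back to the caller-supplied default when neither key is set.
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("use_kettle", "false");
    System.out.println(resolveUseKettle(options, true)); // prints "false"
  }
}
```

A single resolver like this would let callers keep passing `use_kettle` while the DataFrameWriter path passes `useKettle`, at the cost of carrying two names for one setting.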



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user allwefantasy commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90228110
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonMetastoreCatalog.scala ---
    @@ -307,10 +307,17 @@ class CarbonMetastoreCatalog(hiveContext: HiveContext, val storePath: String,
         if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
           FileFactory.mkdirs(schemaMetadataPath, fileType)
         }
    +
    +    /**
    +    * schemaFilePath starts with file:// will not create meta files successfully
    +    * while thriftWriter will have no complains.
    +    * This will cause some weired error eg. No table found.
    +    */
    --- End diff --
   
    Ok .



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user allwefantasy commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90229415
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
         }
         return dateformatsHashMap;
       }
    +
    --- End diff --
   
    The carbon-processing module does not depend on Spark or any other computing engine. However, some of its classes load data with multiple threads while running as a task of the computing engine, and they need to get the TaskContext, which is held in a ThreadLocal.
   
    Yes, my first PR was merged from your PR333, but it has not been merged to master yet.
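
The thread-local problem described in the comment above can be illustrated with a minimal Java sketch. This is not the PR's code: it assumes the engine keeps its per-task context in a plain `ThreadLocal`, and the `CONTEXT` field, the class name, and the method names are hypothetical stand-ins. A value set on the task thread is not visible from a worker thread unless it is captured and handed over explicitly, which is how a lookup in the style of `TaskContext.get` can come back null there.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskContextPropagation {
  // Hypothetical stand-in for an engine's per-task context holder.
  private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

  /** Sets the context on the caller thread, then reads it from a worker thread. */
  static String readInWorkerWithoutPropagation(String taskId) {
    CONTEXT.set(taskId);
    Callable<String> lookup = () -> CONTEXT.get(); // runs on the worker thread
    return runOnWorker(lookup);
  }

  /** Captures the context on the caller thread and installs it in the worker. */
  static String readInWorkerWithPropagation(String taskId) {
    CONTEXT.set(taskId);
    final String captured = CONTEXT.get(); // read while still on the task thread
    Callable<String> lookup = () -> {
      CONTEXT.set(captured);
      try {
        return CONTEXT.get();
      } finally {
        CONTEXT.remove(); // do not leak the value into a reused pool thread
      }
    };
    return runOnWorker(lookup);
  }

  private static String runOnWorker(Callable<String> lookup) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      return pool.submit(lookup).get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
      CONTEXT.remove(); // clear the caller thread's value as well
    }
  }

  public static void main(String[] args) {
    System.out.println(readInWorkerWithoutPropagation("task-42")); // prints "null"
    System.out.println(readInWorkerWithPropagation("task-42"));    // prints "task-42"
  }
}
```

Dereferencing the result of the first call without a null check is the kind of failure the PR description refers to; the second call shows one possible way to hand the context to the worker.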



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/368#discussion_r90605596
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
         }
         return dateformatsHashMap;
       }
    +
    --- End diff --
   
    PR333 is merged, please rebase



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user allwefantasy commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    @jackylk I fixed the conflict. Can you review it?



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    @allwefantasy it seems there is a problem with merging/rebasing the PR; other commits have come in here. Please fix it.



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user allwefantasy commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    It's weird.  In my local branch, git log shows:
   
    ```
    commit acdf78a8cba4f7c18cbaaf0fcc1a9e9dc3189068
    Merge: 8a21cb7 5ca7218
    Author: WilliamZhu <[hidden email]>
    Date:   Mon Dec 5 11:30:35 2016 +0800
   
        Merge branch 'spark-streaming-dataframe-support2' of github.com:allwefantasy/incubator-carbondata into spark-streaming-dataframe-support2
   
    commit 8a21cb715eac50c04b859530ab459ae9b6f226a3
    Author: WilliamZhu <[hidden email]>
    Date:   Wed Nov 30 21:24:33 2016 +0800
   
        remove comments on createTableFromThrift and rais jira later
   
    commit 06bc4239a2762a6f27da99982b47e880d6a1be4c
    Author: WilliamZhu <[hidden email]>
    Date:   Wed Nov 30 00:12:24 2016 +0800
   
        reset maven-source-plugin
   
    commit 0f042797f54143bd473296bc33650e84d071dd15
    Author: WilliamZhu <[hidden email]>
    Date:   Tue Nov 29 23:46:42 2016 +0800
   
        spark streaming dataframe support
   
    commit 70ae82045e461c740cf2ae80c2058160bc9855a9
    Merge: e7958b6 fc3f6b3
    Author: ravipesala <[hidden email]>
    ```
   
    I will try to figure it out.



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user allwefantasy commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    The commit log shows "Changes: allwefantasy and others added some commits 5 days ago". I guess there is no problem, @ravipesala.



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user allwefantasy commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    Yes, it does not look right. I will try to figure out how to resolve this.



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    Can one of the admins verify this patch?



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

Github user allwefantasy closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/368



[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...

GitHub user allwefantasy reopened a pull request:

    https://github.com/apache/incubator-carbondata/pull/368

    [CARBONDATA-465] Spark streaming dataframe support

    * mvn clean verify has already passed locally.
    * No new unit test cases are added.
    * Tested in the streamingpro project.
    * Remove kettle cleanly.
    * Fix NPE when loading data with the new flow: the Spark executor sometimes calls TaskContext.get, but the code runs in another thread.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/368.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #368
   
----
commit 44dd84ff48f023ec60a1e49ae5a6e70df387738b
Author: WilliamZhu <[hidden email]>
Date:   2016-12-06T04:39:22Z

    merge from master

commit 9cf2442e2a7579547fd6a9147721c66499bf13d5
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T15:46:42Z

    spark streaming dataframe support

commit 92f41ca41fc449528df9716dceec6f3c54e9535f
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T16:12:24Z

    reset maven-source-plugin

commit b64c4a6fe87e638e935be1360e3b55956b01bee8
Author: WilliamZhu <[hidden email]>
Date:   2016-11-29T16:12:24Z

    reset maven-source-plugin

commit 85adeb6b79460e873f4340dbf1c80e262e235bbd
Author: WilliamZhu <[hidden email]>
Date:   2016-11-30T13:24:33Z

    remove comments on createTableFromThrift and rais jira later

----



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user allwefantasy commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    Fixed the commit log issue and updated to the latest.



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    Can one of the admins verify this patch?



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user chenliang613 commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    add to whitelist



[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/368
 
    Build Success, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/35/


