[jira] [Updated] (CARBONDATA-322) Integration with spark 2.x

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-322:
---------------------------------
    Description:
Spark 2.0 has been released with many nice features, such as a more efficient parser, vectorized execution, and adaptive execution.
It would be good to integrate with Spark 2.x.

The current integration, up to Spark v1.6, is tightly coupled with Spark. We would like to clean up the interface with the following design points in mind:

1. Decouple from Spark, basing the integration on Spark's v2 datasource API.
2. Enable the vectorized carbon reader.
3. Support saving a DataFrame to a CarbonData file through CarbonData's output format.
...


  was:
As Spark 2.0 has been released, there are many nice features such as a more efficient parser, vectorized execution, and adaptive execution.
It is good to integrate with Spark 2.x.

On the other side, the Spark integration in CarbonData is now heavily coupled with Spark code and the code needs cleanup. We should redesign the Spark integration so that it satisfies the following requirements:

1. Decoupled from Spark, integrating according to the Spark datasource API (V2)
2. The integration should support the vectorized carbon reader
3. Support writing to CarbonData from a DataFrame
...


     Issue Type: Improvement  (was: Bug)
        Summary: Integration with spark 2.x  (was: integrate spark 2.x)

> Integration with spark 2.x
> --------------------------
>
>                 Key: CARBONDATA-322
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-322
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 0.2.0-incubating
>            Reporter: Fei Wang
>            Assignee: Fei Wang
>             Fix For: 1.0.0-incubating
>
>
> Spark 2.0 has been released with many nice features, such as a more efficient parser, vectorized execution, and adaptive execution.
> It would be good to integrate with Spark 2.x.
> The current integration, up to Spark v1.6, is tightly coupled with Spark. We would like to clean up the interface with the following design points in mind:
> 1. Decouple from Spark, basing the integration on Spark's v2 datasource API.
> 2. Enable the vectorized carbon reader.
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output format.
> ...
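
The three design points can be illustrated with a small toy sketch. It is plain Python and does not use Spark's or CarbonData's actual API: every name in it (DataSource, InMemoryColumnarSource, REGISTRY, save, column_sum, and the format name "toy-columnar") is hypothetical. The only aim is to show the shape of the idea: the engine talks to a format through a narrow interface looked up by name, writes rows through that interface, and reads data back as batches of column values rather than one row at a time.

```python
from typing import Dict, Iterator, List, Protocol

# Hypothetical names throughout: this is NOT Spark's or CarbonData's API.
Row = Dict[str, object]
ColumnBatch = Dict[str, List[object]]  # column name -> values for one batch


class DataSource(Protocol):
    """The only surface the engine sees; format internals stay behind it."""

    def write(self, rows: List[Row]) -> None: ...

    def read_batches(self, batch_size: int) -> Iterator[ColumnBatch]: ...


class InMemoryColumnarSource:
    """Toy stand-in for a columnar file format (assumes uniform row keys)."""

    def __init__(self) -> None:
        self._columns: ColumnBatch = {}

    def write(self, rows: List[Row]) -> None:
        # Pivot incoming rows into per-column lists (columnar layout).
        for row in rows:
            for name, value in row.items():
                self._columns.setdefault(name, []).append(value)

    def read_batches(self, batch_size: int) -> Iterator[ColumnBatch]:
        # Vectorized read: return slices of whole columns, not single rows.
        total = len(next(iter(self._columns.values()), []))
        for start in range(0, total, batch_size):
            yield {
                name: values[start:start + batch_size]
                for name, values in self._columns.items()
            }


# Engine-side registry: formats are looked up by name, so engine code never
# depends on format-specific classes directly.
REGISTRY: Dict[str, DataSource] = {"toy-columnar": InMemoryColumnarSource()}


def save(rows: List[Row], fmt: str) -> None:
    """Write rows through whichever source is registered under fmt."""
    REGISTRY[fmt].write(rows)


def column_sum(fmt: str, column: str, batch_size: int = 2) -> int:
    """Aggregate one column, consuming a column batch at a time."""
    return sum(
        sum(batch[column]) for batch in REGISTRY[fmt].read_batches(batch_size)
    )


save([{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 3, "v": 30}], "toy-columnar")
print(column_sum("toy-columnar", "v"))
```

In this sketch, swapping the storage format only means registering a different object under another name; save and column_sum stay unchanged, which is the kind of decoupling the first design point asks for.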



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)