[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-307:
--------------------------------
    Description:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.

Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor


  was:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.

Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor



> Support executor side scan using CarbonInputFormat
> --------------------------------------------------
>
>                 Key: CARBONDATA-307
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-307
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module:
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
> Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
> The target approach should be:
> sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
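
The target read path described above can be illustrated with a minimal, self-contained sketch. It is not CarbonData code: ToySplit, ToyQueryExecutor, ToyInputFormat and ToyScanRDD are hypothetical stand-ins for the input split, QueryExecutor, CarbonInputFormat(CarbonRecordReader) and CarbonScanRDD, and the "table" is assumed to be a plain int array. The point it tries to show is only the layering: the RDD stand-in asks the input format for splits and for a record reader, and never calls the executor itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model of the unified read path. None of these classes are CarbonData APIs.
class UnifiedScanSketch {

    // Hypothetical split: a half-open row range [start, end).
    static final class ToySplit {
        final int start;
        final int end;
        ToySplit(int start, int end) { this.start = start; this.end = end; }
    }

    // Stand-in for the query executor: here it just reads rows from an array.
    static final class ToyQueryExecutor {
        private final int[] rows;
        ToyQueryExecutor(int[] rows) { this.rows = rows; }
        int read(int index) { return rows[index]; }
    }

    // Stand-in for the input format: it both plans splits and creates record readers.
    static final class ToyInputFormat {
        private final int[] rows;
        private final int splitSize;

        ToyInputFormat(int[] rows, int splitSize) {
            if (splitSize <= 0) throw new IllegalArgumentException("splitSize must be positive");
            this.rows = rows;
            this.splitSize = splitSize;
        }

        // Split planning: cut the row range into chunks of at most splitSize rows.
        List<ToySplit> getSplits() {
            List<ToySplit> splits = new ArrayList<>();
            for (int start = 0; start < rows.length; start += splitSize) {
                splits.add(new ToySplit(start, Math.min(start + splitSize, rows.length)));
            }
            return splits;
        }

        // Record reader for one split; the scan is delegated to the executor stand-in.
        Iterator<Integer> createRecordReader(ToySplit split) {
            ToyQueryExecutor executor = new ToyQueryExecutor(rows);
            return new Iterator<Integer>() {
                private int next = split.start;
                @Override public boolean hasNext() { return next < split.end; }
                @Override public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return executor.read(next++);
                }
            };
        }
    }

    // Stand-in for the scan RDD: it reaches the executor only through the input format.
    static final class ToyScanRDD {
        private final ToyInputFormat format;
        ToyScanRDD(ToyInputFormat format) { this.format = format; }
        List<ToySplit> getPartitions() { return format.getSplits(); }
        Iterator<Integer> compute(ToySplit split) { return format.createRecordReader(split); }
    }

    // Runs every partition in sequence and gathers the rows, like a local collect().
    static List<Integer> collect(ToyScanRDD rdd) {
        List<Integer> out = new ArrayList<>();
        for (ToySplit split : rdd.getPartitions()) {
            Iterator<Integer> reader = rdd.compute(split);
            while (reader.hasNext()) out.add(reader.next());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyScanRDD rdd = new ToyScanRDD(new ToyInputFormat(new int[] {10, 20, 30, 40, 50}, 2));
        System.out.println(collect(rdd));
    }
}
```

In this sketch both split planning and scanning go through one class, so a second RDD type with its own scan logic is not needed. How the real CarbonScanRDD wires this up is not stated in the issue and is not implied here.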



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)