[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li updated CARBONDATA-307:
--------------------------------
Description:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
Because of this, there is unnecessary duplicate code; the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor
was:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
Because of this, there is unnecessary duplicate code; the two paths need to be unified.
> Support executor side scan using CarbonInputFormat
> --------------------------------------------------
>
> Key: CARBONDATA-307
> URL: https://issues.apache.org/jira/browse/CARBONDATA-307
> Project: CarbonData
> Issue Type: Improvement
> Components: spark-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module:
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
> Because of this, there is unnecessary duplicate code; the two paths need to be unified.
> The target approach should be:
> sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor
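
For illustration only, the unified shape described above could be sketched roughly as follows. This is a toy model under stated assumptions, not CarbonData code: the class UnifiedReadPathSketch and the methods getSplits, scan, and compute are hypothetical stand-ins for the roles of split planning, QueryExecutor, and a single shared scan RDD, and a "table" is just an in-memory list of integers. The point it shows is one place that plans splits and one place that scans each split, so neither entry point needs its own copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are the real CarbonData classes.
public class UnifiedReadPathSketch {

    /** A contiguous row range scanned by one task (stand-in for an input split). */
    static final class Split {
        final int start; // inclusive
        final int end;   // exclusive

        Split(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in for split planning: cuts the table into fixed-size row ranges. */
    static List<Split> getSplits(int totalRows, int rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < totalRows; start += rowsPerSplit) {
            splits.add(new Split(start, Math.min(start + rowsPerSplit, totalRows)));
        }
        return splits;
    }

    /** Stand-in for the executor-side scan of a single split. */
    static List<Integer> scan(List<Integer> table, Split split) {
        return new ArrayList<>(table.subList(split.start, split.end));
    }

    /** Stand-in for the single shared read path: plan splits once, scan each one. */
    static List<Integer> compute(List<Integer> table, int rowsPerSplit) {
        List<Integer> rows = new ArrayList<>();
        for (Split split : getSplits(table.size(), rowsPerSplit)) {
            rows.addAll(scan(table, split));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(10, 20, 30, 40, 50);
        // Both entry points would call the same compute(); there is no second path.
        System.out.println(compute(table, 2));
    }
}
```

In this toy model, supporting a second entry point means calling compute() from it rather than adding another split-and-scan implementation, which is the duplication the issue proposes to remove.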