[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-307:
--------------------------------
Description:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.

Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor

was:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.

Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor

> Support executor side scan using CarbonInputFormat
> --------------------------------------------------
>
> Key: CARBONDATA-307
> URL: https://issues.apache.org/jira/browse/CARBONDATA-307
> Project: CarbonData
> Issue Type: Improvement
> Components: spark-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module:
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
> Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
> The target approach should be:
> sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
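
The target read path in the issue can be pictured with a small, dependency-free sketch. Every class below is a hypothetical stand-in named after the component it imitates (ScanRdd for CarbonScanRDD, InputFormat for CarbonInputFormat, RecordReader for CarbonRecordReader, QueryExecutor for QueryExecutor); none of this is the actual CarbonData, Hadoop, or Spark API, and the in-memory "table" and the row-range Split are assumptions made only for illustration. The point it tries to show is that one input-format object serves both split planning (driver side) and reader creation (executor side), so the RDD needs no scan logic of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnifiedScanSketch {

    /** Hypothetical split: a half-open row range [start, end). */
    static final class Split {
        final int start;
        final int end;
        Split(int start, int end) { this.start = start; this.end = end; }
    }

    /** Stand-in for QueryExecutor: performs the actual scan of one split. */
    static final class QueryExecutor {
        private final List<String> table;
        QueryExecutor(List<String> table) { this.table = table; }
        List<String> execute(Split split) {
            return table.subList(split.start, split.end);
        }
    }

    /** Stand-in for CarbonRecordReader: iterates the rows the executor returns. */
    static final class RecordReader {
        private final List<String> rows;
        private int pos = -1;
        RecordReader(QueryExecutor executor, Split split) {
            this.rows = executor.execute(split);
        }
        boolean nextKeyValue() { return ++pos < rows.size(); }
        String getCurrentValue() { return rows.get(pos); }
    }

    /** Stand-in for CarbonInputFormat: owns split planning and reader creation. */
    static final class InputFormat {
        private final List<String> table;
        private final int rowsPerSplit;
        InputFormat(List<String> table, int rowsPerSplit) {
            this.table = table;
            this.rowsPerSplit = rowsPerSplit;
        }
        List<Split> getSplits() {
            List<Split> splits = new ArrayList<>();
            for (int start = 0; start < table.size(); start += rowsPerSplit) {
                splits.add(new Split(start, Math.min(start + rowsPerSplit, table.size())));
            }
            return splits;
        }
        RecordReader createRecordReader(Split split) {
            return new RecordReader(new QueryExecutor(table), split);
        }
    }

    /** Stand-in for CarbonScanRDD: delegates both steps to the input format. */
    static final class ScanRdd {
        private final InputFormat format;
        ScanRdd(InputFormat format) { this.format = format; }

        /** Driver side: partitions are simply the input format's splits. */
        List<Split> getPartitions() { return format.getSplits(); }

        /** Executor side: scan one split through the record reader. */
        List<String> compute(Split split) {
            RecordReader reader = format.createRecordReader(split);
            List<String> out = new ArrayList<>();
            while (reader.nextKeyValue()) {
                out.add(reader.getCurrentValue());
            }
            return out;
        }

        /** Runs every partition in order and concatenates the rows. */
        List<String> collect() {
            List<String> all = new ArrayList<>();
            for (Split split : getPartitions()) {
                all.addAll(compute(split));
            }
            return all;
        }
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        ScanRdd rdd = new ScanRdd(new InputFormat(table, 2));
        System.out.println("splits: " + rdd.getPartitions().size());
        System.out.println("rows:   " + rdd.collect());
    }
}
```

In this arrangement the first read path's direct CarbonScanRDD => QueryExecutor call disappears: the executor is reached only through the record reader, which is what lets a single RDD class back both contexts.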

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)