[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-307:
--------------------------------
    Description:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
   In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
   In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
Because of this, there is unnecessary duplicate code; the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor

  was:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
   In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
   In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
Because of this, there is unnecessary duplicate code; the two paths need to be unified.
The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor

> Support executor side scan using CarbonInputFormat
> --------------------------------------------------
>
>                 Key: CARBONDATA-307
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-307
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module:
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
>    In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
>    In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
> Because of this, there is unnecessary duplicate code; the two paths need to be unified.
> The target approach should be:
> sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
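
For readers unfamiliar with the pattern, the target read path in the issue (one RDD that asks the input format for splits and then scans each split through the input format's record reader) might be modelled roughly as below. This is only a language-neutral sketch in Python, not CarbonData code: every class and method name here (Split, QueryExecutor, RecordReader, InputFormat, ScanRDD, get_splits, create_record_reader, and so on) is a hypothetical stand-in chosen for illustration, and rows are plain strings held in memory.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for one input split (a half-open row range)."""
    start: int
    end: int


class QueryExecutor:
    """Hypothetical stand-in for the scan engine: yields the rows of one split."""

    def __init__(self, rows: List[str]) -> None:
        self._rows = rows

    def execute(self, split: Split) -> Iterator[str]:
        return iter(self._rows[split.start:split.end])


class RecordReader:
    """Reads one split by delegating the scan to the QueryExecutor."""

    def __init__(self, executor: QueryExecutor, split: Split) -> None:
        self._rows = executor.execute(split)

    def __iter__(self) -> Iterator[str]:
        return self._rows


class InputFormat:
    """Single entry point for both split planning and scanning."""

    def __init__(self, rows: List[str], split_size: int) -> None:
        if split_size <= 0:
            raise ValueError("split_size must be positive")
        self._rows = rows
        self._split_size = split_size

    def get_splits(self) -> List[Split]:
        # Cut the row range into consecutive chunks of at most split_size rows.
        n = len(self._rows)
        return [Split(i, min(i + self._split_size, n))
                for i in range(0, n, self._split_size)]

    def create_record_reader(self, split: Split) -> RecordReader:
        return RecordReader(QueryExecutor(self._rows), split)


class ScanRDD:
    """Models the unified RDD: both planning and scanning go through
    the same InputFormat, so there is only one read path."""

    def __init__(self, input_format: InputFormat) -> None:
        self._input_format = input_format

    def partitions(self) -> List[Split]:
        # Would run on the driver side: plan the splits.
        return self._input_format.get_splits()

    def compute(self, split: Split) -> Iterator[str]:
        # Would run on the executor side: scan one split via the record reader.
        return iter(self._input_format.create_record_reader(split))

    def collect(self) -> List[str]:
        return [row for split in self.partitions() for row in self.compute(split)]
```

In this model, both contexts would construct the same ScanRDD, so split planning and scanning live in one place (the input format) instead of being duplicated across two RDD implementations.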