[ https://issues.apache.org/jira/browse/CARBONDATA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529848#comment-15529848 ]
ASF GitHub Bot commented on CARBONDATA-279:
-------------------------------------------
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/203#discussion_r80932289
--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
@@ -742,7 +742,8 @@ object GlobalDictionaryUtil extends Logging {
*/
def generateGlobalDictionary(sqlContext: SQLContext,
carbonLoadModel: CarbonLoadModel,
- hdfsLocation: String): Unit = {
+ hdfsLocation: String,
+ dataFrame: Option[DataFrame] = None): Unit = {
--- End diff --
I think we can call dataFrame.rdd.cache inside this function. The data is scanned twice, so caching lets the second scan reuse the DataFrame's rows from the first scan instead of recomputing them.
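To illustrate the suggestion, here is a minimal sketch of caching the rows of an optional DataFrame across two passes. The helper name `withCachedRows` and the `firstScan`/`secondScan` parameters are hypothetical and are not part of GlobalDictionaryUtil; only `DataFrame.rdd`, `RDD.cache` and `RDD.unpersist` are Spark API calls. It assumes both scans are eager actions (for example `count` or `collect`), not lazy transformations.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper (not from the pull request): runs two scans over the
// rows of an optional DataFrame and keeps the rows cached between them.
def withCachedRows[A, B](dataFrame: Option[DataFrame])
                        (firstScan: RDD[Row] => A)
                        (secondScan: RDD[Row] => B): Option[(A, B)] = {
  dataFrame.map { df =>
    // Hold a single reference so both scans run against the same cached RDD.
    val rows: RDD[Row] = df.rdd.cache()
    try {
      val first = firstScan(rows)   // first action computes and caches the partitions
      val second = secondScan(rows) // second action can read the cached partitions
      (first, second)
    } finally {
      // Release the cached blocks once both scans are done. The scans must
      // have been fully evaluated by this point.
      rows.unpersist()
    }
  }
}
```

Whether the cache pays off depends on the data fitting in executor memory; otherwise the second scan may fall back to recomputing some partitions.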
> [DataLoading]Save a DataFrame to CarbonData file without writing CSV file
> -------------------------------------------------------------------------
>
> Key: CARBONDATA-279
> URL: https://issues.apache.org/jira/browse/CARBONDATA-279
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 0.1.0-incubating
> Reporter: QiangCai
> Assignee: QiangCai
> Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Directly save a DataFrame to CarbonData file without writing CSV file
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)