[jira] [Commented] (CARBONDATA-279) [DataLoading]Save a DataFrame to CarbonData file without writing CSV file

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529818#comment-15529818 ]

ASF GitHub Bot commented on CARBONDATA-279:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/203#discussion_r80915025
 
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataLoadRDD.scala ---
    @@ -145,92 +232,28 @@ class CarbonDataLoadRDD[K, V](
       override def compute(theSplit: Partition, context: TaskContext): Iterator[(K, V)] = {
         val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
         val iter = new Iterator[(K, V)] {
    -      var dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
           var partitionID = "0"
    +      val loadMetadataDetails = new LoadMetadataDetails()
           var model: CarbonLoadModel = _
           var uniqueLoadStatusId = carbonLoadModel.getTableName + CarbonCommonConstants.UNDERSCORE +
                                    theSplit.index
           try {
    -        val carbonPropertiesFilePath = System.getProperty("carbon.properties.filepath", null)
    -        if (null == carbonPropertiesFilePath) {
    -          System.setProperty("carbon.properties.filepath",
    -            System.getProperty("user.dir") + '/' + "conf" + '/' + "carbon.properties")
    -        }
    +        loadMetadataDetails.setPartitionCount(partitionID)
    +        loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_FAILURE)
    +
             carbonLoadModel.setSegmentId(String.valueOf(loadCount))
             setModelAndBlocksInfo()
    -        CarbonTimeStatisticsFactory.getLoadStatisticsInstance.initPartitonInfo(model.getPartitionId)
    -        CarbonProperties.getInstance().addProperty("carbon.is.columnar.storage", "true")
    -        CarbonProperties.getInstance().addProperty("carbon.dimension.split.value.in.columnar", "1")
    -        CarbonProperties.getInstance().addProperty("carbon.is.fullyfilled.bits", "true")
    -        CarbonProperties.getInstance().addProperty("is.int.based.indexer", "true")
    -        CarbonProperties.getInstance().addProperty("aggregate.columnar.keyblock", "true")
    -        CarbonProperties.getInstance().addProperty("high.cardinality.value", "100000")
    -        CarbonProperties.getInstance().addProperty("is.compressed.keyblock", "false")
    -        CarbonProperties.getInstance().addProperty("carbon.leaf.node.size", "120000")
    -
    -        // this property is used to determine whether temp location for carbon is inside
    -        // container temp dir or is yarn application directory.
    -        val carbonUseLocalDir = CarbonProperties.getInstance()
    -          .getProperty("carbon.use.local.dir", "false")
    -
    -        if(carbonUseLocalDir.equalsIgnoreCase("true")) {
    -          val storeLocations = CarbonLoaderUtil.getConfiguredLocalDirs(SparkEnv.get.conf)
    -          if (null != storeLocations && storeLocations.nonEmpty) {
    -            storeLocation = storeLocations(Random.nextInt(storeLocations.length))
    -          }
    -          if (storeLocation == null) {
    -            storeLocation = System.getProperty("java.io.tmpdir")
    -          }
    -        }
    -        else {
    -          storeLocation = System.getProperty("java.io.tmpdir")
    -        }
    -        storeLocation = storeLocation + '/' + System.nanoTime() + '/' + theSplit.index
    -        dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS
    -
    +        storeLocation = CarbonDataLoadRDD.initialize(model, theSplit.index)
    +        loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
             if (model.isRetentionRequest) {
               recreateAggregationTableForRetention
             }
             else if (model.isAggLoadRequest) {
    --- End diff --
   
    Is this required? If it is not required, remove it.


> [DataLoading]Save a DataFrame to CarbonData file without writing CSV file
> -------------------------------------------------------------------------
>
>                 Key: CARBONDATA-279
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-279
>             Project: CarbonData
>          Issue Type: Improvement
>    Affects Versions: 0.1.0-incubating
>            Reporter: QiangCai
>            Assignee: QiangCai
>            Priority: Minor
>             Fix For: 0.2.0-incubating
>
>
> Directly save a DataFrame to CarbonData file without writing CSV file


