[GitHub] carbondata pull request #2824: WIP: Optimize multiple temp dir for loading a...

classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9203/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r227289015
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala ---
    @@ -172,33 +172,8 @@ with Serializable {
               dataSchema: StructType,
               context: TaskAttemptContext): OutputWriter = {
             val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
    -        val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir
    -        var storeLocation: Array[String] = Array[String]()
    -        val isCarbonUseLocalDir = CarbonProperties.getInstance()
    -          .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true")
    -
    -
             val taskNumber = generateTaskNumber(path, context, model.getSegmentId)
    -        val tmpLocationSuffix =
    -          File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber
    -        if (isCarbonUseLocalDir) {
    -          val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
    -          if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) {
    -            // use single dir
    -            storeLocation = storeLocation :+
    -              (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix)
    -            if (storeLocation == null || storeLocation.isEmpty) {
    -              storeLocation = storeLocation :+
    -                (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -            }
    -          } else {
    -            // use all the yarn dirs
    -            storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
    -          }
    -        } else {
    -          storeLocation =
    -            storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -        }
    +        val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
    --- End diff --
   
    the old tmpLocationSuffix  is different from the method CommonUtil.getTempStoreLocations.
    please check whether is ok or not


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1171/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/964/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9228/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r227352023
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala ---
    @@ -172,33 +172,8 @@ with Serializable {
               dataSchema: StructType,
               context: TaskAttemptContext): OutputWriter = {
             val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
    -        val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir
    -        var storeLocation: Array[String] = Array[String]()
    -        val isCarbonUseLocalDir = CarbonProperties.getInstance()
    -          .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true")
    -
    -
             val taskNumber = generateTaskNumber(path, context, model.getSegmentId)
    -        val tmpLocationSuffix =
    -          File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber
    -        if (isCarbonUseLocalDir) {
    -          val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
    -          if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) {
    -            // use single dir
    -            storeLocation = storeLocation :+
    -              (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix)
    -            if (storeLocation == null || storeLocation.isEmpty) {
    -              storeLocation = storeLocation :+
    -                (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -            }
    -          } else {
    -            // use all the yarn dirs
    -            storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
    -          }
    -        } else {
    -          storeLocation =
    -            storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -        }
    +        val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
    --- End diff --
   
    yes, I also noticed this.
    The suffix has no meanings, just used to separate each thread's output.
    I think it's a mistake by hand -- that's why we need to extract these code to avoid problems like this.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r227625617
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala ---
    @@ -172,33 +172,8 @@ with Serializable {
               dataSchema: StructType,
               context: TaskAttemptContext): OutputWriter = {
             val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
    -        val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir
    -        var storeLocation: Array[String] = Array[String]()
    -        val isCarbonUseLocalDir = CarbonProperties.getInstance()
    -          .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true")
    -
    -
             val taskNumber = generateTaskNumber(path, context, model.getSegmentId)
    -        val tmpLocationSuffix =
    -          File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber
    -        if (isCarbonUseLocalDir) {
    -          val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
    -          if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) {
    -            // use single dir
    -            storeLocation = storeLocation :+
    -              (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix)
    -            if (storeLocation == null || storeLocation.isEmpty) {
    -              storeLocation = storeLocation :+
    -                (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -            }
    -          } else {
    -            // use all the yarn dirs
    -            storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
    -          }
    -        } else {
    -          storeLocation =
    -            storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
    -        }
    +        val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
    --- End diff --
   
    @jackylk @QiangCai
    I've debugged and reviewed the code again and found it works as expected: all the emp locations were cleared.
   
    The `TempStoreLocations` generated at the begining of data loading is just the same as that at the closure of `CarbonTableOutputFormat` in which these locations will be cleared.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2824


---
12