[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129195193
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
    @@ -892,6 +892,23 @@ public int getNoDeleteDeltaFilesThresholdForIUDCompaction() {
       }
     
       /**
    +   * Returns whether to use multi temp dirs
    +   * @return boolean
    +   */
    +  public boolean isUseMultiTempDir() {
    +    String usingMultiDirStr = getProperty(CarbonCommonConstants.CARBON_USING_MULTI_TEMP_DIR,
    +        CarbonCommonConstants.CARBON_USING_MULTI_TEMP_DIR_DEFAULT);
    +    boolean validateBoolean = CarbonUtil.validateBoolean(usingMultiDirStr);
    +    if (!validateBoolean) {
    +      LOGGER.info("The using multi temp dir value \"" + usingMultiDirStr
    --- End diff --
   
    Suggested message: "carbon.use.multiple.temp.dir configuration value is invalid. Configured value: " + usingMultiDirStr + ". Data load will not use multiple temp directories."
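
The validation and fallback being reviewed here could be sketched roughly as follows. This is a standalone illustration, not CarbonData code: the class `MultiTempDirConfig`, its use of `java.util.Properties`, and the default value `"false"` are all assumptions, since the thread does not show `CarbonUtil.validateBoolean` or the real default.

```java
import java.util.Properties;

/** Hypothetical stand-in for the property lookup discussed above; not CarbonData code. */
final class MultiTempDirConfig {
  static final String KEY = "carbon.use.multiple.temp.dir";
  // Assumed default for this sketch; the real default is not shown in the diff.
  static final String DEFAULT = "false";

  /** Returns the configured flag, or false (with a log line) when the value is not a boolean. */
  static boolean isUseMultiTempDir(Properties props) {
    String value = props.getProperty(KEY, DEFAULT).trim();
    boolean valid = "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
    if (!valid) {
      // Message worded along the lines the reviewer suggested.
      System.err.println(invalidValueMessage(value));
      return false;
    }
    return Boolean.parseBoolean(value);
  }

  /** Builds the log message, including the offending value. */
  static String invalidValueMessage(String value) {
    return KEY + " configuration value is invalid. Configured value: " + value
        + ". Data load will not use multiple temp directories.";
  }
}
```

Including the configured value in the message lets an operator see what was actually set without opening the properties file.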


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129197372
 
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -231,5 +231,6 @@ scenarios. After the completion of POC, some of the configurations impacting the
     | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
     | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
     | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
    +| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property especially when you encounter disk hotspot problem during data loading. |
    --- End diff --
   
    This should say "will use all YARN local directories".


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129219109
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1310,6 +1310,18 @@
       public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL =
           "carbon.lease.recovery.retry.interval";
     
    +  /**
    +   * whether to use multi directories when loading data,
    +   * the main purpose is to avoid single-disk-hot-spot
    +   */
    +  @CarbonProperty
    +  public static final String CARBON_USING_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    --- End diff --
   
    change to CARBON_USE_MULTI_TEMP_DIR


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129198419
 
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -231,5 +231,6 @@ scenarios. After the completion of POC, some of the configurations impacting the
     | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
     | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
     | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
    +| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property especially when you encounter disk hotspot problem during data loading. |
    --- End diff --
   
    The word "especially" can be removed.


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129219832
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java ---
    @@ -57,12 +57,15 @@
           LogServiceFactory.getLogService(RowResultMergerProcessor.class.getName());
     
       public RowResultMergerProcessor(String databaseName,
    -      String tableName, SegmentProperties segProp, String tempStoreLocation,
    +      String tableName, SegmentProperties segProp, String[] tempStoreLocation,
           CarbonLoadModel loadModel, CompactionType compactionType) {
         this.segprop = segProp;
    -    if (!new File(tempStoreLocation).mkdirs()) {
    -      LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    +    for (String temLoc : tempStoreLocation) {
    +      if (!new File(temLoc).mkdirs()) {
    +        LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    --- End diff --
   
    "Error while creating new directory:" + temLoc

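A minimal sketch of the loop with the per-directory message the reviewer asks for. The class and method names are hypothetical, and the extra `isDirectory()` check is an addition of this sketch rather than something shown in the PR: `File.mkdirs()` also returns false when the directory already exists, so checking only its return value would log an error in that case too.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the review comment; not CarbonData code. */
final class TempDirCreator {
  /** Tries to create every location and returns the ones that could not be created. */
  static List<String> createAll(String[] tempStoreLocations) {
    List<String> failed = new ArrayList<>();
    for (String tempLoc : tempStoreLocations) {
      File dir = new File(tempLoc);
      // mkdirs() returns false if the directory already exists,
      // so only report an error when no directory is present afterwards.
      if (!dir.mkdirs() && !dir.isDirectory()) {
        System.err.println("Error while creating new directory: " + tempLoc);
        failed.add(tempLoc);
      }
    }
    return failed;
  }
}
```

Naming the failing path in the message matters more once there are several locations, because a single generic line no longer tells you which disk had the problem.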

---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129279394
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/unsafe/UnsafeSortDataRows.java ---
    @@ -306,7 +309,9 @@ private void writeData(UnsafeCarbonRowPage rowPage, File file)
        * This method will be used to delete sort temp location is it is exites
        */
       public void deleteSortLocationIfExists() {
    -    CarbonDataProcessorUtil.deleteSortLocationIfExists(parameters.getTempFileLocation());
    +    for (String loc : parameters.getTempFileLocation()) {
    --- End diff --
   
    :+1:  fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129279731
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java ---
    @@ -57,12 +57,15 @@
           LogServiceFactory.getLogService(RowResultMergerProcessor.class.getName());
     
       public RowResultMergerProcessor(String databaseName,
    -      String tableName, SegmentProperties segProp, String tempStoreLocation,
    +      String tableName, SegmentProperties segProp, String[] tempStoreLocation,
           CarbonLoadModel loadModel, CompactionType compactionType) {
         this.segprop = segProp;
    -    if (!new File(tempStoreLocation).mkdirs()) {
    -      LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    +    for (String temLoc : tempStoreLocation) {
    +      if (!new File(temLoc).mkdirs()) {
    +        LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    --- End diff --
   
    :+1:  fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129280458
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
    @@ -892,6 +892,23 @@ public int getNoDeleteDeltaFilesThresholdForIUDCompaction() {
       }
     
       /**
    +   * Returns whether to use multi temp dirs
    +   * @return boolean
    +   */
    +  public boolean isUseMultiTempDir() {
    +    String usingMultiDirStr = getProperty(CarbonCommonConstants.CARBON_USING_MULTI_TEMP_DIR,
    +        CarbonCommonConstants.CARBON_USING_MULTI_TEMP_DIR_DEFAULT);
    +    boolean validateBoolean = CarbonUtil.validateBoolean(usingMultiDirStr);
    +    if (!validateBoolean) {
    +      LOGGER.info("The using multi temp dir value \"" + usingMultiDirStr
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129280640
 
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -231,5 +231,6 @@ scenarios. After the completion of POC, some of the configurations impacting the
     | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
     | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
     | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
    +| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property especially when you encounter disk hotspot problem during data loading. |
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129280724
 
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -231,5 +231,6 @@ scenarios. After the completion of POC, some of the configurations impacting the
     | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
     | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
     | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
    +| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property especially when you encounter disk hotspot problem during data loading. |
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129280832
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java ---
    @@ -57,12 +57,15 @@
           LogServiceFactory.getLogService(RowResultMergerProcessor.class.getName());
     
       public RowResultMergerProcessor(String databaseName,
    -      String tableName, SegmentProperties segProp, String tempStoreLocation,
    +      String tableName, SegmentProperties segProp, String[] tempStoreLocation,
           CarbonLoadModel loadModel, CompactionType compactionType) {
         this.segprop = segProp;
    -    if (!new File(tempStoreLocation).mkdirs()) {
    -      LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    +    for (String temLoc : tempStoreLocation) {
    +      if (!new File(temLoc).mkdirs()) {
    +        LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129281496
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/CarbonRowDataWriterProcessorStepImpl.java ---
    @@ -112,9 +115,11 @@ private String getStoreLocation(CarbonTableIdentifier tableIdentifier, String pa
           isNoDictionaryDimensionColumn =
               CarbonDataProcessorUtil.getNoDictionaryMapping(configuration.getDataFields());
           measureDataType = configuration.getMeasureDataType();
    +      //choose a tmp location randomly
    --- End diff --
   
    It is just a temporary variable; I'll make the function call in the argument list instead.
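
The "choose a tmp location randomly" step mentioned in the diff comment might look something like this. The helper below is hypothetical (the thread does not show how the PR selects the directory), and `ThreadLocalRandom` is simply one reasonable choice for a sketch that may run on several loading threads.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper; the actual selection code is not shown in the thread. */
final class TempLocationPicker {
  /** Returns one of the given locations, chosen uniformly at random. */
  static String pickRandom(String[] locations) {
    if (locations == null || locations.length == 0) {
      throw new IllegalArgumentException("At least one temp location is required");
    }
    int index = ThreadLocalRandom.current().nextInt(locations.length);
    return locations[index];
  }
}
```

With such a helper the call can be placed directly in an argument list, for example `someWriter(TempLocationPicker.pickRandom(storeLocations))`, where `someWriter` is a placeholder name.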


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129281674
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1310,6 +1310,18 @@
       public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL =
           "carbon.lease.recovery.retry.interval";
     
    +  /**
    +   * whether to use multi directories when loading data,
    +   * the main purpose is to avoid single-disk-hot-spot
    +   */
    +  @CarbonProperty
    +  public static final String CARBON_USING_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129281827
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java ---
    @@ -57,12 +57,15 @@
           LogServiceFactory.getLogService(RowResultMergerProcessor.class.getName());
     
       public RowResultMergerProcessor(String databaseName,
    -      String tableName, SegmentProperties segProp, String tempStoreLocation,
    +      String tableName, SegmentProperties segProp, String[] tempStoreLocation,
           CarbonLoadModel loadModel, CompactionType compactionType) {
         this.segprop = segProp;
    -    if (!new File(tempStoreLocation).mkdirs()) {
    -      LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    +    for (String temLoc : tempStoreLocation) {
    +      if (!new File(temLoc).mkdirs()) {
    +        LOGGER.error("Error while new File(tempStoreLocation).mkdirs() ");
    --- End diff --
   
    :+1: fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129286013
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/DataWriterBatchProcessorStepImpl.java ---
    @@ -58,12 +58,14 @@ public DataWriterBatchProcessorStepImpl(CarbonDataLoadConfiguration configuratio
         child.initialize();
       }
     
    -  private String getStoreLocation(CarbonTableIdentifier tableIdentifier, String partitionId) {
    -    String storeLocation = CarbonDataProcessorUtil
    +  private String[] getStoreLocation(CarbonTableIdentifier tableIdentifier, String partitionId) {
    +    String[] storeLocation = CarbonDataProcessorUtil
             .getLocalDataFolderLocation(tableIdentifier.getDatabaseName(),
                 tableIdentifier.getTableName(), String.valueOf(configuration.getTaskNo()), partitionId,
                 configuration.getSegmentId() + "", false);
    -    new File(storeLocation).mkdirs();
    +    for (String loc : storeLocation) {
    --- End diff --
   
    :+1: all related reference fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129289336
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/UnsafeParallelReadMergeSorterWithBucketingImpl.java ---
    @@ -168,11 +171,15 @@ private boolean processRowToNextStep(UnsafeSortDataRows[] sortDataRows, SortPara
       }
     
       private void setTempLocation(SortParameters parameters) {
    -    String carbonDataDirectoryPath = CarbonDataProcessorUtil
    +    String[] carbonDataDirectoryPath = CarbonDataProcessorUtil
             .getLocalDataFolderLocation(parameters.getDatabaseName(), parameters.getTableName(),
                 parameters.getTaskNo(), parameters.getPartitionID(), parameters.getSegmentId(), false);
    -    parameters.setTempFileLocation(
    -        carbonDataDirectoryPath + File.separator + CarbonCommonConstants.SORT_TEMP_FILE_LOCATION);
    +    String[] tmpLoc = new String[carbonDataDirectoryPath.length];
    --- End diff --
   
    :+1: all fixed
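
The diff is cut off after the `tmpLoc` array is allocated, so what follows is only a guess at the shape of the change: the removed lines appended a sort-temp suffix to a single path, and the array version presumably does the same for each entry. The class name and the `SORT_TEMP_DIR_NAME` value below are placeholders; the real value of `CarbonCommonConstants.SORT_TEMP_FILE_LOCATION` is not shown in the thread.

```java
import java.io.File;

/** Hypothetical sketch of deriving one sort-temp directory per data directory. */
final class SortTempLocations {
  // Placeholder value; stands in for CarbonCommonConstants.SORT_TEMP_FILE_LOCATION.
  static final String SORT_TEMP_DIR_NAME = "sort-temp-placeholder";

  /** Appends the sort-temp directory name to every data directory path. */
  static String[] withSortTempSuffix(String[] dataDirs) {
    String[] tmpLocs = new String[dataDirs.length];
    for (int i = 0; i < dataDirs.length; i++) {
      tmpLocs[i] = dataDirs[i] + File.separator + SORT_TEMP_DIR_NAME;
    }
    return tmpLocs;
  }
}
```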


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129289344
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterWithBucketingImpl.java ---
    @@ -185,12 +188,17 @@ private boolean processRowToNextStep(SortDataRows[] sortDataRows, SortParameters
       }
     
       private void setTempLocation(SortParameters parameters) {
    -    String carbonDataDirectoryPath = CarbonDataProcessorUtil
    +    String[] carbonDataDirectoryPath = CarbonDataProcessorUtil
             .getLocalDataFolderLocation(parameters.getDatabaseName(),
                 parameters.getTableName(), parameters.getTaskNo(),
                 parameters.getPartitionID(), parameters.getSegmentId(), false);
    -    parameters.setTempFileLocation(
    -        carbonDataDirectoryPath + File.separator + CarbonCommonConstants.SORT_TEMP_FILE_LOCATION);
    +    String[] tmpLocs = new String[carbonDataDirectoryPath.length];
    --- End diff --
   
    :+1: all fixed


---

[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/1177


---

[GitHub] carbondata issue #1177: [CARBONDATA-1281] Support multiple temp dirs for wri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1177
 
    Thanks to all the reviewers. I created a new PR to reduce the number of commits; see #1195.


---