[GitHub] carbondata pull request #2824: WIP: Optimize multiple temp dir for loading a...

classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: WIP: Optimize multiple temp dir for loading a...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2824

    WIP: Optimize multiple temp dir for loading and compaction

    1. remove duplicated code for multiple temp dir
    2. unset environment variable after loading/compaction
    3. support multiple temp dir for compaction
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 181016_opt_multiple_dir_4_compaction

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2824
   
----
commit 249a87b8f39858cc8f7d6526e73fecf64da51a1f
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-10-16T10:24:38Z

    Optimize multiple temp dir for loading and compaction
   
    1. remove duplicated code for multiple temp dir
    2. unset environment variable after loading/compaction
    3. support multiple temp dir for compaction

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/817/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9082/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1014/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/819/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1016/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9084/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/823/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9088/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: WIP: Optimize multiple temp dir for loading and comp...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1020/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/864/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/866/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1064/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r226325529
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1371,16 +1371,15 @@
       public static final String CARBON_SECURE_DICTIONARY_SERVER_DEFAULT = "true";
     
       /**
    -   * whether to use multi directories when loading data,
    -   * the main purpose is to avoid single-disk-hot-spot
    +   * whether to use yarn's local dir the main purpose is to avoid single disk hot spot
        */
       @CarbonProperty
    -  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    +  public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.local.dir";
    --- End diff --
   
    ```suggestion
      public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.yarn.local.dir";
    ```


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r226325939
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1371,16 +1371,15 @@
       public static final String CARBON_SECURE_DICTIONARY_SERVER_DEFAULT = "true";
     
       /**
    -   * whether to use multi directories when loading data,
    -   * the main purpose is to avoid single-disk-hot-spot
    +   * whether to use yarn's local dir the main purpose is to avoid single disk hot spot
        */
       @CarbonProperty
    -  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    +  public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.local.dir";
    --- End diff --
   
    ```suggestion
      public static final String CARBON_USE_LOCAL_DIR = "carbon.use.local.dir";
    ```


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r226508384
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1371,16 +1371,15 @@
       public static final String CARBON_SECURE_DICTIONARY_SERVER_DEFAULT = "true";
     
       /**
    -   * whether to use multi directories when loading data,
    -   * the main purpose is to avoid single-disk-hot-spot
    +   * whether to use yarn's local dir the main purpose is to avoid single disk hot spot
        */
       @CarbonProperty
    -  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    +  public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.local.dir";
    --- End diff --
   
    @jackylk This parameter is not new. If we changed the parameter name, for upgrade scenario, the user has to change the configured parameters as well.
    So I think we should keep the old parameter name here...


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r227191521
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1371,16 +1371,15 @@
       public static final String CARBON_SECURE_DICTIONARY_SERVER_DEFAULT = "true";
     
       /**
    -   * whether to use multi directories when loading data,
    -   * the main purpose is to avoid single-disk-hot-spot
    +   * whether to use yarn's local dir the main purpose is to avoid single disk hot spot
        */
       @CarbonProperty
    -  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    +  public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.local.dir";
    --- End diff --
   
    ok. Then I think it is better change the variable name only, we can use the existing variable value


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2824#discussion_r227192328
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1371,16 +1371,15 @@
       public static final String CARBON_SECURE_DICTIONARY_SERVER_DEFAULT = "true";
     
       /**
    -   * whether to use multi directories when loading data,
    -   * the main purpose is to avoid single-disk-hot-spot
    +   * whether to use yarn's local dir the main purpose is to avoid single disk hot spot
        */
       @CarbonProperty
    -  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    +  public static final String CARBON_USE_YARN_LOCAL_DIR = "carbon.use.local.dir";
    --- End diff --
   
    yeah, I'll fix that


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/939/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2824
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1145/



---
12