[GitHub] carbondata pull request #2531: [HOTFIX] Improved BlockDataMap caching perfor...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2531

    [HOTFIX] Improved BlockDataMap caching performance during first time query

    Things done as part of this PR:
    1. Created the taskSummary and FileFooterEntry schemas once and stored them in member variables. Creating the schema on every call was a costly operation and increased the time taken to prune dataMaps.
    2. Used TreeMap instead of HashMap when adding the complete file path and data to the map during merge file read. Using TreeMap reduced the map filling time by 10 sec for 1200 entries.
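
The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: the class names, the column names, and the `buildTaskSummarySchema` helper are hypothetical stand-ins, and it assumes the schema is immutable once built. It shows only the pattern (build once, keep in a member variable, fill a path-keyed TreeMap) and does not reproduce the reported timing.

```java
import java.util.Map;
import java.util.TreeMap;

public class SchemaCacheSketch {
  /** Counts how often the (hypothetically costly) schema build runs. */
  public static int schemaBuilds = 0;

  /** Hypothetical stand-in for a costly schema construction. */
  public static String[] buildTaskSummarySchema() {
    schemaBuilds++;
    return new String[] {"minValues", "maxValues", "rowCount"};
  }

  /** Builds the schema on first use and reuses it afterwards. Not thread-safe. */
  public static final class DataMapSketch {
    private String[] taskSummarySchema; // cached in a member variable

    public String[] getTaskSummarySchema() {
      if (taskSummarySchema == null) {
        taskSummarySchema = buildTaskSummarySchema();
      }
      return taskSummarySchema;
    }
  }

  /** Fills a TreeMap keyed by file path; iteration is then ordered by path. */
  public static Map<String, byte[]> collectByPath(String[] paths) {
    Map<String, byte[]> result = new TreeMap<>();
    for (String path : paths) {
      result.put(path, new byte[0]); // empty placeholder for the file data
    }
    return result;
  }

  public static void main(String[] args) {
    DataMapSketch dataMap = new DataMapSketch();
    for (int i = 0; i < 1200; i++) {
      dataMap.getTaskSummarySchema(); // only the first call builds the schema
    }
    System.out.println("schema builds: " + schemaBuilds);
    System.out.println(collectByPath(new String[] {"/seg/b.index", "/seg/a.index"}).keySet());
  }
}
```

In this sketch the schema is built a single time regardless of how many entries are processed, and the TreeMap returns the paths in sorted order.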
   
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
    No
     - [ ] Testing done
    Verified manually      
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata query_perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2531
   
----
commit 26954b88d606535349f83f80a3e00f9b2db4fd66
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-07-19T13:45:12Z

    Code modification done to improve query performance

----


---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6086/



---

[GitHub] carbondata issue #2531: [WIP] [HOTFIX] Improved BlockDataMap caching perform...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7325/



---

[GitHub] carbondata issue #2531: [WIP] [HOTFIX] Improved BlockDataMap caching perform...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    retest this please


---

[GitHub] carbondata issue #2531: [WIP] [HOTFIX] Improved BlockDataMap caching perform...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6092/



---

[GitHub] carbondata issue #2531: [WIP] [HOTFIX] Improved BlockDataMap caching perform...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7334/



---

[GitHub] carbondata issue #2531: [WIP] [HOTFIX] Improved BlockDataMap caching perform...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6099/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5931/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6108/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7354/



---

[GitHub] carbondata pull request #2531: [HOTFIX] Improved BlockDataMap caching perfor...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2531#discussion_r203999922
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
    @@ -237,6 +241,32 @@ public void invalidate(String segmentId, int segmentPropertiesIndex,
               .isEmpty()) {
             indexToSegmentPropertiesWrapperMapping.remove(segmentPropertiesIndex);
             segmentPropWrapperToSegmentSetMap.remove(segmentPropertiesWrapper);
    +      } else if (!clearSegmentWrapperFromMap
    +          && segmentIdAndSegmentPropertiesIndexWrapper.segmentIdSet.isEmpty()) {
    +        // min max columns can very when cache is modified. So even though entry is not required
    +        // to be deleted from map clear the column cache so that it can filled again
    +        segmentPropertiesWrapper.clear();
    +        LOGGER.info("cleared min max for segmentProperties at index: " + segmentPropertiesIndex);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * add segmentId at given segmentPropertyIndex
    +   * Note: This method is getting used in extension with other features. Please do not remove
    +   *
    +   * @param segmentPropertiesIndex
    +   * @param segmentId
    +   */
    +  public void addSegmentId(int segmentPropertiesIndex, String segmentId) {
    +    SegmentPropertiesWrapper segmentPropertiesWrapper =
    +        indexToSegmentPropertiesWrapperMapping.get(segmentPropertiesIndex);
    +    if (null != segmentPropertiesWrapper) {
    +      SegmentIdAndSegmentPropertiesIndexWrapper segmentIdAndSegmentPropertiesIndexWrapper =
    +          segmentPropWrapperToSegmentSetMap.get(segmentPropertiesWrapper);
    +      synchronized (segmentPropertiesWrapper.getTableIdentifier().getCarbonTableIdentifier()
    --- End diff --
   
    Use  getOrCreateTableLock
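
A minimal sketch of what a get-or-create lock helper of this kind might look like, for readers unfamiliar with the pattern. The class, the string table key, and the method bodies are assumptions for illustration; the actual `getOrCreateTableLock` in CarbonData may differ. The idea is to synchronize on one dedicated lock object per table, created on first use.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TableLockSketch {
  // One lock object per table key (hypothetical key type: String).
  private final Map<String, Object> tableLocks = new ConcurrentHashMap<>();
  // Segment ids per table; each inner set is guarded by that table's lock.
  private final Map<String, Set<String>> segmentIdsByTable = new ConcurrentHashMap<>();

  /** Returns the lock for the table, creating it atomically on first use. */
  public Object getOrCreateTableLock(String tableKey) {
    return tableLocks.computeIfAbsent(tableKey, key -> new Object());
  }

  /** Adds a segment id while holding the table's lock. */
  public void addSegmentId(String tableKey, String segmentId) {
    synchronized (getOrCreateTableLock(tableKey)) {
      segmentIdsByTable.computeIfAbsent(tableKey, key -> new HashSet<>()).add(segmentId);
    }
  }

  /** Reads the number of segment ids under the same lock. */
  public int segmentCount(String tableKey) {
    synchronized (getOrCreateTableLock(tableKey)) {
      Set<String> ids = segmentIdsByTable.get(tableKey);
      return ids == null ? 0 : ids.size();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    TableLockSketch holder = new TableLockSketch();
    Runnable writer = () -> {
      String prefix = Thread.currentThread().getName();
      for (int i = 0; i < 1000; i++) {
        holder.addSegmentId("db1.t1", prefix + "-" + i);
      }
    };
    Thread a = new Thread(writer, "a");
    Thread b = new Thread(writer, "b");
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println(holder.segmentCount("db1.t1"));
  }
}
```

Every caller that asks for the same table key receives the same lock object, so writers to one table serialize with each other without blocking writers to other tables.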


---

[GitHub] carbondata pull request #2531: [HOTFIX] Improved BlockDataMap caching perfor...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2531#discussion_r204000404
 
    ok


---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7357/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6120/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    LGTM


---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    retest sdv please


---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5943/



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    retest sdv please



---

[GitHub] carbondata issue #2531: [HOTFIX] Improved BlockDataMap caching performance d...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2531
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5946/



---

[GitHub] carbondata pull request #2531: [HOTFIX] Improved BlockDataMap caching perfor...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2531


---