[GitHub] carbondata pull request #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor ...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2267

    [CARBONDATA-2433] [Lucene GC Issue] Executor OOM because of GC when blocklet pruning is done using Lucene datamap

    **Problem**
    Executor OOM because of GC when blocklet pruning is done using Lucene datamap
   
    **Analysis**
    While searching, Lucene creates a PriorityQueue to hold the matching documents. Because no size is specified, the PriorityQueue size defaults
    to the number of Lucene documents. As documents are added to the heap, GC time increases; eventually the task fails due to excessive GC
    and the executor runs out of memory.
    Reference blog: http://lucene.472066.n3.nabble.com/Optimization-of-memory-usage-in-PriorityQueue-td590355.html
   
    **Fix**
    Specify a limit for the first search, then use the searchAfter API to search incrementally with the given PriorityQueue size.
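The fix can be illustrated with a small, Lucene-free simulation. This is a hypothetical sketch, not the code from this PR and not Lucene's API: the class `PagedTopDocs`, its `searchAfter` helper and the ranking rule (higher score first, ties broken by lower doc id) are assumptions made for illustration. Each call keeps at most `n` hits in a bounded heap and skips every hit that ranks at or before the last hit of the previous page, so peak heap size is the page size rather than the number of documents. As far as I understand, Lucene's `IndexSearcher.search(Query, int)` followed by `IndexSearcher.searchAfter(ScoreDoc, Query, int)` follows the same pattern.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, Lucene-free simulation of "limit the first search, then page with searchAfter".
final class PagedTopDocs {

  // A scored hit. Assumed ranking: higher score first, ties broken by lower doc id.
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Orders hits best-first under the assumed ranking.
  static final Comparator<Hit> BEST_FIRST =
      Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

  // Returns at most n hits that rank strictly after `after` (pass null for the first page).
  // The heap never holds more than n entries; its head is the worst hit kept so far.
  static List<Hit> searchAfter(float[] scores, Hit after, int n) {
    PriorityQueue<Hit> heap = new PriorityQueue<>(n, BEST_FIRST.reversed());
    for (int doc = 0; doc < scores.length; doc++) {
      Hit hit = new Hit(doc, scores[doc]);
      if (after != null && BEST_FIRST.compare(hit, after) <= 0) {
        continue; // already returned on an earlier page
      }
      if (heap.size() < n) {
        heap.add(hit);
      } else if (BEST_FIRST.compare(hit, heap.peek()) < 0) {
        heap.poll(); // evict the current worst hit
        heap.add(hit);
      }
    }
    List<Hit> page = new ArrayList<>(heap);
    page.sort(BEST_FIRST);
    return page;
  }

  // Drains all hits page by page, returning doc ids in rank order.
  static List<Integer> allDocsPaged(float[] scores, int pageSize) {
    List<Integer> docs = new ArrayList<>();
    Hit after = null;
    while (true) {
      List<Hit> page = searchAfter(scores, after, pageSize);
      if (page.isEmpty()) {
        break;
      }
      for (Hit h : page) {
        docs.add(h.doc);
      }
      after = page.get(page.size() - 1); // last hit of this page becomes the cursor
    }
    return docs;
  }

  public static void main(String[] args) {
    float[] scores = {0.5f, 0.9f, 0.5f, 0.1f, 0.9f};
    System.out.println(allDocsPaged(scores, 2)); // [1, 4, 0, 2, 3]
  }
}
```

The trade-off is visible in the loop: memory stays bounded by the page size, but every call still scans all documents, so total CPU work grows with the number of pages.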
   
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
     No
     - [ ] Testing done
     Manually verified with 3.7 billion records. For one query, GC time dropped from 40 min to 5 sec.

     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata lucene_gc_issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2267
   
----
commit ecea6009c55326817826bc4de8b14fad52b6db35
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-05-03T15:10:41Z

    Problem
    Executor OOM because of GC when blocklet pruning is done using Lucene datamap
   
    Analysis
    While searching, Lucene creates a PriorityQueue to hold the matching documents. Because no size is specified, the PriorityQueue size defaults
    to the number of Lucene documents. As documents are added to the heap, GC time increases; eventually the task fails due to excessive GC
    and the executor runs out of memory.
    Reference blog: http://lucene.472066.n3.nabble.com/Optimization-of-memory-usage-in-PriorityQueue-td590355.html
   
    Fix
    Specify a limit for the first search, then use the searchAfter API to search incrementally with the given PriorityQueue size.

----


---

[GitHub] carbondata pull request #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor ...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2267#discussion_r185883427
 
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMap.java ---
    @@ -60,6 +60,15 @@
     @InterfaceAudience.Internal
     public class LuceneFineGrainDataMap extends FineGrainDataMap {
     
    +  /**
    +   * search limit will help in deciding the size of priority queue which is used by lucene to store
    +   * the documents in heap. By default it is 10 means in one search max of 10 documents can be
    +   * stored in heap by lucene. This way ot will help in reducing the GC.
    --- End diff --
   
    typo in `This way ot will`


---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4461/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5621/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4714/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Hi, I tried this PR on a table with 60M rows and added logging in the loop you added. It prints:
    18/05/04 10:42:30 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Searching for l_comment:packages with top 59986052 result
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Finished searching, preparing result
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 1
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 2
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 3
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 4
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 5
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 6
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 7
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 8
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 9
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop 10
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 Loop end
    18/05/04 10:42:37 INFO LuceneFineGrainDataMap: Executor task launch worker for task 0 About to searchAfter, remainingHits: 4418435, numberOfDocumentsToBeQueried: 10
   
    Then the program calls indexSearcher.searchAfter and does not return.
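A back-of-the-envelope count of the loop in the log above, assuming each `searchAfter` call returns exactly `numberOfDocumentsToBeQueried` hits until the remaining hits are exhausted. The class `SearchAfterRounds` and the page size of 10,000 are hypothetical, used only for comparison.

```java
// Hypothetical helper: counts searchAfter calls needed to drain the remaining hits.
final class SearchAfterRounds {

  // Ceiling division: calls needed to fetch `remainingHits` results, `pageSize` at a time.
  static long rounds(long remainingHits, int pageSize) {
    return (remainingHits + pageSize - 1) / pageSize;
  }

  public static void main(String[] args) {
    long remainingHits = 4_418_435L; // remainingHits from the log
    System.out.println(rounds(remainingHits, 10));     // numberOfDocumentsToBeQueried = 10 -> 441844
    System.out.println(rounds(remainingHits, 10_000)); // hypothetical larger page -> 442
  }
}
```

If, as I understand Lucene's paging to work, each `searchAfter` call re-evaluates the query over all matching documents, about 441,844 such passes over millions of hits could look like a call that never returns. That is a hypothesis; the log itself does not confirm it.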



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5633/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4473/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5637/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4479/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4494/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5654/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4731/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6004/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    I have verified this with 100 million records, and it works fine.


---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4845/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    LGTM


---

[GitHub] carbondata pull request #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor ...

qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2267


---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5019/



---

[GitHub] carbondata issue #2267: [CARBONDATA-2433] [Lucene GC Issue] Executor OOM bec...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2267
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6016/



---