GitHub user manishnalla1994 opened a pull request:
https://github.com/apache/carbondata/pull/3056

[CARBONDATA-3236] Fix for JVM crash on insert into a new table from an old table

Problem: Insert into a new table from an old table fails with a JVM crash. This happened because the query flow and the load flow were assigned the same taskId; once the query finished, it freed the unsafe memory while the insert was still in progress.

Solution: The file-format flow is a direct flow and uses on-heap (safe) memory, so there is no need to free the unsafe memory.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata JVMCrashForLoadAndQuery

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3056

----
commit 150c710218ff3c09ccfccc9b2df970006964ef6d
Author: manishnalla1994 <manish.nalla1994@...>
Date: 2019-01-08T10:42:55Z

    Fix for JVM Crash for file format

---
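The race described above can be sketched as follows. This is a minimal illustration, not CarbonData's actual API: the class and method names are hypothetical, and a plain map stands in for the real off-heap allocator. The point is that when a memory manager tracks allocations per taskId, two flows sharing one taskId means the first flow to finish frees the other flow's memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a taskId-keyed unsafe-memory manager.
// Names are illustrative; CarbonData's real manager works on raw
// off-heap blocks, which is why a premature free crashes the JVM
// instead of throwing a catchable exception.
class UnsafeMemoryManagerSketch {
    // memory blocks tracked per taskId
    private final Map<Long, byte[]> blocksByTask = new HashMap<>();

    void allocate(long taskId, int size) {
        blocksByTask.put(taskId, new byte[size]);
    }

    // Called when a flow finishes. If a query and an insert were
    // assigned the same taskId, the query finishing first frees the
    // insert's memory out from under it.
    void freeAllForTask(long taskId) {
        blocksByTask.remove(taskId);
    }

    boolean isAllocated(long taskId) {
        return blocksByTask.containsKey(taskId);
    }
}
```

With this sketch, the failing sequence is: the insert allocates under taskId 1, the query (sharing taskId 1) completes and calls `freeAllForTask(1)`, and the still-running insert now touches freed memory. The fix in this PR avoids the problem for the file-format path by never putting that data in unsafe memory in the first place.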
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3056

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2437/

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3056

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2217/

---
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3056

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10475/

---
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/3056

Reviewed, LGTM

---
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3056

> because both the query and load flow were assigned the same taskId and once query finished it freed the unsafe memory while the insert still in progress.

How do you handle this scenario for a stored-by-carbondata table? In that scenario, both the query flow and the load flow use off-heap memory and encounter the same problem you described above. But I remember we handled that differently from the current PR, and I think the modifications can be similar. Please check this again.

---
Github user manishnalla1994 commented on the issue:

https://github.com/apache/carbondata/pull/3056

@xuchuanyin A datasource table uses the direct filling flow. In the direct flow there is no intermediate buffer, so we do not use off-heap memory to store the page data (all the records of a page are filled into the vector at once instead of batch-wise). So in this case we can remove the freeing of unsafe memory for the query, as it is not required.

In the case of a stored-by table, the handling is different: we support both batch-wise filling and direct filling, and batch filling uses unsafe memory, so we have to clear the unsafe memory there. The same handling is not required for a datasource table. Please refer to https://github.com/apache/carbondata/pull/2591 for the stored-by handling of this issue.

---
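The distinction drawn above can be reduced to a tiny decision sketch. The enum and class names here are hypothetical, invented only to illustrate the explanation: direct filling writes a whole page straight into the vector using on-heap memory, so there is nothing unsafe to free; batch-wise filling stages rows in an off-heap buffer, which must be cleaned up per task.

```java
// Hypothetical sketch of the two fill paths described above.
// Names are illustrative, not CarbonData's actual API.
enum FillMode {
    DIRECT,  // datasource/file-format path: whole page filled into the
             // vector at once, backed by on-heap (safe) memory
    BATCH    // stored-by path option: rows staged batch-wise in an
             // off-heap (unsafe) intermediate buffer
}

class FillPathSketch {
    // Only the batch path allocates unsafe memory, so only it needs the
    // per-task unsafe cleanup that caused the crash for shared taskIds.
    static boolean needsUnsafeCleanup(FillMode mode) {
        return mode == FillMode.BATCH;
    }
}
```

Under this framing, the PR's fix is simply that the datasource path always takes the `DIRECT` branch, so skipping the unsafe-memory free there is safe, while the stored-by table still needs the separate handling of #2591 for its `BATCH` branch.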
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3056

LGTM

---
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3056

LGTM

---
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3056

@manishnalla1994

> Solution: Check if any other RDD is sharing the same task context. If so, don't clear the resource at that time; the other RDD which shares the context should clear the memory once the task is finished.

It seems that in #2591, for the data source table scenario, if the query and insert procedures also share the same context, they could also benefit from the implementation in #2591 without any changes. Right?

---
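The quoted #2591-style solution amounts to reference counting the shared task context. A minimal sketch of that idea, with hypothetical names (this is not CarbonData's actual implementation), assuming each flow registers on the context when it starts and releases on completion:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the idea quoted above: track how many flows
// (RDDs) share one task context and free the unsafe memory only when
// the last of them finishes. Names are illustrative.
class SharedTaskContextSketch {
    private final AtomicInteger users = new AtomicInteger(0);
    private volatile boolean freed = false;

    // Each flow (query, insert, ...) registers before using the context.
    void register() {
        users.incrementAndGet();
    }

    // Called when a flow finishes. Returns true only for the last flow,
    // which is the one allowed to actually free the unsafe memory.
    boolean releaseAndMaybeFree() {
        if (users.decrementAndGet() == 0) {
            freed = true;  // last user: safe to free resources now
            return true;
        }
        return false;      // another flow still holds the context
    }

    boolean isFreed() {
        return freed;
    }
}
```

With this scheme, a query finishing first merely decrements the count and leaves the memory alone, and the insert sharing the context frees it when it completes, which is why the shared-context approach covers both flows without per-path special cases.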