GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2898

[WIP] Fixed query failure in FileFormat due to stale cache issue

**Problem**
While using the FileFormat API, if a table is created, dropped, and then recreated with the same name, queries fail because of a schema mismatch.

**Analysis**
When carbondata is used through the FileFormat API, once a table is dropped and recreated with the same name, a schema mismatch exception is thrown because the dataMap still contains the stale carbon table.

**Solution**
To avoid such scenarios it is always better to update the retrieved carbon table object.

- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done: Added a UT to verify the scenario
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata stale_carbon_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2898.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2898

----

commit 2b6789ee5464f90f43ecac3654e58424257eaa29
Author: m00258959 <manish.gupta@...>
Date: 2018-11-05T10:15:46Z

    Fixed select query failure due to stale carbonTable in dataMapFactory class

----

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2898 I think the current modification does not fix the root of the problem. If you think the table information is not getting cleared, you should clear it, not just update it when you need it. The current implementation means that, at some point in time, outdated table information is kept somewhere. ---
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 @xuchuanyin ...your point is correct. To explain this in detail:

1. We already have a way to clear the cached DataMaps through the API call `DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier identifier)`. This API call ensures that all the dataMaps for a given table are cleared.
2. In the FileFormat case, if the above API is not integrated by the customer, there is a possibility that the drop table call will not come to the carbondata layer, and a few stale objects can remain and cause query failure.

This PR is raised to handle the 2nd case. The other stale DataMaps are already taken care of by the LRU cache, which will clear the stale entries once the LRU cache threshold is reached. Let me know if you still have doubts. ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1491/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9540/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1278/ ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1504/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9550/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1289/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2898 @manishgupta88 What if the user uses a FileFormat carbon table and a normal carbon table at the same time? For example, creating/using/dropping a FileFormat table and then creating/using/dropping a normal carbon table, where both tables have the same name. Will this be OK? ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 @xuchuanyin ...yes, this scenario will work fine. When a normal table is dropped, it goes through the CarbonSession flow, and the drop table command already takes care of clearing the dataMaps. When a FileFormat table is dropped and the clear dataMap API is not integrated by the customer, the changes done in this PR take care of referring only to the latest carbon table. ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1511/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1300/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9561/ ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 retest this please ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2898 @manishgupta88 it solves part of the problem (the schema mismatch issue). But when you call getDataMaps it will give you stale datamaps, right? How can those be updated? ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2898 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1306/ ---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2898 @ravipesala ...which method exactly are you referring to? In all the `getDataMap` methods the latest `carbonTable` object is passed and used for fetching the dataMaps. There is only one `getAllDataMaps` method which does not take any parameter, but that is used only in the test cases. ---