[GitHub] [carbondata] jack86596 opened a new pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

classic Classic list List threaded Threaded
36 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jack86596 opened a new pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox

jack86596 opened a new pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057


   …which leads to memory leak
   
    ### Why is this PR needed?
   When there are two spark applications, one drop a table, some cache information of this table stay in another application and cannot be removed with any method like "Drop metacache" command. This leads to memory leak. With the passage of time, memory leak will also accumulate which finally leads to driver OOM. Following are the leak points: 1) tableModifiedTimeStore in CarbonFileMetastore; 2) segmentLockMap in BlockletDataMapIndexStore; 3) absoluteTableIdentifierByteMap in SegmentPropertiesAndSchemaHolder; 4) tableInfoMap in CarbonMetadata.
   
    ### What changes were proposed in this PR?
   Using expiring map to cache the table information in CarbonMetadata and modified time in CarbonFileMetaStore so that stale information will be cleared automatically after the expiration time. Operations in BlockletDataMapIndexStore no need to be locked, remove all the logic related to segmentLockMap.
       
    ### Does this PR introduce any user interface change?
    - New configuration carbon.metacache.expiration.seconds is added.
   
    ### Is any new testcase added?
    - No


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-747421780


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5196/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-747424633


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3436/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-748006845


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3290/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-748007077


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5048/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jack86596 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

jack86596 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-748481062


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-748492092


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5216/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-748493017


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3456/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-749528326


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547286675



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata, after the time configured since last access to the cache entry, tableInfo will be removed from cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       it is not only for CarbonMetadata, this property is also used for tableModifiedTimeStore. Please update




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547289224



##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2086,6 +2086,27 @@ public boolean isSetLenientEnabled() {
     return Boolean.parseBoolean(configuredValue);
   }
 
+  public long getMetaCacheExpirationTime() {
+    String configuredValue = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS);
+    if (configuredValue == null || configuredValue.equalsIgnoreCase("0")) {
+      return CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS_DEFAULT;
+    }
+    try {
+      long expirationTime = Long.parseLong(configuredValue);
+      LOGGER.info("Value for "
+          + CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS + " is "
+          + expirationTime + ".");
+      return expirationTime;
+    } catch (NumberFormatException e) {
+      LOGGER.info(configuredValue + " is not a valid input for "

Review comment:
       could be a warning log




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jack86596 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

jack86596 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547319412



##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2086,6 +2086,27 @@ public boolean isSetLenientEnabled() {
     return Boolean.parseBoolean(configuredValue);
   }
 
+  public long getMetaCacheExpirationTime() {
+    String configuredValue = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS);
+    if (configuredValue == null || configuredValue.equalsIgnoreCase("0")) {
+      return CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS_DEFAULT;
+    }
+    try {
+      long expirationTime = Long.parseLong(configuredValue);
+      LOGGER.info("Value for "
+          + CarbonCommonConstants.CARBON_METACACHE_EXPIRATION_TIME_IN_SECONDS + " is "
+          + expirationTime + ".");
+      return expirationTime;
+    } catch (NumberFormatException e) {
+      LOGGER.info(configuredValue + " is not a valid input for "

Review comment:
       changed to warning log.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jack86596 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

jack86596 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547319531



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata, after the time configured since last access to the cache entry, tableInfo will be removed from cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-749627683


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5233/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-749633365


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3472/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547665649



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       During your test did you face NullPointerException as mentioned in this doc?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 removed a comment on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

kunal642 removed a comment on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-749528326


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jack86596 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

jack86596 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547692764



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       Yes, if we set carbon.metacache.expiration.seconds to very small like 1s, UT will fail with NullPointerException.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547732205



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       then its a problem, because queries would fail randomly.
   
   @akashrn5 @ajantha-bhat what do you suggest?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547751657



##########
File path: docs/configuration-parameters.md
##########
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |

Review comment:
       In one application itself if we do drop table and query concurrently what is the behavior ?
   a) Giving NPE is not good idea. If the other application has OOM, user can call drop metacache ?
   b) Also for multiple application scenario, user should use index server for caching, which will avoid these problems as only one cache will be present for all the application.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12