* adding logic to cleanup on evict in cache
* implement cleanup logic for KV cache eviction to free GPU memory
* adding uuid for cache key
* reverting changes to precommit
* adding scores to cache and removing it from the constructor
* adding more robust type checking to run hf tests
* suppressing code cov output to stdout to make tests more readable
* small fix
* removing cov-report from subprocesses
* setting lru cache to 0 for now, until we figure out block_attention and dynamic LRU size
* removing return_scores from docs
---------
Co-authored-by: Nathan Fulton <nathan@ibm.com>
    """Initializes the LRU cache with a certain capacity.

    The `SimpleLRUCache` either contains a value or it doesn't. There is no
    cache hierarchy. Take care when choosing `capacity`. In practice usually a
    small value will be fine, but ideally you should try to choose a capacity
    based upon your available device memory and the context size of your model.

    Args:
        capacity: Maximum number of items to store in the cache.
        on_evict: Optional callback function called when an item is evicted
            from the cache. This can be used to free resources (e.g., GPU
            memory) when items are removed.
    """
    self.capacity = capacity
    self.cache: OrderedDict = OrderedDict()
    self.on_evict = on_evict

def current_size(self):
    """Just return the size of the key set. This isn't necessarily safe."""
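To illustrate how the `on_evict` callback interacts with LRU eviction, here is a minimal sketch of a cache like the one above. Only the constructor fields and `current_size` appear in the diff; the `get` and `put` methods (and their signatures) are assumptions added for the example, not the actual implementation.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class SimpleLRUCache:
    """Minimal sketch; `get`/`put` are hypothetical, not from the diff."""

    def __init__(
        self,
        capacity: int,
        on_evict: Optional[Callable[[Any, Any], None]] = None,
    ):
        self.capacity = capacity
        self.cache: OrderedDict = OrderedDict()
        self.on_evict = on_evict

    def current_size(self):
        """Just return the size of the key set. This isn't necessarily safe."""
        return len(self.cache)

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        # Evict least-recently-used entries until we are within capacity.
        while len(self.cache) > self.capacity:
            evicted_key, evicted_value = self.cache.popitem(last=False)
            if self.on_evict is not None:
                # e.g., free GPU memory held by an evicted KV-cache entry
                self.on_evict(evicted_key, evicted_value)
```

With `capacity=0` (as one of the commits above sets it), every `put` immediately evicts the inserted item, which effectively disables caching while still exercising the cleanup path.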