fix notes on cache evictions

Damian Gryski 2019-04-13 07:34:27 -07:00
parent 5176a93c93
commit 7d55ac03a8


@@ -659,7 +659,7 @@ compiler optimization became slower once the compiler was improved.
My RC6 cipher implementation had a 10% speed up for the inner loop just by
switching to `encoding/binary` and `math/bits` instead of my hand-rolled
versions.
Similarly, the `compress/bzip2` package was sped up by switching to [simpler
code the compiler was better able to
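
For illustration, a hedged sketch of the kind of swap meant here (the helpers
are hypothetical, not the actual RC6 or bzip2 code): the standard-library forms
compile down to single rotate and load instructions, where the hand-rolled
forms may not.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// Hand-rolled versions the compiler may not pattern-match:
func rotl32(x uint32, k uint32) uint32 { return x<<k | x>>(32-k) }

func load32(b []byte) uint32 {
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	b := []byte{1, 2, 3, 4}
	// Standard-library equivalents that map to single instructions:
	fmt.Println(bits.RotateLeft32(load32(b), 7) == rotl32(binary.LittleEndian.Uint32(b), 7))
}
```
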
@@ -724,26 +724,32 @@ service instances if the external cache is shared.
A cache saves information you've just spent time computing in the hopes that
you'll be able to reuse it again soon and save the computation time. A cache
doesn't need to be complex. Even storing a single item -- the most recently
seen query/response -- can be a big win, as seen in the `time.Parse()` example
below.
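
A minimal sketch of such a single-item cache (the `parseTime` wrapper is
hypothetical, and for brevity is not safe for concurrent use):

```go
package main

import (
	"fmt"
	"time"
)

// parseTime caches the single most recent layout/value pair passed to
// time.Parse. A repeated call with the same arguments skips the parse.
var (
	lastLayout, lastValue string
	lastResult            time.Time
	lastErr               error
)

func parseTime(layout, value string) (time.Time, error) {
	if layout == lastLayout && value == lastValue {
		return lastResult, lastErr // cache hit: reuse the previous result
	}
	lastLayout, lastValue = layout, value
	lastResult, lastErr = time.Parse(layout, value)
	return lastResult, lastErr
}

func main() {
	t, _ := parseTime(time.RFC3339, "2019-04-13T07:34:27-07:00")
	fmt.Println(t)
	t, _ = parseTime(time.RFC3339, "2019-04-13T07:34:27-07:00") // served from cache
	fmt.Println(t)
}
```
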
With caches it's important to compare the cost (in terms of actual wall-clock
and code complexity) of your caching logic to simply refetching or recomputing
the data. The more complex algorithms that give higher hit rates are generally
not cheap themselves. Randomized cache eviction is simple and fast and can be
effective in many cases. Similarly, randomized cache *insertion* can limit your
cache to only popular items with minimal logic. While these may not be as
effective as the more complex algorithms, the big improvement will be adding a
cache in the first place: choosing exactly which caching algorithm gives only
minor improvements.
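
A sketch of both ideas, under the assumption that Go's pseudo-random map
iteration order is random enough for eviction (the `randomCache` type is
invented for illustration):

```go
package cache

import "math/rand"

type randomCache struct {
	max  int
	data map[string]string
}

// Put evicts an arbitrary entry when the cache is full. Ranging over a
// map starts at a pseudo-random key, which serves as cheap randomized
// eviction.
func (c *randomCache) Put(k, v string) {
	if _, exists := c.data[k]; !exists && len(c.data) >= c.max {
		for victim := range c.data {
			delete(c.data, victim)
			break
		}
	}
	c.data[k] = v
}

// MaybePut is randomized insertion: admit only a fraction of new keys,
// so only keys requested repeatedly are likely to end up cached.
func (c *randomCache) MaybePut(k, v string) {
	if rand.Intn(4) == 0 { // ~25% admission rate; tune per workload
		c.Put(k, v)
	}
}
```
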
It's important to benchmark your choice of cache eviction algorithm with
real-world traces. If in the real world repeated requests are sufficiently
rare, it can be more expensive to keep cached responses around than to simply
recompute them when needed. I've had services where testing with production
data showed even an optimal cache wasn't worth it. We simply didn't have
sufficient repeated requests to make the added complexity of a cache make
sense.

Your expected cache hit ratio is important. You'll want to export the ratio to
your monitoring stack. Changing ratios will show a shift in traffic. Then it's
time to revisit the cache size or the expiration policy.
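
One way to export such a ratio (a sketch using the standard `expvar` package;
the variable names and where `hits`/`misses` get incremented are assumptions):

```go
package cache

import (
	"expvar"
	"sync/atomic"
)

// hits and misses are bumped with atomic.AddUint64 in the cache's Get path.
var hits, misses uint64

func init() {
	// Published under /debug/vars as "cache_hit_ratio"; scrape it into
	// your monitoring stack and alert when the ratio shifts.
	expvar.Publish("cache_hit_ratio", expvar.Func(func() interface{} {
		h, m := atomic.LoadUint64(&hits), atomic.LoadUint64(&misses)
		if h+m == 0 {
			return 0.0
		}
		return float64(h) / float64(h+m)
	}))
}
```
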
A large cache can increase GC pressure. At the extreme (little or no eviction,
caching all requests to an expensive function) this can turn into
[memoization](https://en.wikipedia.org/wiki/Memoization).
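
For contrast, a sketch of what that extreme looks like (the `memoize` helper
is illustrative): every result is kept forever, so the map grows with every
distinct input.

```go
// memoize wraps f so each distinct input is computed once and the
// result cached forever -- a cache with no eviction at all.
func memoize(f func(int) int) func(int) int {
	results := make(map[int]int)
	return func(x int) int {
		if r, ok := results[x]; ok {
			return r
		}
		r := f(x)
		results[x] = r
		return r
	}
}
```
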
Program tuning: