function. The improved solution we came up with was to individually hash the
keys/values as they were added to the map, then xor all these hashes together
to create the identifier.
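A minimal sketch of that xor-of-hashes approach (the FNV hash, the string
key/value types, and all the names here are my own illustration, not the
actual code being described):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashPair hashes a single key/value pair.
func hashPair(key, value string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(value))
	return h.Sum64()
}

// identifier keeps a running xor of the per-pair hashes. Because xor is
// commutative and associative, the result is independent of insertion
// order: the same map contents always yield the same identifier.
type identifier struct {
	sum uint64
}

func (id *identifier) add(key, value string) {
	id.sum ^= hashPair(key, value)
}

func main() {
	var a, b identifier
	a.add("host", "web1")
	a.add("region", "us-east")

	// Same pairs added in a different order: identical identifier.
	b.add("region", "us-east")
	b.add("host", "web1")

	fmt.Println(a.sum == b.sum) // true
}
```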
Here's an example of specialization.

Let's say we're processing a massive log file for a single day, and each line
begins with a time stamp.

```
Sun 4 Mar 2018 14:35:09 PST <...........................>
```

For each line, we're going to call `time.Parse()` to turn it into an epoch.
If profiling shows us `time.Parse()` is the bottleneck, we have a few options
to speed things up.

The easiest is to keep a single-item cache of the previously seen time stamp
and the associated epoch. As long as our log file has multiple lines for a
single second, this will be a win. For the case of a 10 million line log
file, this strategy reduces the number of expensive calls to `time.Parse()`
from 10,000,000 to 86,400 -- one for each unique second.

TODO: code example for single-item cache

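One possible shape for that single-item cache (a sketch; the `layout`
constant and the function and variable names are my own assumptions, not
code from these benchmarks):

```go
package main

import (
	"fmt"
	"time"
)

// Go reference layout matching "Sun 4 Mar 2018 14:35:09 PST".
const layout = "Mon 2 Jan 2006 15:04:05 MST"

// lastStamp/lastEpoch form the single-item cache: if this line's timestamp
// string is identical to the previous one, we skip time.Parse entirely.
var (
	lastStamp string
	lastEpoch int64
)

// parseEpoch converts a timestamp string to a Unix epoch,
// consulting the cache first.
func parseEpoch(stamp string) (int64, error) {
	if stamp == lastStamp {
		return lastEpoch, nil
	}
	t, err := time.Parse(layout, stamp)
	if err != nil {
		return 0, err
	}
	lastStamp, lastEpoch = stamp, t.Unix()
	return lastEpoch, nil
}

func main() {
	// Consecutive lines within the same second hit the cache.
	for _, stamp := range []string{
		"Sun 4 Mar 2018 14:35:09 PST",
		"Sun 4 Mar 2018 14:35:09 PST", // cache hit: no time.Parse call
		"Sun 4 Mar 2018 14:35:10 PST", // cache miss: re-parse
	} {
		epoch, err := parseEpoch(stamp)
		if err != nil {
			panic(err)
		}
		fmt.Println(epoch)
	}
}
```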
Can we do more? Because we know exactly what format the timestamps are in
*and* that they all fall in a single day, we can write custom time parsing
logic that takes this into account. We can calculate the epoch for midnight,
then extract hour, minute, and second from the timestamp string -- they'll
all be at fixed offsets in the string -- and do some integer math.

TODO: code example for string offset version

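A sketch of what such a specialized parser could look like, assuming the
`Sun 4 Mar 2018 14:35:09 PST` layout shown above (since every line is from
the same day, the clock digits sit at fixed byte offsets for the whole file;
the midnight epoch and all names here are illustrative, not from the book's
benchmark code):

```go
package main

import "fmt"

// lineEpoch computes the epoch for a line beginning with a timestamp like
// "Sun 4 Mar 2018 14:35:09 PST". The date portion is constant for a
// single-day log, so the clock always occupies bytes 15-22. Given the
// epoch for midnight of that day, the rest is integer math -- no
// general-purpose parsing at all.
func lineEpoch(line string, midnight int64) int64 {
	// Convert each digit pair by hand: '1' - '0' == 1, and so on.
	hour := int64(line[15]-'0')*10 + int64(line[16]-'0')
	min := int64(line[18]-'0')*10 + int64(line[19]-'0')
	sec := int64(line[21]-'0')*10 + int64(line[22]-'0')
	return midnight + hour*3600 + min*60 + sec
}

func main() {
	// Epoch for midnight PST on 4 Mar 2018, computed once per file.
	const midnight = 1520150400
	fmt.Println(lineEpoch("Sun 4 Mar 2018 14:35:09 PST", midnight))
	// → 1520202909 (midnight + 14h35m09s)
}
```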
In my benchmarks, this reduced the time parsing from 275 ns/op to 5 ns/op.
(Of course, even at 275 ns/op, you're more likely to be blocked on I/O and
not CPU for time parsing.)

The general algorithm is slow because it has to handle more cases. Your
algorithm can be faster because you know more about your problem. But the
code is more closely tied to exactly what you need. It's much more difficult
to update if the time format changes.

Optimization is specialization, and specialized code is more fragile to
change than general purpose code.