a few more notes
parent 1878a28b6a
commit 302751cfc1
@@ -227,3 +227,6 @@ The key to performance is elegance, not battalions of special cases.
 You're bound to be unhappy if you optimize everything.
 — Donald Knuth
 
+You'll never know how bad things are until you look.
+- Howard Chu
+
@@ -206,7 +206,8 @@ also give you an idea of where to start. If you need only a 10-20%
 performance improvement, you can probably get that with some implementation
 tweaks and smaller fixes. If you need a factor of 10x or more, then just
 replacing a multiplication with a left-shift isn't going to cut it. That's
-probably going to call for changes up and down your stack.
+probably going to call for changes up and down your stack, possibly redesigning
+large portions of the system with these performance goals in mind.
 
 Good performance work requires knowledge at many different levels, from
 system design, networking, hardware (CPU, caches, storage), algorithms,
@@ -467,6 +468,12 @@ sorting will pay off. On the other hand, if you're mostly doing lookups,
 maybe having an array was the wrong choice and you'd be better off paying the
 O(1) lookup cost for a map instead.
 
+Being able to analyze your problem in terms of big-O notation also means you can
+figure out if you're already at the limit for what is possible for your problem,
+and if you need to change approaches in order to speed things up. For example,
+finding the minimum of an unsorted list is `O(n)`, because you have to look at
+every single item. There's no way to make that faster.
+
 If your data structure is static, then you can generally do much better than
 the dynamic case. It becomes easier to build an optimal data structure
 customized for exactly your lookup patterns. Solutions like minimal perfect
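As a minimal Go sketch of the `O(n)` lower bound mentioned in the added paragraph (the helper name is illustrative, not from the notes): every element has to be inspected at least once, so a single linear pass is already optimal for unsorted input.

```go
package main

import "fmt"

// minOf returns the minimum of an unsorted slice. Each element is looked at
// exactly once, so this is O(n); no algorithm can do asymptotically better
// on unsorted input. (Illustrative helper, not part of the original notes.)
func minOf(xs []int) (int, bool) {
	if len(xs) == 0 {
		return 0, false
	}
	min := xs[0]
	for _, x := range xs[1:] {
		if x < min {
			min = x
		}
	}
	return min, true
}

func main() {
	m, _ := minOf([]int{7, 3, 9, 1, 4})
	fmt.Println(m) // 1
}
```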
@@ -926,6 +933,7 @@ allocate it. But you also pay every time the garbage collection runs.
 * API design to limit allocations:
   * allow passing in buffers so caller can reuse rather than forcing an allocation
   * you can even modify a slice in place carefully while you scan over it
+  * passing in a struct could allow caller to stack allocate it
 * reducing pointers to reduce gc scan times
   * pointer-free slices
   * maps with both pointer-free keys and values
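A hedged sketch of the "allow passing in buffers" bullet above, in the style of the standard library's strconv.AppendInt: the caller owns the buffer and can reuse it across calls, so the API itself forces no allocation (the function name here is made up for illustration).

```go
package main

import "fmt"

// AppendGreeting appends its output to dst and returns the extended slice.
// Because the caller supplies dst, the buffer can be reused call after call
// instead of allocating a fresh one each time. (Illustrative API, not from the notes.)
func AppendGreeting(dst []byte, name string) []byte {
	dst = append(dst, "hello, "...)
	dst = append(dst, name...)
	return dst
}

func main() {
	buf := make([]byte, 0, 64) // one allocation, owned by the caller
	for _, name := range []string{"ana", "bob"} {
		buf = AppendGreeting(buf[:0], name) // reuse the same backing array
		fmt.Println(string(buf))
	}
}
```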
@@ -965,6 +973,7 @@ allocate it. But you also pay every time the garbage collection runs.
 * but "off-heap", so ignored by gc (but so would a pointerless slice)
 * need to think about serialization format: how to deal with pointers, indexing (mph, index header)
 * speedy de-serialization
+* binary wire protocol to struct when you already have the buffer
 * string <-> slice conversion, []byte <-> []uint32, ...
 * int to bool unsafe hack (but cmov) (but != 0 is also branch-free)
 * padding:
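Since the window above ends at the "* padding:" stub, here is a small hedged sketch of what that bullet is getting at: field ordering changes how much alignment padding the compiler inserts (the struct names are invented; the sizes assume a typical 64-bit platform).

```go
package main

import (
	"fmt"
	"unsafe"
)

// Padded interleaves 1-byte fields with an 8-byte field, so alignment padding
// is inserted around b: typically 24 bytes on 64-bit.
type Padded struct {
	a bool
	b int64
	c bool
}

// Packed puts the widest field first and groups the small ones together,
// shrinking the struct to typically 16 bytes on 64-bit.
type Packed struct {
	b int64
	a bool
	c bool
}

func main() {
	fmt.Println(unsafe.Sizeof(Padded{}), unsafe.Sizeof(Packed{})) // usually 24 16
}
```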
@@ -1023,7 +1032,7 @@ Techniques specific to the architecture running the code
 * introduction to CPU caches
 * performance cliffs
 * building intuition around cache-lines: sizes, padding, alignment
-* OS tools to view cache-misses
+* OS tools to view cache-misses (perf)
 * maps vs. slices
 * SOA vs AOS layouts: row-major vs. column major; when you have an X, do you need another X or do you need a Y?
 * temporal and spatial locality: use what you have and what's nearby as much as possible
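To make the "SOA vs AOS layouts" bullet concrete, a hedged sketch of the two layouts (the Point types and the X field are invented for illustration): scanning one field of a struct-of-arrays touches only that field's cache lines, while the array-of-structs layout drags the unused fields along.

```go
package main

import "fmt"

// AoS: every element carries all three fields, so summing X also pulls Y and Z
// through the cache.
type PointAoS struct{ X, Y, Z float64 }

// SoA: each field is stored in its own dense slice, so a scan over X is
// sequential and cache-friendly.
type PointsSoA struct{ X, Y, Z []float64 }

func sumXAoS(ps []PointAoS) (s float64) {
	for i := range ps {
		s += ps[i].X
	}
	return s
}

func sumXSoA(ps PointsSoA) (s float64) {
	for _, x := range ps.X {
		s += x
	}
	return s
}

func main() {
	aos := []PointAoS{{1, 2, 3}, {4, 5, 6}}
	soa := PointsSoA{X: []float64{1, 4}, Y: []float64{2, 5}, Z: []float64{3, 6}}
	fmt.Println(sumXAoS(aos), sumXSoA(soa)) // 5 5
}
```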
@@ -1050,7 +1059,7 @@ Techniques specific to the architecture running the code
 
 * sorting data can help improve performance via both cache locality and branch prediction, even taking into account the time it takes to sort
 * function call overhead: inliner is getting better
-* reduce data copies
+* reduce data copies (including for repeated large lists of function params)
 
 * Comment about Jeff Dean's 2002 numbers (plus updates)
 * cpus have gotten faster, but memory hasn't kept up
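One hedged sketch of the "sorting data can help ... branch prediction" bullet: the same data-dependent branch becomes far more predictable once the input is sorted. This only shows the shape of the experiment; whether the sort actually pays for itself has to be measured with a benchmark.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// countAbove contains a branch whose outcome depends on the data. On random
// input the branch predictor guesses wrong roughly half the time; on sorted
// input it mispredicts only around the threshold crossing.
func countAbove(xs []int, threshold int) int {
	n := 0
	for _, x := range xs {
		if x > threshold {
			n++
		}
	}
	return n
}

func main() {
	xs := make([]int, 1<<20)
	for i := range xs {
		xs[i] = rand.Intn(256)
	}
	fmt.Println(countAbove(xs, 127)) // time this on the shuffled slice...
	sort.Ints(xs)
	fmt.Println(countAbove(xs, 127)) // ...then again after sorting: same answer
}
```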