2016-05-22 14:21:23 +08:00
This document outlines best practices for writing high-performance Go code.

At the moment, it's a collection of links to videos, slides, and blog posts
("awesome-go-performance"), but I would like this to evolve into a longer book
format where the content is here instead of external. The links should be sorted into categories.

All the content will be licensed under CC-BY-SA.

## Optimization Workflow

* All optimizations should follow these steps:

    1. determine your performance goals and confirm you are not meeting them
    1. profile to identify the areas to improve. This can be CPU, heap allocations, or goroutine blocking.
    1. benchmark to determine the speedup your solution will provide using
       the built-in benchmarking framework (<http://golang.org/pkg/testing/>).
       Make sure you're benchmarking the right thing.
    1. profile again afterwards to verify the issue is gone
    1. use <https://godoc.org/golang.org/x/perf/benchstat> or
       <https://github.com/codahale/tinystat> to verify that a set of timings
       are 'sufficiently' different for an optimization to be worth the
       added code complexity.
    1. use <https://github.com/tsenart/vegeta> for load testing HTTP services
    1. make sure your latency numbers make sense: <https://youtu.be/lJ8ydIuPFeU>

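
As a minimal sketch of the benchmarking step, the program below compares two ways of joining strings. The functions `joinNaive` and `joinBuilder` and the input size are hypothetical, chosen only for illustration. In a real project the `Benchmark...` functions would live in a `_test.go` file and run with `go test -bench=. -benchmem`; `testing.Benchmark` is used here only so the sketch runs as a plain program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinNaive builds a string with repeated concatenation, which copies
// the accumulated string on every iteration.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder builds the same string with strings.Builder, which grows
// an internal buffer instead of copying on every iteration.
func joinBuilder(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// makeParts returns n short strings to use as benchmark input.
func makeParts(n int) []string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = "ab"
	}
	return parts
}

var benchInput = makeParts(1000)

// sink keeps the compiler from discarding the benchmarked result.
var sink string

func BenchmarkJoinNaive(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = joinNaive(benchInput)
	}
}

func BenchmarkJoinBuilder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = joinBuilder(benchInput)
	}
}

func main() {
	// Each call runs the benchmark until the timing is stable.
	fmt.Println("naive:  ", testing.Benchmark(BenchmarkJoinNaive))
	fmt.Println("builder:", testing.Benchmark(BenchmarkJoinBuilder))
}
```

Saving the output of several runs before and after a change and feeding both files to benchstat is one way to judge whether the difference is more than noise.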

The first step is important. It tells you when and where to start optimizing.
More importantly, it also tells you when to stop. Pretty much all
optimizations add code complexity in exchange for speed. And you can *always*
make code faster. It's a balancing act.

The basic rules of the game are:

1. minimize CPU usage
    * do less work
    * this generally means "a faster algorithm"
    * but CPU caches and the hidden constants in O() can play tricks on you
1. minimize allocations (which leads to less CPU stolen by the GC)
1. make your data quick to access

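
To illustrate the "minimize allocations" rule, here is a small sketch that counts heap allocations with `testing.AllocsPerRun`. The functions `squaresGrow` and `squaresPrealloc` are hypothetical examples, and the exact counts depend on the Go version's slice growth policy, so none are claimed here.

```go
package main

import (
	"fmt"
	"testing"
)

// out is a package-level sink so the slices escape to the heap and the
// compiler cannot optimize the work away.
var out []int

// squaresGrow appends to a nil slice, so the backing array is reallocated
// and copied each time capacity runs out.
func squaresGrow(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i*i)
	}
	return s
}

// squaresPrealloc sizes the backing array once up front.
func squaresPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i*i)
	}
	return s
}

// allocsGrow reports the average heap allocations per squaresGrow call.
func allocsGrow(n int) float64 {
	return testing.AllocsPerRun(100, func() { out = squaresGrow(n) })
}

// allocsPrealloc reports the average heap allocations per squaresPrealloc call.
func allocsPrealloc(n int) float64 {
	return testing.AllocsPerRun(100, func() { out = squaresPrealloc(n) })
}

func main() {
	const n = 1000
	fmt.Printf("grow:     %.0f allocs/op\n", allocsGrow(n))
	fmt.Printf("prealloc: %.0f allocs/op\n", allocsPrealloc(n))
}
```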

This book is split into different sections:

1) basic tips for writing not-slow software
    * CS 101-level stuff
2) tips for writing fast software
    * Go-specific sections on how to get the best from Go
3) advanced tips for writing *really* fast software
    * For when your optimized code isn't fast enough

## Basics

1. choose the best algorithm
    * traditional computer science analysis
    * O(n^2) vs O(n log n) vs O(log n) vs O(1)
    * this should handle the majority of your optimization cases
    * be aware of <http://accidentallyquadratic.tumblr.com/>
    * <https://agtb.wordpress.com/2010/12/23/progress-in-algorithms-beats-moore%E2%80%99s-law/>
1. pre-compute things you need
1. add a cache -> reduces work

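
A minimal sketch of the "add a cache" idea, assuming the wrapped function is pure. `slowSquare` is a hypothetical stand-in for an expensive computation, and `memoize` is an illustrative helper, not a library API.

```go
package main

import "fmt"

// calls counts how often the expensive function actually runs.
var calls int

// slowSquare stands in for an expensive, pure computation (hypothetical).
func slowSquare(n int) int {
	calls++
	return n * n
}

// memoize wraps a pure func(int) int with a map-based cache.
// It is not safe for concurrent use, and a real cache would also need an
// eviction policy to bound memory.
func memoize(f func(int) int) func(int) int {
	cache := make(map[int]int)
	return func(n int) int {
		if v, ok := cache[n]; ok {
			return v
		}
		v := f(n)
		cache[n] = v
		return v
	}
}

func main() {
	fast := memoize(slowSquare)
	for i := 0; i < 1000; i++ {
		fast(i % 10) // only 10 distinct inputs
	}
	fmt.Println("expensive calls:", calls) // prints "expensive calls: 10"
}
```

The cache trades memory for CPU, so it is worth profiling both before keeping it.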

## Introductory Profiling

Techniques applicable to source code in general

1. introduction to pprof
    * go tool pprof (and <https://github.com/google/pprof>)
1. Writing and running (micro)benchmarks
    * -cpuprofile / -memprofile / -benchmem
1. How to read pprof output
1. What are the different pieces of the runtime that show up
1. Macro-benchmarks (Profiling in production)
    * net/http/pprof

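
For code that is not run through `go test`, a CPU profile can be collected programmatically with the standard `runtime/pprof` package. The sketch below is one way to do that; `busyWork` and `cpuProfile` are hypothetical names, and the profile is kept in memory only to keep the example self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload to give the profiler
// something to sample.
func busyWork() int {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	return sum
}

// cpuProfile runs work under the CPU profiler and returns the profile
// data in the format that `go tool pprof` reads.
func cpuProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // returns once the profile is fully written
	return buf.Bytes(), nil
}

func main() {
	data, err := cpuProfile(func() { busyWork() })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	// Writing data to a file such as cpu.prof allows inspection with
	// `go tool pprof cpu.prof`.
	fmt.Printf("collected %d bytes of CPU profile\n", len(data))
}
```

For long-running services, importing `net/http/pprof` for its side effects registers handlers under `/debug/pprof/` on the default HTTP mux, which serves the same kind of profile on demand.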

## Tracer

## Advanced Techniques

* Techniques specific to the architecture running the code
* introduction to CPU caches
    * building intuition around cache-lines: sizes, padding, alignment
    * false-sharing
    * OS tools to view cache-misses
* (also branch prediction)

* Comment about Jeff Dean's 2002 numbers (plus updates)
    * CPUs have gotten faster, but memory hasn't kept up

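
The padding idea behind avoiding false sharing can be sketched as follows. The 64-byte cache line is an assumption (common on x86-64 but not universal), and the `shared`, `padded`, and `hammer` names are hypothetical. Timing the two `hammer` calls in a benchmark is the way to see whether padding helps on a given machine.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

// cacheLine is an assumed cache-line size in bytes.
const cacheLine = 64

// shared places two counters next to each other, so they will usually
// land on the same cache line.
type shared struct {
	a, b uint64
}

// padded keeps the counters a full cache line apart, so a write to one
// does not invalidate the line holding the other.
type padded struct {
	a uint64
	_ [cacheLine - 8]byte
	b uint64
	_ [cacheLine - 8]byte
}

// hammer increments two counters from two goroutines, n times each.
func hammer(a, b *uint64, n int) {
	var wg sync.WaitGroup
	wg.Add(2)
	for _, p := range []*uint64{a, b} {
		go func(p *uint64) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				atomic.AddUint64(p, 1)
			}
		}(p)
	}
	wg.Wait()
}

func main() {
	var s shared
	var p padded
	hammer(&s.a, &s.b, 1000000)
	hammer(&p.a, &p.b, 1000000)
	fmt.Println(unsafe.Sizeof(s), unsafe.Sizeof(p)) // prints "16 128"
}
```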

## Heap Allocations

* Stack vs. heap allocations
* What causes heap allocations?
* Understanding escape analysis
* Using sync.Pool effectively

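
A minimal sketch of `sync.Pool` reusing buffers across calls; `bufPool` and `greet` are hypothetical names. To see which values the compiler moves to the heap, `go build -gcflags=-m` prints its escape-analysis decisions.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values so that greet does not
// allocate a fresh buffer on every call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet formats a message using a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers may still hold old contents
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // copies the bytes, so reusing buf later is safe
}

func main() {
	fmt.Println(greet("gopher")) // prints "hello, gopher"
	fmt.Println(greet("world"))  // prints "hello, world"
}
```

The pool may drop its contents at any garbage collection, so it suits short-lived scratch objects rather than anything that must persist.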

## Runtime

* cost of calls via interfaces (indirect calls on the CPU level)
* runtime.convT2E / runtime.convT2I
* type assertions vs. type switches
* defer
* special-case map implementations for ints, strings

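
For the "type assertions vs. type switches" item, the sketch below shows the same dispatch written both ways so the two forms can be benchmarked against each other. The function names are hypothetical, and which form is faster is not claimed here, since it depends on the compiler version and the types involved.

```go
package main

import (
	"fmt"
	"strconv"
)

// viaAssertions tries each type in turn with comma-ok assertions.
func viaAssertions(v interface{}) string {
	if s, ok := v.(string); ok {
		return s
	}
	if i, ok := v.(int); ok {
		return strconv.Itoa(i)
	}
	if st, ok := v.(fmt.Stringer); ok {
		return st.String()
	}
	return "?"
}

// viaSwitch expresses the same dispatch as a single type switch.
func viaSwitch(v interface{}) string {
	switch x := v.(type) {
	case string:
		return x
	case int:
		return strconv.Itoa(x)
	case fmt.Stringer:
		return x.String()
	default:
		return "?"
	}
}

func main() {
	for _, v := range []interface{}{"go", 42, 3.5} {
		fmt.Println(viaAssertions(v), viaSwitch(v))
	}
}
```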

## Common gotchas with the standard library

* time.After() leaks its underlying timer until it fires
* Reusing HTTP connections...
* ....

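
One common way around the `time.After()` issue is an explicit timer that is stopped as soon as it is no longer needed. The helper `recvTimeout` below is a hypothetical sketch of that pattern. As of Go 1.23 the garbage collector can reclaim unreferenced timers even if they were never stopped, so this matters most on older toolchains.

```go
package main

import (
	"fmt"
	"time"
)

// recvTimeout waits for a value on ch for at most d. It uses an explicit
// timer and stops it on return, so the timer is released as soon as the
// value arrives instead of lingering until d elapses.
func recvTimeout(ch <-chan int, d time.Duration) (int, bool) {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case v := <-ch:
		return v, true
	case <-t.C:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)
	ch <- 7
	fmt.Println(recvTimeout(ch, time.Hour))           // prints "7 true"
	fmt.Println(recvTimeout(ch, 10*time.Millisecond)) // prints "0 false"
}
```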

## Unsafe

* And all the dangers that go with it
* Common uses for unsafe
    * mmap'ing data files
    * speedy de-serialization

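
As one illustration of a common `unsafe` use, the sketch below converts a byte slice to a string without copying. It assumes Go 1.20 or newer (for `unsafe.String` and `unsafe.SliceData`); `bytesToString` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The caller must never modify b afterwards: strings are assumed to be
// immutable, and breaking that assumption leads to undefined behaviour.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("zero-copy")
	s := bytesToString(b)
	fmt.Println(s, len(s)) // prints "zero-copy 9"
}
```

The saved copy only pays off on hot paths; everywhere else the plain `string(b)` conversion is the safer choice.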

## cgo

* Performance characteristics of cgo calls
* Tricks to reduce the costs
* Passing pointers between Go and C

## Assembly

* Stuff about writing assembly code for Go
* brief intro to syntax
* calling convention
* using opcodes unsupported by the assembler
* notes about why intrinsics are hard

## Alternate implementations

* Popular replacements for standard library packages:
    * encoding/json -> ffjson
    * net/http -> fasthttp
    * regexp -> ragel (or other regular expression package)
    * serialization
        * encoding/gob -> <https://github.com/alecthomas/go_serialization_benchmarks>
        * protobuf -> <https://github.com/gogo/protobuf>
        * all formats have trade-offs; choose one that matches what you need

## Tooling

Look at some more interesting/advanced tooling

* perf (perf2pprof)
* go-torch (+flamegraphs)