* blog posts
  - http://jmoiron.net/blog/go-performance-tales/
    - use integer map keys if possible
    - hard to compete with Go's map implementation, esp. if your data structure has lots of pointer chasing
    - AES-NI instructions make string hashing much faster
    - prefer structs to maps if you know the map keys (esp. coming from Perl, etc.)
    - channels are useful but slow; raw atomics can help with performance
    - cgo has overhead
    - profile before optimizing
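A minimal sketch of the "channels are useful but slow; raw atomics can help" note. The two counters below (`countWithChannel` and `countWithAtomic` are hypothetical names) compute the same total: one funnels every increment through a channel to an owner goroutine, the other bumps a shared `int64` with `sync/atomic`. No timings are claimed here; wrap each function in a benchmark to see the gap on your own hardware, since it depends on core count and contention.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithChannel funnels every increment through an unbuffered channel to
// a single owner goroutine, so each increment is a send/receive handoff.
func countWithChannel(workers, perWorker int) int64 {
	ch := make(chan struct{})
	done := make(chan int64)
	go func() {
		var n int64
		for range ch {
			n++
		}
		done <- n
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				ch <- struct{}{}
			}
		}()
	}
	wg.Wait()
	close(ch) // lets the owner goroutine finish and report its total
	return <-done
}

// countWithAtomic lets every goroutine increment a shared counter directly.
func countWithAtomic(workers, perWorker int) int64 {
	var n int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&n, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&n)
}

func main() {
	fmt.Println(countWithChannel(4, 1000), countWithAtomic(4, 1000))
}
```

The atomic version only works because the shared state is a single word; once the invariant spans several fields, a mutex or a channel-owned goroutine is usually the simpler choice.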
  - http://slideshare.net/cloudflare/go-profiling-john-graham-cumming (video: https://youtu.be/_41bkNr7eik)
    - don't waste programmer cycles saving the wrong CPU cycles (or memory allocations)
    - CPU: bash$ time; time.Now()/time.Since(); pprof.StartCPUProfile/pprof.StopCPUProfile; go tool pprof http://.../profile
    - memory: bash$ ps; runtime.ReadMemStats(); pprof.WriteHeapProfile(); go tool pprof http://.../heap
    - slice operations are sometimes O(n)
    - https://golang.org/pkg/runtime/debug/
    - sync.Pool (basically)
  - https://methane.github.io/2015/02/reduce-allocation-in-go-code
    - 1. correctness is important
    - 2. BenchmarkXXX with b.ReportAllocs() (or -benchmem when running)
    - 3. GODEBUG=allocfreetrace=1 produces a stack trace on every allocation
    - strategies:
      - avoid string concat; use []byte+append() (+strconv.AppendInt(), ...)
      - benchcmp
      - avoid time.Format
      - avoid range when iterating strings if you only need bytes (range decodes UTF-8 on every step; a []rune conversion also allocates)
      - can append a string to a []byte
      - write two versions, one for string, one for []byte (avoids the conversion + copy, sometimes)
      - reuse existing buffers instead of creating new ones
  - http://bravenewgeek.com/so-you-wanna-go-fast/
    - performance fast vs. delivery fast; make the right decision
    - lock-free ring buffer vs. channels: faster except with GOMAXPROCS=1
    - defer has a cost (allocation+cpu); much cheaper since Go 1.14's open-coded defers
        BenchmarkMutexDeferUnlock-8     20000000    96.6 ns/op
        BenchmarkMutexUnlock-8         100000000    19.5 ns/op
    - reflection + JSON: both are slow
      - ffjson avoids reflection
      - msgp avoids JSON
    - calls through interfaces use dynamic dispatch and can't be inlined
      => use concrete types (+ code duplication)
    - heap vs. stack; escape analysis
    - lots of short-lived objects are expensive for the gc
    - sync.Pool only reuses objects *between* gc runs (pooled objects are dropped at gc; since Go 1.13 a victim cache keeps them for one extra cycle)
    - you need your own free list to hold onto things *across* gc runs
      (but now you're subverting the purpose of a garbage collector)
    - false sharing
    - custom lock-free data structures: fast but *hard*
    - "Speed comes at the cost of simplicity, at the cost of development time, and at the cost of continued maintenance. Choose wisely."
  - https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs
  - http://blog.golang.org/profiling-go-programs
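For the "short-lived objects are expensive for the gc" and sync.Pool notes, a minimal sketch of pooling scratch buffers. `bufPool` and `render` are hypothetical names. The pool may discard anything it holds during a collection, so `New` must always be able to produce a fresh object, and each borrowed buffer is reset before use.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles *bytes.Buffer values between gc runs.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render formats into a pooled buffer instead of allocating a new one per call.
func render(name string, n int) string {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset() // a recycled buffer may still hold a previous caller's bytes
	defer bufPool.Put(b)
	fmt.Fprintf(b, "%s:%d", name, n)
	return b.String() // String copies the bytes, so the result outlives Put
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(render("item", i))
	}
}
```

One design caveat: returning very large buffers to the pool keeps that memory alive, so some code skips `Put` for buffers above a size threshold.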
  - https://medium.com/@hackintoshrao/daily-code-optimization-using-benchmarks-and-profiling-in-golang-gophercon-india-2016-talk-874c8b4dc3c5
  - If you're writing benchmarks, read http://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
  - cache line explanation: http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-why-its-so-fast_22.html
  - avoiding false sharing: http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206
  - how does this translate to Go? http://www.catb.org/esr/structure-packing/
  - https://en.wikipedia.org/wiki/Amdahl%27s_law
  - https://github.com/ardanlabs/gotraining/tree/master/topics/profiling
  - https://github.com/ardanlabs/gotraining/tree/master/topics/benchmarking
  - http://dave.cheney.net/2015/11/29/a-whirlwind-tour-of-gos-runtime-environment-variables
  - https://github.com/davecheney/high-performance-go-workshop
  - Mutex profile: https://rakyll.org/mutexprofile
  - https://segment.com/blog/allocation-efficiency-in-high-performance-go-services/
  - http://brendanjryan.com/2018/01/15/go-benchmarks.html
  - https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/
  - https://signalfx.com/blog/a-pattern-for-optimizing-go-2/
  - https://artem.krylysov.com/blog/2017/03/13/profiling-and-optimizing-go-web-applications/
  - https://www.cockroachlabs.com/blog/how-to-optimize-garbage-collection-in-go/
  - https://hashrocket.com/blog/posts/go-performance-observations
  - https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
  - https://marcellanz.com/post/file-read-challenge/
  - https://boyter.org/posts/sloc-cloc-code/
cgo:
  cgo has overhead
    (which has only gotten more expensive over time) -- ~200 ns/call
    (reduced in Go 1.8 to <100 ns; still not free)
  the SSA backend means less difference in codegen
  really think about whether you want cgo: http://dave.cheney.net/2016/01/18/cgo-is-not-go
  cgo (GopherCon talk): https://www.youtube.com/watch?v=lhMhApWQp2E
  cgo performance tracking bug: https://github.com/golang/go/issues/9704
videos:
  https://gophervids.appspot.com/#tags=optimization
    -- figure out which of these are specifically worth listing

  "Profiling and Optimizing Go" (Uber)
  https://www.youtube.com/watch?v=N3PWzBeLX2M

  https://go-talks.appspot.com/github.com/davecheney/presentations/writing-high-performance-go.slide
  https://www.youtube.com/watch?v=zWp0N9unJFc

  Björn Rabenstein
  https://docs.google.com/presentation/d/1Zu0BdbhMRar7ycEwDi8jepGokTXTDXlKFf7C13tusuI/edit
  https://www.youtube.com/watch?v=ZuQcbqYK0BY

  https://go-talks.appspot.com/github.com/mkevac/golangmoscow2016/gomeetup.slide

  CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures"
  https://www.youtube.com/watch?v=fHNmRkzxHWs

  Performance Engineering of Software Systems
  http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/

  https://talks.golang.org/2013/highperf.slide#1

  Machine Architecture: Things Your Programming Language Never Told You
  https://www.youtube.com/watch?v=L7zSU9HI-6I

  7 Ways to Profile Go Applications
  https://www.youtube.com/watch?v=2h_NFBFrciI

  dotGo 2016 - Damian Gryski - Slices: Performance through cache-friendliness
  https://www.youtube.com/watch?v=jEG4Qyo_4Bc

  Performance Bugs
  https://www.youtube.com/watch?v=89qiHoDjeDg

  The Hurricane's Butterfly: Debugging Pathologically Performing Systems
  https://www.youtube.com/watch?v=7AO4wz6gI3Q

  "So You Wanna Go Fast?" by Tyler Treat
  https://www.youtube.com/watch?v=DJ4d_PZ6Gns

  GopherCon 2017: Peter Bourgon - Evolutionary Optimization with Go
  https://www.youtube.com/watch?v=ha8gdZ27wMo

  CppCon 2015: Bryce Adelstein-Lelbach "Benchmarking C++ Code"
  https://www.youtube.com/watch?v=zWxSZcpeS8Q

  CppCon 2018: Fedor Pikus "Design for Performance"
  https://www.youtube.com/watch?v=m25p3EtBua4
asm:
  https://golang.org/doc/asm
  https://goroutines.com/asm
  http://www.doxsey.net/blog/go-and-assembly
  https://www.youtube.com/watch?v=9jpnFmJr2PE
  https://blog.gopheracademy.com/advent-2016/peachpy/
  https://blog.sgmansfield.com/2017/04/a-foray-into-go-assembly-programming/
  http://lemire.me/blog/2016/12/21/performance-overhead-when-calling-assembly-from-go/
  http://davidwong.fr/goasm/
  minio posts + tooling
  https://github.com/teh-cmc/go-internals/blob/master/chapter1_assembly_primer/README.md
  https://blog.hackercat.ninja/post/quick_intro_to_go_assembly/
  https://quasilyte.dev/blog/post/go-asm-complementary-reference/
posts:
  http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
  https://arxiv.org/abs/1509.05053 (array layouts for comparison-based searching)
  http://grokbase.com/t/gg/golang-nuts/155ea0t5hf/go-nuts-after-set-gomaxprocs-different-machines-have-different-bahaviors-some-speed-up-some-slow-down
  http://grokbase.com/t/gg/golang-nuts/14138jw64s/go-nuts-concurrent-read-write-of-different-parts-of-a-slice

  Escape Analysis Flaws
  https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpTe5q32QDmz8l0BouG0Cw/preview

  https://hackernoon.com/optimizing-optimizing-some-insights-that-led-to-a-400-speedup-of-powerdns-5e1a44b58f1c
  http://leto.net/docs/C-optimization.php
  http://www.stochasticlifestyle.com/algorithm-efficiency-comes-problem-information/
tools:
  https://godoc.org/github.com/aclements/go-perf
  https://godoc.org/golang.org/x/perf/cmd/benchstat
  https://github.com/rakyll/gom
  https://github.com/tam7t/sigprof
  https://github.com/aybabtme/dpprof
  https://github.com/wblakecaldwell/profiler
  https://github.com/MiniProfiler/go
  https://perf.wiki.kernel.org/index.php/Main_Page
  https://github.com/dominikh/go-structlayout
  http://www.brendangregg.com/perf.html
  https://github.com/davecheney/gcvis
  https://github.com/pavel-paulau/gcterm
  https://github.com/jonlawlor/benchls
pprof:
  https://rakyll.org/pprof-ui/
  https://rakyll.org/profiler-labels/
  https://rakyll.org/custom-profiles/
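For the custom-profiles link, a sketch using `runtime/pprof`'s `NewProfile`, `Add`, `Remove` and `Count` to track resources that are still open. The profile name and the `handle` type are placeholders; by convention, custom profile names are prefixed with an import path. `Add` records a stack trace for each live value, and `skip=1` starts that trace at the caller of `openHandle`.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// openHandles is a custom profile; the name is a hypothetical placeholder.
var openHandles = pprof.NewProfile("example.com/myapp.openhandles")

type handle struct{ id int }

// openHandle records the caller's stack in the profile, keyed by the handle.
func openHandle(id int) *handle {
	h := &handle{id: id}
	openHandles.Add(h, 1) // skip=1: start the stack at openHandle's caller
	return h
}

// Close removes the handle, so the profile only shows handles still open.
func (h *handle) Close() { openHandles.Remove(h) }

func main() {
	a, b := openHandle(1), openHandle(2)
	a.Close()
	fmt.Println("open handles:", openHandles.Count()) // open handles: 1
	// debug=1 prints a human-readable text form of the profile.
	if err := openHandles.WriteTo(os.Stdout, 1); err != nil {
		panic(err)
	}
	b.Close()
}
```

Because the profile is registered by name, importing `net/http/pprof` should also expose it alongside the built-in profiles, which makes leaked handles visible in a running service.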
trace:
  https://making.pusher.com/go-tool-trace/
  https://www.youtube.com/watch?v=mmqDlbWk_XA
  https://www.youtube.com/watch?v=nsM_m4hZ-bA
  https://blog.gopheracademy.com/advent-2017/go-execution-tracer/
papers:
  https://www.akkadia.org/drepper/cpumemory.pdf
  https://software.intel.com/sites/default/files/article/392271/aos-to-soa-optimizations-using-iterative-closest-point-mini-app.pdf
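The Intel paper above is about converting array-of-structs (AoS) to struct-of-arrays (SoA). A minimal Go sketch with a hypothetical particle type: summing one field in the AoS layout strides over whole 32-byte structs, while the SoA layout reads a single contiguous `[]float64`. Whether SoA wins depends on the access pattern; it helps when a hot loop reads few fields and can hurt when every field of one element is used together.

```go
package main

import "fmt"

// particleAoS keeps all of a particle's fields together (array of structs).
type particleAoS struct {
	x, y, z, mass float64
}

// particlesSoA gives each field its own slice (struct of arrays).
type particlesSoA struct {
	x, y, z, mass []float64
}

// toSoA converts the AoS representation into the SoA one.
func toSoA(ps []particleAoS) particlesSoA {
	s := particlesSoA{
		x:    make([]float64, len(ps)),
		y:    make([]float64, len(ps)),
		z:    make([]float64, len(ps)),
		mass: make([]float64, len(ps)),
	}
	for i, p := range ps {
		s.x[i], s.y[i], s.z[i], s.mass[i] = p.x, p.y, p.z, p.mass
	}
	return s
}

// totalMassAoS pulls x, y and z into cache even though only mass is read.
func totalMassAoS(ps []particleAoS) float64 {
	total := 0.0
	for i := range ps {
		total += ps[i].mass
	}
	return total
}

// totalMassSoA reads one contiguous slice, so every byte fetched is used.
func totalMassSoA(ps particlesSoA) float64 {
	total := 0.0
	for _, m := range ps.mass {
		total += m
	}
	return total
}

func main() {
	ps := make([]particleAoS, 1000)
	for i := range ps {
		ps[i] = particleAoS{x: float64(i), y: 1, z: 2, mass: 0.5}
	}
	fmt.Println(totalMassAoS(ps), totalMassSoA(toSoA(ps))) // 500 500
}
```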
optimization guides:
  http://developer.amd.com/resources/developer-guides-manuals/
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.uan0015b/index.html
  https://www-ssl.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
  https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines.html#S-performance
  https://github.com/fenbf/AwesomePerfCpp
  https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2017.11.22a.pdf
stackoverflow:
  https://stackoverflow.com/questions/19397699/why-struct-with-padding-fields-works-faster/19397791#19397791
  https://stackoverflow.com/questions/10017026/no-speedup-in-multithread-program/10017482#10017482

practice:
  https://twitter.com/dgryski/status/584682584942194689
distributed system design: (out of scope for this book)
  http://highscalability.com/blog/2010/12/20/netflix-use-less-chatty-protocols-in-the-cloud-plus-26-fixes.html

books:
  Writing Efficient Programs
  Algorithm Engineering: https://www.springer.com/gp/book/9783642148651
  http://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/empirical-fortran.pdf

  Usborne: Programming Tricks and Skills
  https://drive.google.com/file/d/0Bxv0SsvibDMTdElPMHF5NVpmU0U/view

Quotes: (Bumper Sticker Computer Science)
  [The First Rule of Program Optimization] Don't do it.
  [The Second Rule of Program Optimization---For experts only] Don't do it yet.
    Michael Jackson
    Michael Jackson Systems Ltd.

  The key to performance is elegance, not battalions of special cases.
    — Jon Bentley and Doug McIlroy

  You’re bound to be unhappy if you optimize everything.
    — Donald Knuth