add summaries of some blog posts, cgo

2016-05-29 00:02:34 +02:00 · e815ef4399
commit e815ef4399
parent bc56c47bce
1 changed file with 50 additions and 2 deletions
--- a/52
+++ b/52
@@ -1,11 +1,53 @@
 * blog posts
 	- http://jmoiron.net/blog/go-performance-tales/
-	- http://blog.golang.org/profiling-go-programs
+                - use integer map keys if possible
                - hard to compete with Go's map implementation; esp. if your data structure has lots of pointer chasing
                - aes-ni instructions make string hashing much faster
                - prefer structs to maps if you know the map keys (esp. coming from perl, etc)
                - channels are useful, but slow; raw atomics can help with performance
                - cgo has overhead
                - profile before optimizing
 	- http://slideshare.net/cloudflare/go-profiling-john-graham-cumming ( https://www.youtu.be/_41bkNr7eik )
-	- https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs
+            - don't waste programmer cycles saving the wrong CPU cycles (or memory allocations)
            - bash$ time; time.Now()/time.Since(); pprof.StartCPUProfile/pprof.StopCPUProfile; go tool pprof http://.../profile
            - bash$ ps; runtime.ReadMemStats(); runtime.WriteHeapProfile(); go tool pprof http://.../heap
            - slice operations are sometimes O(n)
            - https://golang.org/pkg/runtime/debug/
            - sync.Pool (basically)
 	- https://methane.github.io/2015/02/reduce-allocation-in-go-code
            - 1. correctness is important
            - 2. BenchmarkXXX with b.ReportAllocs() (or -benchmem when running)
            - 3. GODEBUG=allocfreetrace=1 produces a stack trace on every allocation
            - strategies:
                - avoid string concat; use []byte+append() (+strconv.AppendInt(), ...)
                - benchcmp
                - avoid time.Format
                - avoid range when iterating strings ([]rune conversion + utf8 decoding)
                - can append string to []byte
                - write two versions, one for string, one for []byte (avoids conversion+copy (sometimes...))
                - reuse existing buffers instead of creating new ones
 	- http://bravenewgeek.com/so-you-wanna-go-fast/
            - performance fast vs. delivery fast; make the right decision
            - lock-free ring buffer vs. channels: the ring buffer is faster, except with GOMAXPROCS=1
            - defer has a cost (allocation+cpu)
                BenchmarkMutexDeferUnlock-8 20000000 96.6 ns/op
                BenchmarkMutexUnlock-8 100000000 19.5 ns/op
            - reflection+json
                - ffjson avoids reflection
                - msgp avoids json
                - interfaces have dynamic dispatch which can't be inlined
                - => use concrete types (+ code duplication)
            - heap vs. stack; escape analysis
            - lots of short-lived objects are expensive for the gc
            - sync.Pool reuses objects only *between* gc runs (pooled objects may be freed at gc)
            - you need your own free list to hold onto things across gc runs
                (but now you're subverting the purpose of a garbage collector)
            - false sharing
            - custom lock-free data structures: fast but *hard*
            - "Speed comes at the cost of simplicity, at the cost of development time, and at the cost of continued maintenance. Choose wisely."
 	- https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs
 	- http://blog.golang.org/profiling-go-programs
 	- https://medium.com/%40hackintoshrao/daily-code-optimization-using-benchmarks-and-profiling-in-golang-gophercon-india-2016-talk-874c8b4dc3c5
 	- If you're writing benchmarks, read http://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
  - cache line explanation: http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-why-its-so-fast_22.html
@@ -15,6 +57,12 @@
  - https://github.com/ardanlabs/gotraining/tree/master/topics/profiling
  - https://github.com/ardanlabs/gotraining/tree/master/topics/benchmarking
 cgo:
    cgo has overhead
        (which has only gotten more expensive over time) -- ~200 ns/call
    ssa backend means less difference in codegen
    think hard about whether you really want cgo: http://dave.cheney.net/2016/01/18/cgo-is-not-go
 videos:
    https://gophervids.appspot.com/#tags=optimization
        -- figure out which of these are specifically worth listing
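
A minimal sketch of the "avoid string concat; use []byte+append() (+strconv.AppendInt(), ...)" and "reuse existing buffers" strategies noted above. The helper names (formatConcat, appendFormat), the "key=value;" format and the sample values are made up for illustration and come from none of the linked posts. testing.AllocsPerRun is used so the comparison runs as a plain program; in a real project this would more likely be a BenchmarkXXX with b.ReportAllocs(). Exact allocation counts depend on the Go version and on escape analysis, so none are claimed here.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the concatenated result alive so the compiler
// cannot discard the work being measured.
var sink string

// formatConcat (hypothetical helper) builds "key=value;" with
// string concatenation.
func formatConcat(key string, n int) string {
	return key + "=" + strconv.Itoa(n) + ";"
}

// appendFormat (hypothetical helper) appends "key=value;" to dst and
// returns the extended slice, so the caller decides where the bytes live.
func appendFormat(dst []byte, key string, n int) []byte {
	dst = append(dst, key...) // a string can be appended to a []byte directly
	dst = append(dst, '=')
	dst = strconv.AppendInt(dst, int64(n), 10)
	return append(dst, ';')
}

func main() {
	// One buffer, allocated once and reused for every call.
	buf := make([]byte, 0, 64)

	concatAllocs := testing.AllocsPerRun(1000, func() {
		sink = formatConcat("requests", 12345)
	})
	appendAllocs := testing.AllocsPerRun(1000, func() {
		buf = appendFormat(buf[:0], "requests", 12345)
	})

	fmt.Printf("concat: %.0f allocs/op\n", concatAllocs)
	fmt.Printf("append: %.0f allocs/op\n", appendAllocs)
}
```

Passing dst in, rather than returning a fresh string, is what lets the caller reuse one buffer across calls; the cost is that the result is only valid until the buffer is reused.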