2014-09-08

Teaser: Go on Plan 9

At long last, Glenda and the Go gopher are finally together:



In due time, I'll publish more information about how to get Go up and running on Plan 9. It was a real pain in the rear on VirtualBox due to an interaction problem between the network card driver and the hget tool.


2014-09-06

Go Testing: Easy Polish for World-Class Tests

If you're like me, when you first started out with Go, you may have felt a temptation to use a testing framework. I sure did. Beginner's mistake. A huge one, in fact. I started out using gocheck. To be sure, I don't mention any of this to denigrate gocheck's authors; rather, I point it out as a matter of preference that shifted as my familiarity with the ecosystem grew. That is to say, once I became more familiar with Go and its idioms, the need for such frameworks withered away. The standard library combined with the language's idioms just delivered.


So why bring this up? Every day, we face the question of how to architect our tests and reason about their output. Go, with its anonymous (unnamed) types and struct literals, naturally gravitates toward simplicity and conciseness. Table-driven tests are a natural consequence—maybe an end-of-history moment for the ecosystem. Anyway, I digress.


What I want to focus on in this post is not strictly table-driven tests but rather a few details that make tests in Go easy to reason about, pain free, and maintainable. This will be a chatty post, but I want you to bear with me and experience my anecdotes firsthand. For background, I encourage you to read Dave Cheney's venerable piece on table-driven tests if you are unfamiliar with them. What I will not do in this post (perhaps in another) is discuss how to design testable code or build correct tests for your systems. My motivations are selfish: the easier it is to test, and to test somewhat correctly, the more testing will be done. Our ecosystem will flourish with high-quality code, and our peers can hold one another to high standards. Let's get started before I fall off the ivory tower!


Value Comparisons and Reflection

I come to the table with a lot of Java experience under my belt. With that comes a strong revulsion toward reflection, because it makes reasoning about a system's design inordinately difficult. Worse, when used incorrectly, it introduces terrible faults by bypassing the compiler's type safety guarantees. You could call this baggage, if you will. When I first saw pkg/reflect, I nearly vomited in my mouth at the horror of ever needing to use this thing. Thus I avoided it—much to my detriment, as I will try to convince you.


Custom Assertions / Testing Framework


In the course of writing tests, I went through several iterations. As I mentioned above, when starting out with Go, I wrote custom assertion mechanisms using gocheck. That stopped after I performed a couple of refactorings and discovered that the maintenance cost of keeping the framework-derived tests up to date exceeded the cost of the base refactor. What followed?


Iteratively Checking Components of Emitted Output


Iteratively checking components of emitted output—and let me tell you, it is not pretty. Suppose we have a function that consumes a []int and emits a []int with each slice element's value incremented by one:


// Increment consumes a slice of integers and returns a new slice that contains
// a copy of the original values but each value having been respectively
// incremented by one.
func Increment(in []int) []int {
        if in == nil {
                return nil
        }
        out := make([]int, len(in))
        for i, v := range in {
                out[i] = v + 1
        }
        return out
}
(Source)

How would you test this under the iterative approach? It might look something like this with a table-driven test:


func TestIterative(t *testing.T) {
        for i, test := range []struct {
                in, out []int
        }{
                {},
                {in: []int{1}, out: []int{2}},
                {in: []int{1, 2}, out: []int{2, 3}},
        } {
                out := Increment(test.in)
                if lactual, lexpected := len(out), len(test.out); lactual != lexpected {
                        t.Fatalf("%d. got unexpected length %d instead of %d", i, lactual, lexpected)
                }
                for actualIdx, actualVal := range out {
                        if expectedVal := test.out[actualIdx]; expectedVal != actualVal {
                                t.Fatalf("%d.%d got unexpected value %d instead of %d", i, actualIdx, actualVal, expectedVal)
                        }
                }
        }
}
(Source)

What do you notice here aside from how terrible it is? I can't really recall how I ever thought this incarnation was a good idea, but I suspect it was conceived on a sleepless night. Anyway, if it doesn't appear terrible, let's take a quick inventory of what's deficient:

  • That's a lot of boilerplate to write.
  • The boilerplate is fragile.
  • When the test does fail, it is damn hard to find out exactly which table row failed: All we get are the two index variables at the beginning of the format string. Pray for what happens if the test's table exceeds one page in length! Double pray it isn't late at night, when you'd go cross-eyed staring at the output.

It turns out that there is more wrong with this than what I enumerated. Don't fear. We'll get to that soon. (Our goal is to make the tests so tip-top that Gunnery Sgt. Hartman would smile.)


Hand-Written Equality Test

A later thought was, why not create an equality helper or a custom type for []int? Let's try that out and see how well that goes:

// Hold your horses, and ignore sort.IntSlice for a moment.
type IntSlice []int

func (s IntSlice) Equal(o IntSlice) bool {
        if len(s) != len(o) {
                return false
        }
        for i, v := range s {
                if other := o[i]; other != v {
                        return false
                }
        }
        return true
}
(Source)

…which is then used as follows:

func TestIntSliceIterative(t *testing.T) {
        for i, test := range []struct {
                in, out []int
        }{
                {},
                {in: []int{1}, out: []int{2}},
                {in: []int{1, 2}, out: []int{2, 3}},
        } {
                if out, testOut := IntSlice(Increment(test.in)), IntSlice(test.out); !testOut.Equal(out) {
                        t.Fatalf("%d. got unexpected value %s instead of %s", i, out, test.out)
                }
        }
}
(Source)

You're probably asking, "what's wrong with that? Looks reasonable." Sure, this works, but …

  • We've created and exported a new method on a new type. Was this really necessary for users? Would a reasonable user ever need IntSlice.Equal? If you look at the generated documentation, it is an extra item in the inventory, thus creating further cognitive burden if the method is not really useful outside of the test. We can do better than this.
  • All of the fragility and error-proneness remarks from the previous case still apply. We've just shifted the maintenance into a dedicated function that performs the work.

"OK, but couldn't you have just made IntSlice.Equal unexported with IntSlice.equal," the peanut gallery protests? Yes, but that still does not represent an optimal solution when compared with what follows.


Using pkg/reflect

So, where am I going with this? pkg/reflect offers a helpful facility known as reflect.DeepEqual. Take a few minutes to read its docstring carefully. I'll wait for you. The takeaway is that, overwhelmingly, reflect.DeepEqual does the right thing for you for most correct public API design styles:


  • Primitive types: string, integers, floating point values, and booleans.
  • Complex types: maps, slices, and arrays.
  • Composite types: structs.
  • Pointers: The underlying values of the pointer and struct fields.
  • Recursive values: reflect.DeepEqual memoizes what it has visited!

Let's take what we've learned and apply it to the test:

import (
        "reflect"
        "testing"
)

func TestReflect(t *testing.T) {
        for i, test := range []struct {
                in, out []int
        }{
                {},
                {in: []int{1}, out: []int{2}},
                {in: []int{1, 2}, out: []int{2, 3}},
        } {
                if out := Increment(test.in); !reflect.DeepEqual(test.out, out) {
                        t.Fatalf("%d. got unexpected value %#v instead of %#v", i, out, test.out)
                }
        }
}
(Source)

Boom! You can delete type IntSlice and IntSlice.Equal—provided there is no actual user need for them. Remember: it is usually easier to expand an API later than it is to take something away.


One bit of advice: pkg/reflect enables you to apply test-driven development to most APIs immediately when combined with table-driven tests. This is a great opportunity to validate the assumption that reflect.DeepEqual actually works correctly for the expected-versus-actual comparison. There is little worse than over-reliance on something that yields a false sense of confidence. The onus is on you to know your tools.
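
If you want a cheap way to build that confidence, a throwaway sanity test of the comparison itself can help. The sketch below is my own, not from the original post, and assumes it lives in the same file as the tests above (so reflect and testing are already imported):

func TestDeepEqualAssumptions(t *testing.T) {
        // Spot-check that reflect.DeepEqual behaves as assumed for the
        // shapes this package's tests actually compare.
        if !reflect.DeepEqual([]int{1, 2}, []int{1, 2}) {
                t.Error("identical slices reported unequal")
        }
        if reflect.DeepEqual([]int{1, 2}, []int{2, 1}) {
                t.Error("differing slices reported equal")
        }
}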


Cases When Not to Use reflect.DeepEqual

Surely there's a downside? Yep, there are a few; nothing good comes without caveats:

  • The type or package already exposes an equality test mechanism. This could be from code that you import and use versus author yourself. A notable example you should be aware of is the goprotobuf library's proto.Equal facility to compare two messages. Usually there is a good reason. Defer to the author's judgement.
  • The comparison of actual versus expected involves a type or composition that is incompatible with reflect.DeepEqual. Channels are an obvious example, unless you are expecting a nil channel on both sides!
  • The type that is being compared has transient state. Transient state may manifest itself in unexported fields. This raises the question of whether the transient state is important. For instance, it could exist for memoization, such as a lazily generated hash for an immutable type.
  • You are functionally using a nil slice as an empty slice in your code: reflect.DeepEqual([]int{}, []int(nil)) == false.

Needless to say, if any of the previous apply, exercise extreme caution.
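
The last two bullets are the ones that bite people most often in practice. Here is a minimal demonstration of my own (not from the original post):

package main

import (
        "fmt"
        "reflect"
)

func main() {
        // A nil slice and an empty slice are not deeply equal, even
        // though both have length zero.
        fmt.Println(reflect.DeepEqual([]int{}, []int(nil)))    // false
        fmt.Println(reflect.DeepEqual([]int(nil), []int(nil))) // true

        // Channels are compared with ==: two nil channels match, but
        // two distinct live channels never do.
        var a, b chan int
        fmt.Println(reflect.DeepEqual(a, b))                           // true
        fmt.Println(reflect.DeepEqual(make(chan int), make(chan int))) // false
}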


Ordering of Test Local Values: Actual and Expected

It turns out that we aren't done yet. (If you thought we were, you'd end up as happy as Pvt. Gomer Pyle during footlocker inspection.) Go has a convention in modern tests of placing actual before expected. (I highly encourage everybody to visit that link and study and practice its content!) Let's clean up our mess from above:


func TestReflectReordered(t *testing.T) {
        for i, test := range []struct {
                in, out []int
        }{
                {},
                {in: []int{1}, out: []int{2}},
                {in: []int{1, 2}, out: []int{2, 3}},
        } {
                if out := Increment(test.in); !reflect.DeepEqual(out, test.out) {
                        t.Fatalf("%d. got unexpected value %#v instead of %#v", i, out, test.out)
                }
        }
}
(Source)

Why call this to attention? The convention exists, and when it is followed, it is faster for a non-maintainer to reason about somebody else's code.


Naming Test Local Variables

The Code Review Comments Guide outlines some interesting ideas that ought to be adopted as convention (note that each subsequent bullet point builds on the previous):


  • Input should be named in. For instance, if we were testing a string length function, each test table row's signature could be struct{ in string; len int }. Admittedly this is easiest to achieve when the tests' input is unary.


    When it is not unary, sometimes grouping the inputs in the table definition into a struct named in suffices. Suppose that we are building a table test for a quadratic function: struct{ in struct{ a, b, c, x int }; y int }.

  • Expected output should be named want. Our string length example becomes struct{ in string; want int }, whereas the quadratic becomes struct{ in struct{ a, b, c, x int }; want int }.


    If the tested component's type signature has a multiple-value result, you could take an approach similar to the multiple-arity input case and group the output into a struct named want. A table test row for an implementation of io.Reader could look like struct{ in []byte; want struct{ n int; err error } } (see the sketch after this list).

  • The actual value (i.e., the side effect) being tested should be named got.
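
As promised above, here is a hedged sketch of the multiple-result case; bytes.NewReader merely stands in as an arbitrary io.Reader under test, and the result grouping type is my own invention, not from the original post:

package reader_test

import (
        "bytes"
        "io"
        "reflect"
        "testing"
)

// result groups Read's multiple return values into one comparable
// value so that each table row carries a single want field.
type result struct {
        n   int
        err error
}

func TestRead(t *testing.T) {
        for _, test := range []struct {
                in   string
                want result
        }{
                {in: "", want: result{n: 0, err: io.EOF}},
                {in: "abc", want: result{n: 3}},
        } {
                buf := make([]byte, 8)
                n, err := bytes.NewReader([]byte(test.in)).Read(buf)
                if got := (result{n, err}); !reflect.DeepEqual(got, test.want) {
                        t.Errorf("Read(%#v) = %#v; want %#v", test.in, got, test.want)
                }
        }
}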

What does our example above look like after these rules are applied?

func TestReflectRenamed(t *testing.T) {
        for i, test := range []struct {
                in, want []int
        }{
                {},
                {in: []int{1}, want: []int{2}},
                {in: []int{1, 2}, want: []int{2, 3}},
        } {
                if got := Increment(test.in); !reflect.DeepEqual(got, test.want) {
                        t.Fatalf("%d. got unexpected value %#v instead of %#v", i, got, test.want)
                }
        }
}
(Source)

Formatting Error Messages

How you format your tests' error messages is an important but oft-neglected topic, one that has practical benefit. Why is that?

  • Your failure messages indicate where an anomaly has occurred and why. Think about this for a moment. In the table tests above, where is conveyed by the initial indices in the print format string. Why is conveyed through the remark of actual versus expected.
  • Your failure messages have an inherent time-to-decode cost for the reader. The longer decoding takes, the more difficult maintenance, refactorings, and reiterations become. It should take no more than two seconds for a non-maintainer reading the failure message to know on what input the test failed! This needn't mean that external parties understand why.

If your test failure messages do not fulfill the points above, they have failed the human requirements! For the sake of demonstration, the test failure messages earlier in this post intentionally fail these criteria!


Format Expressions


Let's take a quick diversion down format string lane… What happens if one of the tests above fails and the message is emitted to the console? Would you be able to quickly figure out which table test row is responsible for the failure?


The answer to this depends on the behavior of the type that backs in, want, and got. Does the type formally implement fmt.Stringer? What is the format expression?


If you are lazy and just use %s, relying on whatever default formatting the type happens to have, you may get results that are hard to read. Consider the example below:


package main

import "fmt"

type Record struct {
        GivenNames []string
        FamilyName string
        Age        int
        Quote      string
}

func main() {
        rec := Record{[]string{"Donald", "Henry"}, "Rumsfeld", 82, `… there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.`}

        fmt.Printf("%%s %s\n", rec)
        fmt.Printf("%%v %v\n", rec)
        fmt.Printf("%%#v %#v\n", rec)
}
(Source)

emits


%s {[Donald Henry] Rumsfeld %!s(int=82) … there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.}
%v {[Donald Henry] Rumsfeld 82 … there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.}
%#v main.Record{GivenNames:[]string{"Donald", "Henry"}, FamilyName:"Rumsfeld", Age:82, Quote:"… there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."}

Compare these emissions for a moment. %s doesn't perform so well. Things can get even worse: suppose Record implements fmt.Stringer and the result is too lossy or convoluted to differentiate table rows?


func (r Record) String() string {
        return fmt.Sprintf("[Record: %s %s]", r.GivenNames[0], r.FamilyName)
}

Note how that fmt.Stringer omits a bunch of fields. Suppose we have multiple table records of Donald Rumsfeld with minute differences. We'd be one very sad Pvt. Gomer Pyle if any test failed.


My advice: stick to using %#v for printing in, want, and got. You can easily differentiate the output and hopefully find the record in the test table quickly. This also prevents %s and fmt.Stringer from tripping you up if the code comes from a third party! It is worth the effort.


Content of the Test Error Message

If you are still reading, thank you for bearing through this long post. You'll come out ahead writing better tests. We're now on the final topic: how to make the test error messages useful.


For consistency, prefer using a format like this for pure or semi-pure tests that exercise a function:


t.Errorf("YourFunction(%#v) = %#v; want %#v", in, got, want)

The output is concise and obvious. With clear ways of differentiating between test cases, there is no need to keep that stupid index variable in the format string. Let's now take the test we've been polishing and show what the final version should look like:

func TestReflectPristine(t *testing.T) {
        for _, test := range []struct {
                in, want []int
        }{
                {},
                {in: []int{1}, want: []int{2}},
                {in: []int{1, 2}, want: []int{2, 3}},
        } {
                if got := Increment(test.in); !reflect.DeepEqual(got, test.want) {
                        t.Fatalf("Increment(%#v) = %#v; want %#v", test.in, got, test.want)
                }
        }
}
(Source)

With luck, your tests will be easy to decipher and you won't find yourself in a world of shit.


In closing, I put together this blog post largely as a form of penance for my mistakes while learning Go, which had the side effect that my learner's bad habits rubbed off on the people I was working with. Patterns are contagious—just like misinformation. Here's to hoping that this makes up for it:

Hail the Right and Just, Cmdr. Pike,
By whose work
Unmaintainable code is defeated, for practicality
Has now overflowed upon all of the world.

In the next posting, I will discuss some more testing patterns and focus less on style. Until then, I bid you good testing!


2014-09-02

Go Testing Patterns: Benchmarking and Expensive Fixture Setup, Memory Allocations, and Automation

Dave Cheney put together a nice posting that outlines how to use Go's pkg/testing facility for microbenchmarking. This post should be considered an addendum, for it extends his with other useful tidbits.


Expensive Fixture Setup

In Dave's example, he instruments the Fibonacci sequence, a pure benchmark that requires almost zero setup. Sometimes the real world is not as forgiving in terms of purity. Suppose we want to benchmark something that requires setup. How do we achieve that in the context of pkg/testing? For the sake of argument, we will come up with a contrived example: traversing a binary tree.


What I have done is create an example project on GitHub. I encourage you to start by reading its generated GoDoc to understand what it is and how it works before we proceed.


Let's look at a naive example of what a benchmark could be:

func benchmarkJ(b *testing.B, j int, t Traverser) {
        // Create our system under test (SUT):
        tree := New(j)
        // We create a bunch of recipient channels in advance for our traverser to
        // write into.
        chs := make(chan chan int, b.N)
        for i := 0; i < b.N; i++ {
                chs <- make(chan int, j)
        }
        close(chs)

        // Run our benchmark.
        for ch := range chs { // len(chs) == b.N
                t(tree, ch)
        }
}
(Source)

How does that look to you? All gravy? Sadly, it isn't. What's wrong with it? To start, the benchmark is instrumenting things that are not meant to be instrumented, namely the setup cost:

  • tree := New(j) produces allocations and takes time.
  • The loop for i := 0; i < b.N; i++ { chs <- make(chan int, j) } produces allocations and takes time.

You see, pkg/testing's benchmark framework works out of the box somewhat naively: the moment the benchmarking function is called, it begins its instrumentation. What happens if you find yourself in a boat like this, where setup is needed and must be isolated to promote test determinism (as well as to keep the setup from counting in the instrumented run)? Luckily pkg/testing provides the B.StartTimer and B.StopTimer facilities. These enable you to tell the benchmark framework that whatever happens between a B.StopTimer call and the next B.StartTimer call should not be instrumented (more precisely: measurement only resumes after the terminal B.StartTimer invocation)! Let's see what this looks like in action!

func benchmarkJCorrect(b *testing.B, j int, t Traverser) {
        // IMPORTANT: Instruct the benchmarker to not track time and memory allocations!
        b.StopTimer()

        // Create our system under test (SUT):
        tree := New(j)
        // We create a bunch of recipient channels in advance for our traverser to
        // write into.
        chs := make(chan chan int, b.N)
        for i := 0; i < b.N; i++ {
                chs <- make(chan int, j)
        }
        close(chs)

        // IMPORTANT: Instruct the benchmarker to begin tracking time and memory
        // allocations.  Everything after this mark is instrumented!
        b.StartTimer()

        // Run our benchmark.
        for ch := range chs { // len(chs) == b.N
                t(tree, ch)
        }
}
(Source)

"Great," you say. Just large of a difference did this make? Here's a raw reading of it:

// Before
PASS
BenchmarkRecursive1     10000000               267 ns/op
BenchmarkRecursive10     5000000               590 ns/op
BenchmarkRecursive100     500000              4315 ns/op
BenchmarkRecursive1000     50000             40746 ns/op
BenchmarkIterative1      5000000               308 ns/op
BenchmarkIterative10     2000000               850 ns/op
BenchmarkIterative100     500000              5275 ns/op
BenchmarkIterative1000     50000             47494 ns/op

// After
PASS
BenchmarkRecursive1     20000000               104 ns/op
BenchmarkRecursive10     5000000               414 ns/op
BenchmarkRecursive100     500000              3828 ns/op
BenchmarkRecursive1000     50000             38120 ns/op
BenchmarkIterative1     10000000               182 ns/op
BenchmarkIterative10     5000000               688 ns/op
BenchmarkIterative100     500000              4888 ns/op
BenchmarkIterative1000     50000             42569 ns/op

"Wait, didn't you say something about memory allocations and one being more efficient than the other?" Why, yes! You'll need to stick with us to the next section to find out more!

Benchmarking Memory Allocations

If I recall correctly, Go 1.1 or 1.2 publicly exposed a new feature in pkg/testing's benchmarking framework: the ability to instrument memory allocations. There is quite a bit of debate around the topic of memory management in Go. While the Go team is working hard at improving the garbage collector (that is another topic altogether), there is the question of what happens under the hood, which pertains to the compiler as well as the memory manager itself. Once my writeup of the memory manager and garbage collector is published, I will return to this. Anyway, the Go team would prefer that you not worry about the question of stack versus heap. (My professional opinion: it makes a huge difference for writing performance-critical and scalable software, but keep in mind that it gets into the realm of implementation detail!)

How do we begin? In the go test documentation under the flag description area, we find a flag called -benchmem. Once turned on, the benchmarker behaves almost identically, except that we get statistics about memory:
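
For instance, an invocation along these lines (both are standard go test flags; -bench=. selects every benchmark in the package) produces readouts like those below:

$ go test -bench=. -benchmem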

// Before
PASS
BenchmarkRecursive1     10000000               284 ns/op             105 B/op          1 allocs/op
BenchmarkRecursive10     5000000               598 ns/op             186 B/op          1 allocs/op
BenchmarkRecursive100     500000              4603 ns/op             912 B/op          1 allocs/op
BenchmarkRecursive1000     50000             43971 ns/op            8281 B/op          2 allocs/op
BenchmarkIterative1      5000000               613 ns/op             115 B/op          2 allocs/op
BenchmarkIterative10     1000000              1730 ns/op             245 B/op          4 allocs/op
BenchmarkIterative100      50000             28248 ns/op            1426 B/op          7 allocs/op
BenchmarkIterative1000      5000            211706 ns/op           12418 B/op         11 allocs/op

// After
PASS
BenchmarkRecursive1     20000000                98.8 ns/op             0 B/op          0 allocs/op
BenchmarkRecursive10     5000000               438 ns/op               0 B/op          0 allocs/op
BenchmarkRecursive100     500000              3819 ns/op               0 B/op          0 allocs/op
BenchmarkRecursive1000     50000             37911 ns/op               0 B/op          0 allocs/op
BenchmarkIterative1     10000000               187 ns/op               9 B/op          1 allocs/op
BenchmarkIterative10     5000000               805 ns/op              59 B/op          3 allocs/op
BenchmarkIterative100     500000              5206 ns/op             513 B/op          6 allocs/op
BenchmarkIterative1000     50000             53608 ns/op            4130 B/op          9 allocs/op

Cool, but what do I make of this? That depends on how meaningful the data is to you, the type of component you are instrumenting, whether it sits in the serving path, whether it is exported library code, etc. The crux of it is that the design of your types and API, together with escape analysis, affects the efficiency of allocations. The gc compiler exposes its internal escape analysis through -gcflags. To enable this, adjust your flags as follows: -gcflags='-m'.
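
To give a feel for the kind of code -m reports on, here is a toy example of my own (not from the original post); the first function forces a heap allocation, while the second should stay on the stack in typical builds:

package escape

// NewEscapes returns a pointer to a local variable.  The escape
// analysis must therefore move x to the heap; building with
// -gcflags='-m' should report the move (exact wording varies by
// toolchain version).
func NewEscapes() *int {
        x := 42
        return &x
}

// Sum keeps its accumulator and loop variables in its own frame; no
// allocation should be attributed to it.
func Sum(vs []int) int {
        total := 0
        for _, v := range vs {
                total += v
        }
        return total
}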

The hope of disabling these optimizations is that you can more carefully craft your benchmarks and not fight against escape analysis, whose sole point is to optimize away common usage patterns. For instance, disabling optimizations should prevent you from needing to force an object to escape to ensure that it is, in fact, heap allocated:

// Rest elided.

// This is completely contrived to illustrate how silly manually defeating the
// escape analysis/inliner is!

type BigBeefyMiracle struct {
        You, Can, Never, Have, Too, Many, Arrays [16][16][16]byte
        Or, Beefy, Miracles                      [32][32][32]byte
}

var globalBeefiness *BigBeefyMiracle // Global scope.

func BenchmarkHisBeefiness(b *testing.B) {
        var localBeefiness BigBeefyMiracle
        globalBeefiness = &localBeefiness

        // The API also offers the concise B.ResetTimer(), which zeroes the
        // accumulated time and allocation statistics up to this point.
        b.ResetTimer()

        for i := 0; i < b.N; i++ {
                // MakeMeatSlurry is the system under test.
                MakeMeatSlurry(globalBeefiness)
        }
}
(Struct Name Etymology)

OK, OK. Now, can I automate any of this? Sure. Read the next section to find out more!

Automating the A/B Testing Readouts

Suppose you have done a run of my suite and swapped the benchmark mechanism. How do you automate the comparison of the data?

It turns out that Go includes an awesome unadvertised utility called benchcmp. It can be installed like this:

$ go install code.google.com/p/go.tools/cmd/benchcmp

It assumes that you've piped the benchmark output into two files that you provide as positional arguments:

$ benchcmp <before> <after>

Let's give it a whirl with our memory readout (it supports non-memory readouts, too!):

$ benchcmp before after
benchmark                  old ns/op     new ns/op     delta
BenchmarkRecursive1        284           98.8          -65.21%
BenchmarkRecursive10       598           438           -26.76%
BenchmarkRecursive100      4603          3819          -17.03%
BenchmarkRecursive1000     43971         37911         -13.78%
BenchmarkIterative1        613           187           -69.49%
BenchmarkIterative10       1730          805           -53.47%
BenchmarkIterative100      28248         5206          -81.57%
BenchmarkIterative1000     211706        53608         -74.68%

benchmark                  old allocs     new allocs     delta
BenchmarkRecursive1        1              0              -100.00%
BenchmarkRecursive10       1              0              -100.00%
BenchmarkRecursive100      1              0              -100.00%
BenchmarkRecursive1000     2              0              -100.00%
BenchmarkIterative1        2              1              -50.00%
BenchmarkIterative10       4              3              -25.00%
BenchmarkIterative100      7              6              -14.29%
BenchmarkIterative1000     11             9              -18.18%

benchmark                  old bytes     new bytes     delta
BenchmarkRecursive1        105           0             -100.00%
BenchmarkRecursive10       186           0             -100.00%
BenchmarkRecursive100      912           0             -100.00%
BenchmarkRecursive1000     8281          0             -100.00%
BenchmarkIterative1        115           9             -92.17%
BenchmarkIterative10       245           59            -75.92%
BenchmarkIterative100      1426          513           -64.03%
BenchmarkIterative1000     12418         4130          -66.74%

Nifty, eh? Doesn't rearchitecting the benchmark properly look so much more compelling now?

Bypassing the Optimizer

You may not have been comfortable with my remarks above about the implicit optimizations in the compiler and the runtime and what they mean for the code you write. I understand. One of the things you can do to help keep things ceteris paribus is to disable compiler optimizations. It isn't foolproof for comparison, since your production code will likely be built with optimizations enabled. Nevertheless, it is still helpful for finding a baseline. Again, this falls into the -gcflags realm:

$ go test -gcflags='-N' <pathspec>  # Other arguments elided.

In Closing …

In closing, these are some useful tidbits to keep in your toolbelt. Don't forget that there's usually a clean way of doing everything in Go, so don't reinvent the wheel for yourself if you can avoid it.

Finally, do keep in mind that microbenchmarking (the link is about Java, but you're mature enough to get over this and love mature runtimes) is not a be-all and end-all. It is useful for guidelines and advisement, nothing more. Focus on correct design (internal and external) and code comprehension/readability first.

In our next installments, we will focus on the following:

  • correctly benchmarking throughput in parallelized operations,
  • choosing an optimal concurrency level from benchmarking, and
  • many other useful testing patterns.

Stay tuned!


2014-08-31

Go Testing Patterns: Common Rendezvous Point for Concurrency

For as much as I try to design interfaces that avoid directly exposing concurrency to the user, I find myself periodically testing such code (usually non-exported implementation). I'd like to outline a couple of practices that have worked for me:


Common Rendezvous Point

What the common rendezvous does is ensure that all participants have reached a common place in their execution and wait there until being told to resume. Suppose we have a system under test (SUT) that involves Routine A, Routine B, and Routine C, and the SUT expects all of them to reach a common point P before continuing to the core activity that produces the side effects we want to check.



How do we best achieve this in Go? pkg/sync provides a great place to start, specifically the WaitGroup type, which provides a barrier similar to the Java standard library's CountDownLatch. Let's get started by modeling the SUT and preparing the components for the routines!


package rendezvous

import (
        "fmt"
        "sync"
        "testing"
)

func TestSystem(t *testing.T) {
        var prep sync.WaitGroup // The preparation rendezvous point.
        prep.Add(3)             // We have three participants.

        go func() {
                fmt.Println("Routine A starting ...")
                fmt.Println("Routine A preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine A finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine A received signal to proceed.")
                // TODO: Flesh out the rest.
        }()
        go func() {
                fmt.Println("Routine B starting ...")
                fmt.Println("Routine B preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine B finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine B received signal to proceed.")
                // TODO: Flesh out the rest.
        }()
        go func() {
                fmt.Println("Routine C starting ...")
                fmt.Println("Routine C preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine C finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine C received signal to proceed.")
                // TODO: Flesh out the rest.
        }()

        select {} // Wait forever.  The runtime detects the deadlock and aborts.  See later steps.
}
(On Go Playground)

It's time to reflect on what's going on under the hood. Let's start with some obvious questions.

Why did we use a WaitGroup as opposed to a chan? We could have created three separate channels, shared among all the routines, with each routine closing exactly one of them, similar to this fragment:

// Outer scope
routA := make(chan struct{})
routB := make(chan struct{})
routC := make(chan struct{})

// In Routine A
close(routA)
<-routB
<-routC

// In Routine B
close(routB)
<-routA
<-routC

// In Routine C
close(routC)
<-routA
<-routB

Sure, this fragment works, but look at how needlessly verbose and fragile it is. One misimplemented handler, and boom: the whole design fails—sometimes silently!

Why do we declare the prep WaitGroup as var prep sync.WaitGroup as opposed to simply prep := sync.WaitGroup{}? We are using the zero value of the underlying struct, which is readily usable for this type (a topic worthy of a blog post of its own). Using the VarDecl declaration style, we can also chain subsequent VarSpecs in the same definition for the same type. Suppose we add a second WaitGroup; which of the following is cleaner and easier to read: first, second := sync.WaitGroup{}, sync.WaitGroup{} or var first, second sync.WaitGroup? We'll return to this later.

Let's continue.

package rendezvous

import (
        "fmt"
        "sync"
        "testing"
)

func TestSystem(t *testing.T) {
        var prep, fin sync.WaitGroup // The preparation and finish rendezvous points.
        prep.Add(3)                  // We have three participants.
        fin.Add(3)                   // We have three participants.

        go func() {
                fmt.Println("Routine A starting ...")
                fmt.Println("Routine A preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine A finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine A received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine A exited.")
        }()
        go func() {
                fmt.Println("Routine B starting ...")
                fmt.Println("Routine B preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine B finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine B received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine B exited.")
        }()
        go func() {
                fmt.Println("Routine C starting ...")
                fmt.Println("Routine C preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine C finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Rendezvous: block until all participants are prepared.
                fmt.Println("Routine C received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine C exited.")
        }()

        fin.Wait()
        // Validate side effects.
        fmt.Println("All routines exited.")
}
(On Go Playground)

Recall the point I made above about adding a second WaitGroup? Here it is as fin, which each goroutine uses to signal completion of its respective work. We also replaced select {} with fin.Wait().

So what does this effectively buy us? Well, if we need to be sure for purposes of testing that all participants have reached one or more common rendezvous points before either running the SUT or validating side effects, we have a graceful protocol. This ensures both correctness and determinism!

Is the example missing anything? That depends on the SUT and its needs. One candidate could be cancellation on failure or timeout. Implementing cancellation correctly depends strongly on the underlying SUT, as I implied. There is no one-size-fits-all cancellation policy for every API in Go, so I leave that as an exercise to the reader. If you're interested, this article discusses how the Java API handles this problem at the thread level. Very generally, we could create a timeout policy as such:

import "time"

// rest elided

finished := make(chan struct{})

go func(sig <- chan struct{}) {
  fin.Wait()
  close(sig)
}(finished)

select {
  case <- finished:
  break
  case time.After(5 * time.Second):
    t.Fatal("SUT timed out.")
}

Do note that this does not include proper cancellation of your routines! Creating a timeout policy as such is not strictly needed, as the standard Go pkg/testing includes a configurable timeout for the entire test run.
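
For reference, that suite-wide deadline is adjustable from the command line through the standard -timeout flag (the duration below is merely an example):

$ go test -timeout 30s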

Also note that this posting seeks to enable neither overly complicated API design nor Rube Goldberg machine-style tests. Only with experience and wisdom will you be able to discern when such machinery is necessary.

In the next segment, we'll talk about concurrency level in tests.


2014-08-24

Plan 9's Acid Debugger and Go

Background

I've been putting together some research on the Go language's runtime to help folks better understand its memory manager and what impact it has on the servers they develop.  In the course of this, I found that Gdb was of limited use for working with the low-level runtime (I'll post a follow-up about Gdb in a few days).


It dawned on me that perhaps Plan 9's debugger, called Acid, may be the ticket home.  This supposition arose due to the direct integration of Acid support in the golang cc compiler as well as perennial mentions of Acid on golang-nuts.  Here are the results from my trial, in case anyone is interested in running with this.

The Trial

$ pacman -S plan9port

$ make entrypoint

$ 9 acid entrypoint
entrypoint: linux amd64 executable
/usr/lib/plan9/acid/port
/usr/lib/plan9/acid/amd64

That'll get Acid loaded.  Let's see what's in the symbol table that contains the substring "Init".

acid; symbols("Init")
runtime.MCentral_Init T 0x00404ce0 entrypoint runtime.MCentral_Init
runtime.FixAlloc_Init T 0x00405cc0 entrypoint runtime.FixAlloc_Init
runtime.MHeap_Init T 0x0040af50 entrypoint runtime.MHeap_Init
runtime.MSpan_Init T 0x0040c520 entrypoint runtime.MSpan_Init
runtime.MSpanList_Init T 0x0040c5a0 entrypoint runtime.MSpanList_Init
runtime.InitSizes T 0x0040cfe0 entrypoint runtime.InitSizes

Cool.  At this point, I start learning what is awesome about Acid: how extensible it is.  To wit, we can program functions in a native language as well as ask Acid to describe the available functions:

acid; whatis source
defn source() {
 local l;
 l = srcpath;
 while l do
 {
  print(head l,"\n");
  l = tail l;
 }
 l = srcfiles;
 while l do
 {
  print("\t",head l,"\n");
  l = tail l;
 }
}

Pretty neat?  Let's now try to do something useful.  I've been interested in watching the TCMalloc-based size classes get built.  This is done in runtime.InitSizes at address 0x0040cfe0 per the listing above.

Let's first try to disassemble that function.

acid; asm(runtime.InitSizes)
<stdin>:7: (error) runtime used but not set

Hmm.  No dice.  Maybe if I quote it?

acid; asm("runtime.InitSizes")
<stdin>:9: (error) fnbound(addr): arg type

Still a nope.  Am I doing it wrong?  Let's try another symbol.  How about main?  (This isn't the "func main() {}" that you write in your own Go servers; that one is called main.main.)

acid; symbols("main")
main.main T 0x00400c00 entrypoint main.main
main.init T 0x00400c10 entrypoint main.init
runtime.main T 0x00410150 entrypoint runtime.main
main T 0x00422510 entrypoint main
runtime.main.f D 0x00436c48 entrypoint runtime.main.f
main.initdone. D 0x00456f60 entrypoint main.initdone.

acid; asm(main)
main 0x00422510 MOVL $_rt0_go(SB),AX
main+0x5 0x00422515 JMP* AX
main+0x7 0x00422517 ADDB AL,0(AX)
main+0x9 0x00422519 ADDB AL,0(AX)
main+0xb 0x0042251b ADDB AL,0(AX)
main+0xd 0x0042251d ADDB AL,0(AX)
main+0xf 0x0042251f ADDB CL,b808247c(BX)

Cool!  There's our output, and no quoting required.  What was happening above?  Well, many of the core runtime functions in Go are defined with a Unicode · in their names, which the compilation process replaces with a period in the emitted C and assembly code.  In effect, runtime·InitSizes maps to "runtime.InitSizes" in the output we see above.  If I had to guess, Acid doesn't support function names containing "." and presumes it to be used for struct field member referencing.  We can confirm this by swapping in the address "0x0040cfe0" for "runtime.InitSizes":

acid; asm(0x0040cfe0)
runtime.InitSizes 0x0040cfe0 DECL AX
runtime.InitSizes+0x2 0x0040cfe2 MOVL fffffff0,CX
runtime.InitSizes+0x9 0x0040cfe9 DECL AX
runtime.InitSizes+0xa 0x0040cfea CMPL 0(CX),SP
runtime.InitSizes+0xc 0x0040cfec JHI runtime.InitSizes+0x15(SB)
// rest elided

Sure enough, it is a problem of naming.  Under the covers, Acid maps these names symbolically to these addresses.  Let's turn to the task of debugging a running process and setting a breakpoint for runtime.InitSizes to get that pesky size classes table:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; bpset(0x0040cfe0)
Waiting...

Hmm, we're waiting.  For what—Christmas?

According to the process tree, both Acid and entrypoint are running:

root     23650  0.0  0.0  13192  3288 pts/0    S+   20:21   0:00 acid entrypoint
root     23653  0.0  0.0    608     4 pts/0    t    20:21   0:00 entrypoint

Rather specifically, entrypoint is in a state of inspection and just waiting.  I think I've had enough of this for now.  I'll give this another whirl, this time without a breakpoint:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; cont()
open /proc/23679/stat: No such file or directory
<stdin>:4: (error) procnotes pid=23679: No such file or directory

Everything appears to run (I rebuilt the original binary with some logging), but these two warnings about unmapped address ranges and the missing procfs entry are disconcerting.  Maybe stepping through it will work instead of breakpoints?  Let's try it out:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; step
0x0041f3a0
acid; func()
cannot locate text symbol
acid; gpr()
AX 0000000000000000 BX 0000000000000000 CX <stdin>:9: (error) reading CX: ptrace: Input/output error

The call to "func()" should have stepped "next" until the current call exited, and "gpr()" ought to have emitted register status.  Cue up the sad-trombone.

Reflections

On the surface, Acid looks pretty amazing: it feels like it fits into a similar vein as the Acme text editor in terms of extensibility and simplicity.

So, why didn't this all work with the breakpoint at the end?  It dawns on me that I was being a big dummy: I should have known better about running Plan 9 from Userspace on Linux.  Acid is probably expecting the process-control semantics of Plan 9, and Plan 9 from Userspace cannot get what it needs from Linux.  It turns out that this is correct:

$ 9 man intro
          # previous elided
          The debuggers acid(1) and db(1) and the debugging library
          mach(3) are works in progress.  They are platform-
          independent, so that x86 Linux core dumps can be inspected
          on PowerPC Mac OS X machines, but they are also fairly
          incomplete.  The x86 target is the most mature; initial Pow-
          erPC support exists; and other targets are unimplemented.
          The debuggers can only inspect, not manipulate, target pro-
          cesses.  Support for operating system threads and for 64-bit
          architectures needs to be rethought.  On x86 Linux systems,
          acid and db can be relied upon to produce reasonable stack
          traces (often in cases when GNU gdb cannot) and dump data
          structures, but that is the extent to which they have
          been developed and exercised.
          # rest elided

When I have some more time down the road, I will spin up a virtual machine with Plan 9 on it and retry this same experiment.  This will be a true test of character for the question of Acid versus Gdb and which debugger fares better with Go.  In any case, the planned overhaul of the compiler and linker, with a rewrite from C to pure Go, may invalidate parts of this document or all of it.

Needless to say, this has whetted my interest in Plan 9.

Stay tuned for the next parts of this series, where we examine the runtime with Gdb and tour the bootstrapping process, the memory allocator and manager, the garbage collector, and more!





None of the content contained herein represents the views of my employer, nor should it be construed in such a manner. All content is copyright Matt T. Proud and may not be reproduced in whole or in part without express permission.