2014-09-06

Go Testing: Easy Polish for World-Class Tests

If you're like me, when you first started out with Go, you may have felt tempted to use a testing framework. I sure did. Beginner's mistake. A huge one, in fact. I started out using gocheck. I don't mention any of this to denigrate gocheck's authors; it is simply a matter of preference that changed as my familiarity with the ecosystem grew. Once I became more familiar with Go and its idioms, my need for a framework withered. The standard library, combined with those idioms, simply delivered.


So why bring this up? Every day, we face questions about how to architect our tests and how to reason about their output. Go, with its anonymous (unnamed) types and struct literals, naturally gravitates toward simplicity and conciseness. Table-driven tests are a natural consequence; maybe even an end-of-history moment for the ecosystem. Anyway, I digress.


What I want to focus on in this post is not table-driven tests per se but rather a few details that make tests in Go easy to reason about, painless, and maintainable. This will be a chatty post, but bear with me and experience my anecdotes firsthand. If you are unfamiliar with table tests, I encourage you to read Dave Cheney's venerable piece on table-driven tests for background. What I will not do in this post (perhaps in another) is discuss how to design testable code or build correct tests for your systems. My motivations are selfish: the easier it is to test, and to test somewhat correctly, the more it will be done. Our ecosystem will flourish with high-quality tests, and our peers can hold one another to high standards. Let's get started before I fall off the ivory tower!


Value Comparisons and Reflection

I come to the table with a lot of Java experience under my belt. With that comes a strong revulsion toward reflection, because it makes reasoning about a system's design inordinately difficult. Worse, when used incorrectly, it introduces terrible faults by bypassing the compiler's type-safety guarantees. You could call this baggage, if you will. When I first saw pkg/reflect, I nearly vomited in my mouth in horror at the thought of ever needing to use it. So I avoided it, much to my detriment, as I will try to convince you.


Custom Assertions / Testing Framework


In the course of writing tests, I went through several iterations. As I mentioned above, when starting out with Go I wrote custom assertion mechanisms using gocheck. That stopped after I had to perform a couple of refactorings and discovered that the cost of keeping the framework-derived tests up to date exceeded the cost of the refactoring itself. What followed?


Iteratively Checking Components of Emitted Output


Iteratively checking components of emitted output, and let me tell you, that is not pretty. Suppose we have a function that consumes []int and emits []int with each slice element's value incremented by one:


// Increment consumes a slice of integers and returns a new slice that contains
// a copy of the original values but each value having been respectively
// incremented by one.
func Increment(in []int) []int {
 if in == nil {
  return nil
 }
 out := make([]int, len(in))
 for i, v := range in {
  out[i] = v + 1
 }
 return out
}

How would you test this under the iterative approach? It might look something like this with a table-driven test:


func TestIterative(t *testing.T) {
 for i, test := range []struct {
  in, out []int
 }{
  {},
  {in: []int{1}, out: []int{2}},
  {in: []int{1, 2}, out: []int{2, 3}},
 } {
  out := Increment(test.in)
  if lactual, lexpected := len(out), len(test.out); lactual != lexpected {
   t.Fatalf("%d. got unexpected length %d instead of %d", i, lactual, lexpected)
  }
  for actualIdx, actualVal := range out {
   if expectedVal := test.out[actualIdx]; expectedVal != actualVal {
    t.Fatalf("%d.%d got unexpected value %d instead of %d", i, actualIdx, actualVal, expectedVal)
   }
  }
 }
}

What do you notice here aside from how terrible it is? I can't recall why I ever thought this incarnation was a good idea, but I suspect it was on a sleepless night. Anyway, if it doesn't appear terrible, let's take a quick inventory of what's deficient:

  • That's a lot of boilerplate to write.
  • The boilerplate is fragile.
  • When the test does fail, it is damn hard to find out exactly which table row failed: all we get are the two index variables at the beginning of the format string. Pray that the test's table never exceeds one page in length! Double pray it isn't late at night, when you'd go cross-eyed staring at the output.

It turns out that there is more wrong with this than what I enumerated. Don't fear. We'll get to that soon. (Our goal is to make the tests so tip-top that Gunnery Sgt. Hartman would smile.)


Hand-Written Equality Test

A later thought was, why not create an equality helper or a custom type for []int? Let's try that out and see how well that goes:

// Hold your horses, and ignore sort.IntSlice for a moment.
type IntSlice []int

func (s IntSlice) Equal(o IntSlice) bool {
 if len(s) != len(o) {
  return false
 }
 for i, v := range s {
  if other := o[i]; other != v {
   return false
  }
 }
 return true
}

…, which is then used as follows:

func TestIntSliceIterative(t *testing.T) {
 for i, test := range []struct {
  in, out []int
 }{
  {},
  {in: []int{1}, out: []int{2}},
  {in: []int{1, 2}, out: []int{2, 3}},
 } {
  if out, testOut := IntSlice(Increment(test.in)), IntSlice(test.out); !testOut.Equal(out) {
   t.Fatalf("%d. got unexpected value %s instead of %s", i, out, test.out)
  }
 }
}

You're probably asking, "What's wrong with that? Looks reasonable." Sure, this works, but …

  • We've created and exported a new type with a new method. Was this really necessary for users? Would a reasonable user ever need IntSlice.Equal? In the generated documentation, it is an extra item in the inventory, adding cognitive burden if the method is not useful outside of the test. We can do better than this.
  • All of the fragility and error-proneness remarks from the previous case still apply. We've just shifted the maintenance to a dedicated function that performs the work.

"OK, but couldn't you have just made IntSlice.Equal unexported as IntSlice.equal?" protests the peanut gallery. Yes, but that still is not an optimal solution when compared with what follows.
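For comparison, here is a minimal sketch of the unexported variant, written as a plain function; the name intsEqual is hypothetical. Kept in a _test.go file, it never reaches the generated documentation, but the hand-written loop and its maintenance burden remain. Note that it treats a nil slice and an empty slice as equal, since both have length zero.

```go
package main

import "fmt"

// intsEqual is a hypothetical unexported helper. Living in a _test.go file,
// it would not appear in the package's generated documentation.
func intsEqual(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(intsEqual([]int{2, 3}, []int{2, 3})) // true
	fmt.Println(intsEqual([]int{2, 3}, []int{2}))    // false: lengths differ
	fmt.Println(intsEqual(nil, []int{}))             // true: both have length zero
}
```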


Using pkg/reflect

So, where am I going with this? pkg/reflect offers a helpful facility known as reflect.DeepEqual. Take a few minutes to read its documentation carefully. I'll wait for you. The takeaway is that, overwhelmingly, reflect.DeepEqual does the right thing for the values produced by most well-designed public APIs:


  • Primitive types: strings, integers, floating-point values, and booleans.
  • Container types: maps, slices, and arrays.
  • Composite types: structs, compared field by field.
  • Pointers: the values they point to are compared.
  • Recursive values: reflect.DeepEqual remembers what it has visited, so cyclic structures do not recurse forever!

Let's take what we've learned and apply it to the test:

import (
 "reflect"
 "testing"
)

func TestReflect(t *testing.T) {
 for i, test := range []struct {
  in, out []int
 }{
  {},
  {in: []int{1}, out: []int{2}},
  {in: []int{1, 2}, out: []int{2, 3}},
 } {
  if out := Increment(test.in); !reflect.DeepEqual(test.out, out) {
   t.Fatalf("%d. got unexpected value %#v instead of %#v", i, out, test.out)
  }
 }
}

Boom! You can delete type IntSlice and IntSlice.Equal, provided there is no actual user need for them. Remember: it is usually easier to expand an API later than to take something away.


One bit of advice: pkg/reflect, combined with table-driven tests, lets you apply test-driven development to most APIs immediately. This is also a great opportunity to validate the assumption that reflect.DeepEqual actually works correctly for your expected-versus-actual comparison. There is little worse than over-reliance on something that yields a false sense of confidence. The onus is on you to know your tools.
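One way to validate that assumption is sketched below: run each row as usual, then corrupt a copy of want and confirm that reflect.DeepEqual reports a mismatch. The helper name sanityCheck is hypothetical, and the program prints its findings rather than calling t.Errorf so that it runs with go run.

```go
package main

import (
	"fmt"
	"reflect"
)

// Increment is repeated from earlier in the post so this sketch is self-contained.
func Increment(in []int) []int {
	if in == nil {
		return nil
	}
	out := make([]int, len(in))
	for i, v := range in {
		out[i] = v + 1
	}
	return out
}

// sanityCheck is a hypothetical helper. It confirms that the correct want
// compares equal to Increment's result, then corrupts a copy of want and
// confirms that reflect.DeepEqual notices the difference.
func sanityCheck(in, want []int) error {
	got := Increment(in)
	if !reflect.DeepEqual(got, want) {
		return fmt.Errorf("Increment(%#v) = %#v; want %#v", in, got, want)
	}
	if len(want) == 0 {
		return nil // nothing to corrupt
	}
	corrupt := append([]int(nil), want...)
	corrupt[0]++
	if reflect.DeepEqual(got, corrupt) {
		return fmt.Errorf("reflect.DeepEqual missed corrupted want %#v", corrupt)
	}
	return nil
}

func main() {
	for _, test := range []struct {
		in, want []int
	}{
		{},
		{in: []int{1}, want: []int{2}},
		{in: []int{1, 2}, want: []int{2, 3}},
	} {
		if err := sanityCheck(test.in, test.want); err != nil {
			fmt.Println(err)
		}
	}
}
```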


Cases When Not to Use reflect.DeepEqual

Surely there's a downside? Yep, there are a few; nothing good comes without caveats:

  • The type or package already exposes an equality test mechanism. This could be in code that you import and use rather than author yourself. A notable example you should be aware of is the goprotobuf library's proto.Equal facility for comparing two messages. There is usually a good reason for such a mechanism, so defer to the author's judgement.
  • The comparison of actual versus expected involves a type or composition that is incompatible with reflect.DeepEqual. Channels are an obvious example: two channels are deeply equal only if they are the same channel or both are nil.
  • The type being compared has transient state. Transient state may manifest itself in unexported fields, which reflect.DeepEqual compares too. This raises the question of whether the transient state is important. For instance, it could exist for memoization, such as a lazily generated hash for an immutable type.
  • You are functionally using a nil slice as an empty slice in your code: reflect.DeepEqual([]int{}, []int(nil)) == false.

Needless to say, if any of the previous apply, exercise extreme caution.


Ordering of Test Local Values: Actual and Expected

It turns out that we aren't done yet. (If you thought we were, you'd end up as happy as Pvt. Gomer Pyle during footlocker inspection.) Go has a convention in modern tests to place actual before expected. (I highly encourage everybody to visit that link and study and practice its content!) Let's clean up our mess from above:


func TestReflectReordered(t *testing.T) {
 for i, test := range []struct {
  in, out []int
 }{
  {},
  {in: []int{1}, out: []int{2}},
  {in: []int{1, 2}, out: []int{2, 3}},
 } {
  if out := Increment(test.in); !reflect.DeepEqual(out, test.out) {
   t.Fatalf("%d. got unexpected value %s instead of %s", i, out, test.out)
  }
 }
}

Why call attention to this? The convention exists, and when it is followed, a non-maintainer can reason about somebody else's code faster.


Naming Test Local Variables

The Code Review Comments guide outlines some interesting ideas that ought to be adopted as convention (note that each bullet point builds on the previous one):


  • Input should be named in. For instance, if we were testing a string length function, each test table row's type could be struct { in string; len int }. Admittedly, this is easiest to achieve when the test's input is unary.


    When it is not unary, grouping the inputs in the table definition into a struct named in sometimes suffices. Suppose that we are building a table test for a quadratic function: struct { in struct{ a, b, c, x int }; y int }.

  • Expected output should be named want. Our string length example becomes struct { in string; want int }, whereas the quadratic becomes struct { in struct{ a, b, c, x int }; want int }.


    If the tested component's signature has multiple results, you could take an approach similar to the multiple-input case and group the outputs into a struct named want. A table test row for an implementation of io.Reader could look like struct { in []byte; want struct { n int; err error } }.

  • The actual value (i.e., the side effect) being tested should be named got.

What does our example above look like after these rules are applied?

func TestReflectRenamed(t *testing.T) {
 for i, test := range []struct {
  in, want []int
 }{
  {},
  {in: []int{1}, want: []int{2}},
  {in: []int{1, 2}, want: []int{2, 3}},
 } {
  if got := Increment(test.in); !reflect.DeepEqual(got, test.want) {
   t.Fatalf("%d. got unexpected value %s instead of %s", i, got, test.want)
  }
 }
}
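The same naming conventions carry over to the multiple-input case from the list above. Below is a minimal sketch of the quadratic table, assuming a hypothetical Quadratic(a, b, c, x int) int that evaluates a*x*x + b*x + c, plus a hypothetical named input type quadraticIn (used instead of the anonymous struct to keep rows short). It is written as a standalone program so it runs with go run; in a real _test.go file, the loop would call t.Errorf instead of collecting messages.

```go
package main

import "fmt"

// Quadratic is a hypothetical function under test; it evaluates a*x*x + b*x + c.
func Quadratic(a, b, c, x int) int {
	return a*x*x + b*x + c
}

// quadraticIn is a hypothetical named type grouping the four inputs.
type quadraticIn struct{ a, b, c, x int }

var quadraticTests = []struct {
	in   quadraticIn
	want int
}{
	{in: quadraticIn{a: 1, b: 0, c: 0, x: 2}, want: 4},
	{in: quadraticIn{a: 1, b: 2, c: 1, x: 3}, want: 16},
	{in: quadraticIn{a: 2, b: -3, c: 5, x: 0}, want: 5},
}

// failures runs the table, naming the actual value got, and returns any
// mismatches as got/want messages.
func failures() []string {
	var msgs []string
	for _, test := range quadraticTests {
		in := test.in
		if got := Quadratic(in.a, in.b, in.c, in.x); got != test.want {
			msgs = append(msgs, fmt.Sprintf("Quadratic(%#v) = %#v; want %#v", in, got, test.want))
		}
	}
	return msgs
}

func main() {
	for _, msg := range failures() {
		fmt.Println(msg)
	}
}
```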

Formatting Error Messages

How you format your tests' error messages is an important but oft-neglected topic, one that has practical benefit. Why is that?

  • Your failure messages indicate where an anomaly has occurred and why. Think about this for a moment. In the table tests above, the where is conveyed by the leading indices in the format string, and the why by the remark of actual versus expected.
  • Your failure messages have an inherent time-to-decode cost for the reader. The longer it takes, the more difficult maintenance, refactorings, and reiterations become. It should take no more than two seconds for a non-maintainer reading the failure message to know on what input the test failed! This doesn't mean outside parties need to understand why.

If your test failure messages do not fulfill the points above, they have failed the human requirements! For the sake of demonstration, the test failure messages earlier in this post intentionally fail these criteria!


Format Expressions


Let's take a quick diversion down format string lane… Suppose one of the tests above fails for input of some type x and the message is emitted to the console. Would you be able to quickly figure out which table test row is responsible for the failure?


The answer depends on the behavior of the types that back in, want, and got. Does the type implement fmt.Stringer? What is the format verb?


If you are lazy and just use %s, relying on fmt's default formatting or on a type's fmt.Stringer implementation, you may get results that are hard to read. Consider this example:


package main

import "fmt"

type Record struct {
 GivenNames []string
 FamilyName string
 Age        int
 Quote      string
}

func main() {
 rec := Record{[]string{"Donald", "Henry"}, "Rumsfeld", 82, `… there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.`}

 fmt.Printf("%%s %s\n", rec)
 fmt.Printf("%%v %v\n", rec)
 fmt.Printf("%%#v %#v\n", rec)
}

emits


%s {[Donald Henry] Rumsfeld %!s(int=82) … there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.}
%v {[Donald Henry] Rumsfeld 82 … there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.}
%#v main.Record{GivenNames:[]string{"Donald", "Henry"}, FamilyName:"Rumsfeld", Age:82, Quote:"… there are known knowns; there are things that we know that we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."}

Compare these emissions for a moment. %s doesn't perform so well. Things can get even worse: suppose Record implements fmt.Stringer and the result is too terse or too convoluted to differentiate table rows.


func (r Record) String() string {
 return fmt.Sprintf("[Record: %s %s]", r.GivenNames[0], r.FamilyName)
}

Note how that fmt.Stringer omits a bunch of fields. Suppose we have multiple table records of Donald Rumsfeld with minute differences. We'd be one very sad Pvt. Gomer Pyle if any test failed.


My advice: stick to using %#v for printing in, want, and got. You can easily differentiate the output and, hopefully, find the record in the test table quickly. This also prevents %s and fmt.Stringer from tripping you up if the code comes from a third party! It is worth the effort.


Content of the Test Error Message

If you are still reading, thank you for bearing with me through this long post. You'll come out ahead writing better tests. We're now on the final topic: how to make the test error messages useful.


For consistency, prefer using a format like this for pure or semi-pure tests that exercise a function:


t.Errorf("YourFunction(%#v) = %#v; want %#v", in, got, want)

The output is concise and obvious. With clear ways of differentiating between test cases, there is no need to keep that stupid index variable in the format string. Let's now take that test we've been polishing and show what the final output should look like:

func TestReflectPristine(t *testing.T) {
 for _, test := range []struct {
  in, want []int
 }{
  {},
  {in: []int{1}, want: []int{2}},
  {in: []int{1, 2}, want: []int{2, 3}},
 } {
  if got := Increment(test.in); !reflect.DeepEqual(got, test.want) {
   t.Errorf("Increment(%#v) = %#v; want %#v", test.in, got, test.want)
  }
 }
}

With luck, your tests will be easy to decipher and you won't find yourself in a world of shit.


In closing, I put together this blog post largely as a form of penance for my mistakes while learning Go, and for the side effect that my learner's bad habits caught on with the people I was working with. Patterns are contagious, just like misinformation. Here's to hoping that this makes up for it:

Hail the Right and Just, Cmdr. Pike,
By whose work
Unmaintainable code is defeated, for practicality
Has now overflowed upon all of the world.

In the next posting, I will discuss some more testing patterns and focus less on style. Until then, I bid you good testing!


None of the content contained herein represents the views of my employer nor should it be construed in such a manner. All content is copyright Matt T. Proud and may not be reproduced in whole or part without expressed permission.