2015-06-21

Pre-Launch Checklist for Publishing Go Projects

You've built this awesome doo-dad, and now you want to share it with the world. What do you do?

In the vein of a previous article, this one has been about a year in-the-making as well. This morning, I said, "fuck it," and decided to rip out the draft blog content and place it into a Markdown document on GitHub for everyone to see and contribute to. It feels more conducive to community input in this form now, which is a win. Well, here it is:

https://github.com/matttproud/gochecklist

So what is it? I've reviewed and authored a lot of Go code over the last few years, both privately and professionally. This got me thinking: what separates a good Go project from a not-so-good one? It dawned on me: one of the deciding factors has been the diligence taken by the project authors. This essentially comes down to process, so I have attempted to document what an author should do before publishing a project—outside of concerns of technical correctness and architecture.

Fulfilling this checklist won't make a fundamentally bad project good, but it will break ties between identical implementations. It certainly helps seed the creative process with the perspective of your users when creating a new project from scratch, however!

This suite of documentation is nascent and deserves detailed writeups outside of the summary document. Any contributions you have would be appreciated.

follow us in feedly

2015-06-16

Rate Limiting in Go: Dissection of a Package

A few months back, I was working on a project that required some form of rate limiting to moderate sudden spikes in operations triggered by influxes of incoming events on a message bus. The solution to this problem was not to implement rate limiting but rather aggregate the events on the publisher side and emit them as batches. It got me thinking: how would I implement a rate limiter?

Well, now you can see. I uploaded my package called faregate to GitHub, with its API documented here.

I chose to implement it as a token bucket load shaper, so let's dive into by discussing its parameters:

  • The crux of this class of load shaper is single parameter R, which signifies the interval at which a single token is replenished to the bucket.
  • A secondary point of interest is L, which signifies the maximum no. of tokens the bucket may have at any period.

Using these parameters, we can model how the bucket works (ignoring the users for a moment):

There exists a bucket with at most t tokens, which may be no more than L at any given time. At a fixed period R, a single token is added to the bucket if t is less than L.

A new bucket is created by calling faregate.New. This factory applies a musing from Rob Pike about application of options. Self-standingly are the option setters faregate.TokenCount, which sets L, and faregate.TokenCount, which sets R. As a side note, I chose to have faregate.TokenCount set an initial t value—mainly because I wanted to enable the ability for the bucket's user to request more than one token at the creation of the bucket, which admittedly can lead to some small amount of initial burst. A new optionFn could be added easily that changes this behavior.

We have alluded to the bucket's users so far, so let's turn our attention to them. A user requests v tokens from the bucket. If v exceeds L, that request is said to be non-conformant and an error, since at no point may the bucket break the invariant that L ≥ t, which means t == v is unfulfillable if v > L. Everything is self-explanatory if t ≥ v, but what happens if it isn't? This turns to an explicit design decision:

The Google context package does one thing right (among many others): it employs the idioms of the language, namely Context.Done returns a <-chan struct{}, which means it may be directly select {} upon. If you've used this package, this power has spoken to you. So why the diversion? Simple: wouldn't it be great if the bucket's APIs were idiomatic to that same degree?

It turns out that can happen, and that's how I designed it. You see, Faregate.Acquire returns a channel whose closure signals v tokens have been furnished for the request. It looks something like this:

package faregate

import (
 "fmt"
 "math/rand"
 "time"
)

func Must(c <-chan struct{}, err error) <-chan struct{} {
 if err != nil {
  panic(err)
 }
 return c
}

func Example() {
 rnd := rand.New(rand.NewSource(42))

 fg, err := New(RefreshInterval(time.Second), TokenCount(100), ConcurrencyLevel(1))
 if err != nil {
  panic(err)
 }
 defer fg.Close()

 for {
  q := rnd.Intn(10)
  <-Must(fg.Acquire(uint64(q)))
  fmt.Println("acquired", q)
 }
}
(Source)

… which brings us back to that v > t case. The user can simply select {} over the result of Faregate.Acquire to determine when the tokens are available. So, speaking of context.Context, an implementation that is using them for cancellation, deadlines, etc. can use the following pattern:

package faregate

import (
 "fmt"
 "math/rand"
 "time"

 "golang.org/x/net/context"
)

func Must(c <-chan struct{}, err error) <-chan struct{} {
 if err != nil {
  panic(err)
 }
 return c
}

func Example() {
 var (
  rnd     = rand.New(rand.NewSource(42))
  bground = context.Background()
 )

 fg, err := New(RefreshInterval(time.Second), TokenCount(100), ConcurrencyLevel(1))
 if err != nil {
  panic(err)
 }
 defer fg.Close()

 for {
  var (
   q   = rnd.Intn(10)
   ctx = context.WithDeadline(bground, 5*time.Second)
  )
  select {
  case <-Must(fg.Acquire(uint64(q))):
   fmt.Println("acquired", q)
  case <-ctx.Done():
   fmt.Println("context cancelled/timedout/deadlined:", ctx.Err())
  }
 }
}
(Source)

To get an idea on why the idiomatic nature is important, replace the Faregate.Acquire API with one that blocks (e.g., Acquire(int) error as signature) or returns a truth value to indicate ability to acquire (e.g., CanAcquire(int) boolean as signature), and compare how complicated the alternatives to the original examples above become. Fluent and idiomatic API is key.

Another feature in the design that I am particularly happy about is the implementation supports fairness (in the sense of FIFO queueing requesters) if the faregate.ConcurrencyLevel has been set to a positive non-zero value, ideally to the number of no. of routines accessing the bucket at peak.

Are there any caveats with this implementation? One obvious one is if R is too frequent for the runtime's scheduler to effectively meter. I have not built a test suite to measure this, but the results would be interesting. A secondary one could be context switch and scheduling overhead, though it is my understanding that release 1.5 will mitigate much of this.

Are there any missing feature or warts? Here are the open items in my mind:

  • Formalizing the Must idiom into the public API if one is confident that he or she won't break the v > L invariant.
  • Providing an option to initialize explicitly the initial no. of tokens, because it currently defaults to L.
  • Making the API Context-aware.
  • Making the default concurrency level runtime.GOMAXPROCS.

… and if we turn our attention to the Go wiki, this mechanism is not too different from a trivial solution that it prescribes. Take your pick!

Either way, this was not meant to be a super serious exercise but rather an exploration to see what we can learn, especially in API development. Crack the source open, give it a good study, and let me know what improvements you'd make! You can find plenty of other implementations out there, so there's a lot to compare.

follow us in feedly

2015-06-14

testing/quick: Blackbox Testing in Go for Fun and Profit

In today’s post, I would like to introduce you to—in my opinion—one of the most underappreciated packages in the Go standard library: testing/quick. If you’re unfamiliar with testing in Go (incl. idiomatic testing), I suggest you familiarize yourself with the topic before continuing this article.

So why dedicate an article to a package in the standard library when its own documentation should self-standingly explain and sell the necessity of the package? Simple: the untapped potential found within it, which I think you’ll see very quickly. Much of this power is a vein similar to Haskell’s QuickCheck facility (a nice introduction to Haskell QuickCheck for the uninitiated).

Without further ado, let's get you bootstrapped on the topic of blackbox testing in Go …

Pillar I: Value Generation

We begin this tour not with testing but rather generating values for types seemingly out of thin air, since that concept is an important pillar of this package. Our attention thusly falls on the function reflect.Value. Its docstring reads:

Value returns an arbitrary value of the given type. If the type implements the Generator interface, that will be used. Note: To create arbitrary values for structs, all the fields must be exported.

Sounds self-explanatory, right? What about its signature?

func Value(t reflect.Type, rand *rand.Rand) (value reflect.Value, ok bool)

Ouch! If you’re like me, you might run for the hills when you see an API or package using reflection—especially since many an amateur package flagrantly overuse and abuse it. Not so fast; this is in the standard library and is probably well-reasoned. Let’s demonstrate what it does:

package main

import (
 "fmt"
 "math/rand"
 "reflect"
 "testing/quick"
 "unsafe"
)

var rnd = rand.New(rand.NewSource(42))

func report(typ reflect.Type) {
 defer func() {
  if err := recover(); err != nil {
   // IIRC, arrays would panic in 1.4.1 and previous.
   fmt.Println("\tCOULD NOT GENERATE")
   return
  }
 }()
 val, ok := quick.Value(typ, rnd)
 if !ok {
  fmt.Println("\tCOULD NOT GENERATE")
  return
 }
 fmt.Printf("\tGENERATED: %#v\n", val.Interface())
}

func main() {
 var table = struct {
  // Table of reflect.Kind Zero Values
  _ bool                // reflect.Bool
  _ int                 // reflect.Int
  _ int8                // reflect.Int8
  _ int16               // reflect.Int16
  _ int32               // reflect.Int32
  _ int64               // reflect.Int64
  _ uint                // reflect.Uint
  _ uint8               // reflect.Uint8
  _ uint16              // reflect.Uint16
  _ uint32              // reflect.Uint32
  _ uint64              // reflect.Uint64
  _ uintptr             // reflect.Uintptr
  _ float32             // reflect.Float32
  _ float64             // reflect.Float64
  _ complex64           // reflect.Complex64
  _ complex128          // reflect.Complex128
  _ [4]byte             // reflect.Array
  _ chan struct{}       // reflect.Chan
  _ *uint8              // reflect.Ptr
  _ func()              // reflect.Func
  _ interface{}         // reflect.Interface
  _ map[string]struct{} // reflect.Map
  _ []uint8             // reflect.Slice
  _ string              // reflect.String
  _ struct{}            // reflect.Struct
  _ unsafe.Pointer      // reflect.UnsafePointer
 }{}
 typ := reflect.TypeOf(table)
 for i := 0; i < typ.NumField(); i++ {
  fld := typ.Field(i)
  fmt.Println("type:", fld.Type, "kind:", fld.Type.Kind())
  for j := 0; j < 3; j++ {
   report(fld.Type)
  }
 }
}
(Source)

It yields something similar to this (depending on your runtime's release):

type: bool kind: bool
 GENERATED: false
 GENERATED: false
 GENERATED: true
type: int kind: int
 GENERATED: 886688009
 GENERATED: 1386077449
 GENERATED: -1690143155
type: int8 kind: int8
 GENERATED: -97
 GENERATED: -64
 GENERATED: -124
type: int16 kind: int16
 GENERATED: 2136
 GENERATED: -19454
 GENERATED: -28703
type: int32 kind: int32
 GENERATED: -2121370546
 GENERATED: 1289973693
 GENERATED: 1054271550
type: int64 kind: int64
 GENERATED: -311716574456517860
 GENERATED: 1958729407415377171
 GENERATED: 1991382105283397538
type: uint kind: uint
 GENERATED: 0xb1b8ec7
 GENERATED: 0x51a7821b
 GENERATED: 0x3c903ced
type: uint8 kind: uint8
 GENERATED: 0x5a
 GENERATED: 0xed
 GENERATED: 0x7b
type: uint16 kind: uint16
 GENERATED: 0xfdfe
 GENERATED: 0xe1
 GENERATED: 0x2c69
type: uint32 kind: uint32
 GENERATED: 0x71701c1c
 GENERATED: 0x5ac22ddc
 GENERATED: 0x5e7029e1
type: uint64 kind: uint64
 GENERATED: 0xef052b8e822623c9
 GENERATED: 0x1c61108e6517386e
 GENERATED: 0x324144f3fd478949
type: uintptr kind: uintptr
 GENERATED: 0x16c1d40e
 GENERATED: 0x6f6b57e3
 GENERATED: 0xc0ba7489
type: float32 kind: float32
 GENERATED: 3.187561e+38
 GENERATED: -5.1580155e+37
 GENERATED: 2.5290076e+38
type: float64 kind: float64
 GENERATED: 4.588014221460887e+307
 GENERATED: -1.2664307027308244e+308
 GENERATED: 1.5003733931809143e+308
type: complex64 kind: complex64
 GENERATED: (-1.8948403e+38+6.3610383e+37i)
 GENERATED: (3.1927506e+38+1.8130937e+38i)
 GENERATED: (1.542176e+38-1.3307128e+37i)
type: complex128 kind: complex128
 GENERATED: (-1.6426314879941742e+308-3.0125415252062856e+307i)
 GENERATED: (-1.0902015428013727e+307-5.490171496779273e+307i)
 GENERATED: (-1.6140880161791802e+308+4.545285654928386e+307i)
type: [4]uint8 kind: array
 COULD NOT GENERATE
 COULD NOT GENERATE
 COULD NOT GENERATE
type: chan struct {} kind: chan
 COULD NOT GENERATE
 COULD NOT GENERATE
 COULD NOT GENERATE
type: *uint8 kind: ptr
 GENERATED: (*uint8)(0x104367b2)
 GENERATED: (*uint8)(0x104367be)
 GENERATED: (*uint8)(0x104367ca)
type: func() kind: func
 COULD NOT GENERATE
 COULD NOT GENERATE
 COULD NOT GENERATE
type: interface {} kind: interface
 COULD NOT GENERATE
 COULD NOT GENERATE
 COULD NOT GENERATE
type: map[string]struct {} kind: map
 GENERATED: map[string]struct {}{"\U000e4395":struct {}{}, "\U000325fb𤶢\U00011995":struct {}{}, "\U00101eee\U0008a450\U000ac870\U00060a01\U00044647\U000a6bc7\U0008521c\U001064dc\U00036991\U00075bd2\U000daaac\U00090ec7\U000e0dd2":struct {}{}, "\U0001eafa\U0001da2d\U000778fe\U00083128\U000f6754\U0006bf86\U0003125b\U000d9c02\U00102322\U000bddaa\U0004218d\U0009611b\U0004410d\U000ccac0\U00049112\U000e9e35�\U000a9240\U000fc907𢧔\U000c3c1d\U000ea93d\U0006e681\U0006160f\U00040773𒂃\U00060243\U00103f0e\U00087a15\U000e4e78\U0005cb57\U00040799\U0005f8fe\U000e4da0\U0004c8b1\U00075efb\U00083119\U000b14f1\U000768ab\U00045c20":struct {}{}}
 GENERATED: map[string]struct {}{"\U00079c1d\U0004e807\U0008c903\U0006591c\U0002fc86\U000ace26\U0010ed19\U000bad37\U00064dca\U0005959b\U00100c01\U000361d1":struct {}{}, "\U0007e11f\U00010c8d\U00107cb8\U001059a4\U0003665e\U00030087\U00059b5f\U000a2101\U0005b980\U0010da1d\U0007588b\U000569d2\U000c3ab1\U000d67c9\U0004e896\U0005ccf4\U00031096\U000fef82\U0006e7f5\U000f22fe�\U0010fa0a┴\U0008a405\U0007e6db\U0006d5ef\U000ba4e2\U000e9f0d\U000b6390\U000896bc\U00037a92\U00058298\U0009510e𨩳\U000c5191\U0010ad2e\U00097902\U000a38a3\U0008aaf5\U000e72fa\U0003675c\U000c1a2e\U000c1dfdð©¥½\U0004bca2\U00069f4a":struct {}{}, "å¼¢\U0008b084\U000192c4\U0006537f\ueff1\U000881c0\U000da4e3\U00079ce9\U0001b2ca\U0007a47f\U0008ce41\U0001e387\U00030e04\U000b9306":struct {}{}, "\U000ca1b5\U0005296b\U000d3f76\U0006cd13\U0008d14d\U00068e37\U0009b4d3\U000318b6\U000ac738\U000e43a1\U00083f13\U00087654\U000fdc7d\U000fde90\U000716d0\U0004c783햮\U0006f6ac\U00048b82\U0007d921":struct {}{}, "\U000c80d5\U000dc2b1\U00106867\U0001b2e2\U000f2622𣣏":struct {}{}, "\U000a34d2\U000f6a83\U000689c7\U0004eed7\U00106b78딘\U0006ea91\U0009461d\U000eaf48\U0002e42d\U000ef23a\U00099e8e\U0002e844\U000e613e\U000baab8\U000a0997\U0008e1b9\U00103e8a\U000c6afd닇\U0004ff7b\U000cd233\U000cb294\U000625b1\U00066507\U000f51c1\U00065769\U00064604\U000a703d\U000a0c3d㧙":struct {}{}, "\U00039cd9\U00057d29\U00052c18\U0010cf06\U000f2f3a\U000d6d33\u4dbd\U000cdf45\U000d9661䘋\U000341a3\U0006ceaa\U00036519\U0006206d\U0002d464\U000bd7a5\U000a0970\U000a6f46\U00034887\U0004e876\U000e3700\U001051f9\U00077edd":struct {}{}, "\U0004a3a5\U000b22e7𨍙\U0007e7bb\U000c71c7\U00105b8b\U0006e1af\U00062bc8\U0002d7e3\U00034209𦚇\U0003cdd2\U0010e4da\U000f765f\U0008f665\U00105edc\U00095b29\U00081bcb\U000df645\U0003dc73\U0001ab8c煨\U000f5ad9\U00017721\U00060d80𣡅\U000fe2c7\U000d04cd\U0002c531\U0005d464\U00019890\U0006af12\U000db301\U00066eef\U000a4f5b\U000e5a5d\U00071476𧅒\U00079fd3\U000f0c30":struct {}{}, "\U000831bc\U000f8ee2𢼺\U00066a7e\U000cb882\U00078569\U000a969f\U0007c438\U000d0318\U000c3aee\U000c0192𦃅𣍈\U00077f00\U000ce223\U00012e27\U0007efec\U00068a90\U000fd0a7𪟞\U00042424\U0005f747𪠃\U000b2c73💄𝇃\U0009486b\U000cc652\U000df5cf𤺸\U000ef4d2\U000d1831":struct {}{}, "\U000395c7\U000973f4\U000c1869\U00013577\U0008eb55\U000f4fe9\U000494e0\U000a1d51\U0006359d\U000f2ee9\U000b279a\U000ce976\U00059418\U000c1839\U000d51f3\U00055902\U000c18ba\U000ea29d\U0003d308\U000ee57a\U0001bd49\U00079d99\U000482c6\U000abe47\U000ae76a\U00091ac9\U000e5c26\U000ea00e\U00036a30\U000de638\U001028c6\U00032f0a\U00089778\U000f4cac\U000c7c67\U0009ab80\U0008b0c1\U000656fd\U000ad0f5\U00053186\U000b7c91":struct {}{}, "\U000d3156\U00062ad7\U000d208d\U0009154e\U000d8fe9\U0005e1ce\U000d4acc\U0006c358\U000c1054\U000415fc\U0009d6d7\U000dda14\U000b8365\U000df90d\U0007d5bf":struct {}{}, "\U0010b066\U000f1aaa\U00099b71\U000bb256\U000cc683𝒛\U000b9d2d\U000a5f37\U000efcd4\U0003f009\U000ccea4𪨹\U00076987\U000a8592\U000966e4\U0002c173\U00019c35\U000f72c9\U0003c08b𧛃\U0004a31b\U00057a3d\U0007cc77\U000565d7\U0009f994\U000707d7\U000523ef\U000e924f\U000e72c2\U00069415\U000c48b3\U000de557":struct {}{}}
 GENERATED: map[string]struct {}{"\U000f62dd\U00047afa\U000e45b8\U000cf7d3\U000f2c02\U0004aaf4\U0005c68a\U000ef094\U000ef91d\U000aafa9\U0008f7cb\U00070c69\U0002f719":struct {}{}, "\U00030f37\U0010d219\U00062a34\U000dd562\U0005a8aa\U00092a4b𧕼\U00051427\U0010609b\U00081614\U0010bbec\U000e4ccb\U00036dd9\U0003ffe9\U000ecbbd\U000fcd67\U0010bd5e\U000a1677\U0009347a\U000b2dc3\U000934f4\U00076d38\U000e281d\U000edda8\U0008d8e5\U0003a930\U000bbe31\U00060690\U000f1d83\U000ab363\U00089232䅔\U000b4d0f":struct {}{}, "\U00038a76\U000bd02b\U00031db7\U00013ace\U000f517d\U0003477b\U000846ae\U0002db2c\U000a8f90\U00042f6d\U000ff986\U0010ffd1\U000b55aa\U000b9b06\U000da098":struct {}{}, "\U000a3813𪡼\U000d17c3\U0001fa04\U00014010𣲈\u1ae1\U000487b1\U0010d94a\U0005d60e\U0005caa3\U000b01e2\U000ac766慨\U000fe809\U0010fee7\U000695da쩐\U000a6dfb\U000bd0d8\U0009edb6\U000a12fd\U0004ec4f\U0005c5b6\U000b6711\U000d65a2\U00074078\U00095a3d𥤉\U001045ea\U000de76a\U000c8c69\U0010c378\U000d163b\U000b96fb\U0010f818\U0005988a":struct {}{}, "\U000d6143\U000832c4\U000979c0\U0010316f\U0010337c\U0006a3aa\U00076db6\ued44\U00080600\U000cfebc\uee39\U00033ff7\U00041a1c\U000bbc50\U0003db07\U000e156d\U0004d0e8\U0010be57\U0007a66d\U00098fc6\U0006fa46\U000f2398\U00039750趝𤂛\U000c31c7\U000f612a\U0010482f\U0006f400\U000eb6f5\U000e85c0\U0003210f\U00031062\U000798e5\U000ea712\U00038c9c\U000da8fa":struct {}{}, "\U00030314\U00050152\U000ecedfåµ²\U0006cc1b\U000a6300\U000369c7\U000fad75\U000a5842\U0006a74c𥾗\U000df0e8\U0002f140\U000fa671\U0005e44d\U000b78f1\U0010786e\U0006820d\U00053cc5\U000514aa\U0010f388\U0010cb49\U0001dcb7𢞬䴂\U0002bb7b\U00076556\U000dba72":struct {}{}, "\U00086da5\U00072a62\U000691cc\U0002d49b\U0005cb5f\U000ee3d9\U0010222a\U000eff04\U0006ec98\U00076190華\U00083ce8\U000e566b\U0003a00b테\U00014c48\U000f44ed\U000e9c8f\U0010023a\U000ddd17\U00061fae\U000af4c1\U0003a0d7\U0005ba62\U0003956a\U000f3f64\U0009cce2\U0001e132\U00107b4d\U0005f708\U000c3fc9\U0004b6dd\U00049c08\U0005e747":struct {}{}, "\U000aa87b\U0001eaec\U000fabf2𠮢\U00097762\U000c3b56\U000e2f18\U001033ac\U00039690\U00078774\U00075120\U0006f8a4\U000999b0\U0009cc8e\U0004de12\U00019984\U00087a40\U000ec2ea\U000b4ec2𢦐\U00108976\U000155a4\U00062364\U000ffe5c\U0009a7ca玚\U00103f33┳\U000bb7dd\U00036018\U0004cc1a\U000cec4c\U0008c2b5捇\U000a50ca\U000efba0\U0007efc0\U00051343\U0008813e\U00018223":struct {}{}, "\U00098753\U000581d4\U000d3c22\U0006105e\U000a8937\U000cb33b\U000498e8\U000c36aa\U0009cc35\U000fa1a8🌄\U000654ba沗\U00048f74\U00055e4c\U000877b2\U000a9927𦶠\U00105566\U00103fa0\ue613\U000bbea3\U000930a5\U000f0eb8\U0010e4dc\U00019a78\U000e6ab5\U000dffff\U000f4372\U0004602a\U000bfa09\U00085d49\U000f7beb\U0004db8f\U00034ece𤺋\U0003df57\U00038824\U000ba4db":struct {}{}, "𑈟\U0002f435\U00041ebb\U000910c8\U00106298\U0001bfb2\U00095d05\U000fdefb\U00109bf0\U000fc0c5\U00052be1\U000fe3fa\U000681b2\U0006b073컾\U0003ebdc\U0006ba56\U0005872c\U00048681\U00100934\U0003f604\U000e958e𤓴\U00086986\U000dac54\U00037d0c𨂾乫\U0003b682\U0005c3bf\U0004b2e2\U00058d93\U0003e8f1\U000fc822\U00072e7b\U000a286d":struct {}{}, "\U0002de33\U000d9f49\U0008419f\U000116e9\U0002e398\U000c2d5b\U0009f396\U000922f1\U00075bd3\u245b\U000453c1\U00043efd\U000b3418":struct {}{}, "\U00045e28\U000f075a\U000fee42\U0004ed2a\U0002f2f1\U000383ae\U00043882\U0004ee9f\U000caee1\U00039287\U0003a75e\U000ea1d6\U00039dda\U000f302c\U001005d8\U000ec430\U00082b07\U0005f05d\U0003e65f\U00037173\U00017db8\U0009d9fe\U000ab906𫗿":struct {}{}, "\U0007362f\U000f535f\U000944a9\U0008d6d4\U0008a81f\U000cd90c\U00090d06\U00099df0\U000bc157\U000c9350\U000486fd\U000b9928\U000385cc\U000b8c88\ue7ed\U000b4164\U000cfd5d\U0004f7b1\U0006faaf\U0005ebf6\U00048901𧨽\U000742b8\U00067cc6𠦒\U000f33ba\U00089264\U000c4d9a\U000f52fd\U000f0aa6\U0004f190\U000adee6\U00109657\U00032efa\U000c1f53\U000403e0\U000e4cb0\U0003d087\U0006c519":struct {}{}, "\U0007eb38\U00064fe8\U001025fb\U000499d1\U000abd6e\U00069fef\U00012570\U000c749b\U00080427\U00101715\U000af510�\U00079155\U000a7cb3\U00096be8廨\U000ab026\U000ca2ce\U000ead97\U0010b0a5\U000fc4be\U000c6f28\U000d2463ð¡´ª\U0009ca42\U000a0dcc\U00084a9a\U000835e6𩘶\U0010344e\U00082ece먔\U000ee140\U000a58f0\U0007c304\U00078688\U00049470\U000e16a5\uea5e\U00107ec8\U000c672e":struct {}{}, "\U0009db7a\U000bd3f4\U0007d69a\U000f41ae\U000b5dbf軛\U000e5e28\U000b8776詙쁍\U00086401\U000a5621\U00010069\U000fc753\U000946b1\U000405e6\U0003634d\U0001488c\U00062de1\U000c0a6a\U00046451\U0007b5fd\U000d3b6f\U00033b02\U000fd76d\U000c6e79\U000c284d\U0001f8ca\U000a8ecd\U000908f6\U000aa47b\U000cae8e\U0006ed32\U0009f82a\U0004374f\U000faba7\U000d7038𡽖\U000beaec\U00018d18\U00061c25\U000ff535":struct {}{}, "\U0008d1ba\U000794f5\U000ec0db\U00073583\U0003d80f這\U0001359a\U0006fb83\U000b3319𦲯\U000d5e2b㣣":struct {}{}, "\U00098817\U0002d717\U00090902\U00061890\U0005d2d4\U0003eb77\U0004b44d\U000b2ae6\U00097d72\U000b0fb1𧩸\U00072c6f\U000c6287\U000b91f1\U0002eefa𣡲𫃬\U000bd50a\U000db5b7\U000fca99\U00064829\U0007750f\U000a8250\U0009a3f7𢧿\U000c67bf\U00109122\U00103f76\U000d82d9湄\U000b7474\U00046a54\U000a6398\U000f530b\U00058931矕\U000470b8\U000c2a39\U0009e1ba\U000b2898\U000bb96f\U00017d54\U00040bbe\U000c738b\U000a8daf":struct {}{}, "\U0010faf4\U0008ce42𤠫\U0006d828𣋭𒀔\U000638ff\U000d692e\U000b5daf\U00104f36\U00052752튻\U0002e93c\U000af37c\U000c823a\U0010f2dd":struct {}{}, "\U000d3b87\U00089dec\U000484f3\U0010c6de\U000332f0\U0007d6d4\U0003cefe\U0004872d\U000a9b92\U00067244\U000ebb6f\U000cbffd\U000d9646\U000e2ee4\U000aa109\U000a7ff7\U000a4512\U00087ad6\U0005cccd\U0002d359噀\U00061711\U000b7aad\U000e7067\U000c0bcd\U0003b2d0𡋀\U000e816b\U000e789a\U0005d37e\U00057e8b\U000b4864\U000bbca3":struct {}{}, "\U000f6a55\U0007e36a\U000a3e4a\U00083484\U0008eae7\U000c0489\U00057088\U000399a6\U0008994f\U0010a4f1\U000f17d9\U000bd0a4\U0009578f\U0009cac5\U0010616c\U000b8f6b\U001096e6\U000f27d0\U000ef17b\U0007791f\U0003e240":struct {}{}, "\U000e71ea\U000c7b28\U0001b700\U0001c2c9㮋\U0010441e\U00065199\U000666e0\U000bac6b\ue1f3┐\U000c4a96\U00102015\U00031e9e\U00101f33\U000f9228\U000438a4\U0007fde1\U000c25be\U000ff37a\U0003cc3c\U000cda88\U00036541\U000b076e\U000306c6𧳎":struct {}{}, "\U000aec6b\U0010f3f8𩞥\U00043ae7\U000bc8cb\U0007a146\U000f7290\U000610c7\U0006c9c8\U000ee66c\U00099ec5\U000752d8\U0007e35f\U000fc25b\U000dfe5c\U0006e3d2\U000ad853恕\U000c920a\U00013d1e\U000b84a8\U000a7171":struct {}{}}
type: []uint8 kind: slice
 GENERATED: []byte{0xd0, 0xa2, 0x25, 0x5a, 0x46, 0x6c, 0x5b}
 GENERATED: []byte{0x1c, 0xdb, 0x3b, 0x18, 0x35, 0xc9, 0x4f, 0xb0, 0xce}
 GENERATED: []byte{0x32, 0x50, 0xa0, 0x49, 0x60, 0xfa, 0x31, 0x2f, 0xdb, 0x96, 0xfe, 0x37, 0xb4, 0x70, 0x47, 0x84, 0x31, 0x61, 0x47, 0x89, 0x4e, 0xdd, 0x55, 0xb7, 0x4b, 0x82, 0x2d, 0x1f, 0x76, 0x4, 0xe8, 0x2b, 0xdd, 0xb8, 0x88, 0xee, 0x9a, 0x3c, 0xb5, 0xe7, 0x98}
type: string kind: string
 GENERATED: "\U000c921f"
 GENERATED: "\U000a70a7\U000461db\U0004f6f7\U0005dce5\U000bf7db\U00031f26"
 GENERATED: "𣁂\U000b6ac3\U001080d9\U0009b4d7𤓝\U00053c27\U0006b5ee\U0007a549틄\U000762e4\U00065f0d𦙧"
type: struct {} kind: struct
 GENERATED: struct {}{}
 GENERATED: struct {}{}
 GENERATED: struct {}{}
type: unsafe.Pointer kind: unsafe.Pointer
 COULD NOT GENERATE
 COULD NOT GENERATE
 COULD NOT GENERATE

Ponder on this for a moment: this function can handily generate example data for almost any given type prototype provided to it. Nifty, eh?

Caveats and Limitations of Value Generation

So before we get too far along, we should outline the various caveats with this facility.

No Unexported Struct Fields in Parent or Children

This is strictly an artificial limitation of the quick.Value implementation. If your type has any fields—or these fields in its children's fields, all the way down to the level of leaf nodes—have unexported identifiers, quick.Value may not be used!

This is to say that

struct {
    x, y int
}

is illegal; whereas

struct {
    X, Y int
}

is legal.

Arrays

This is strictly a limitation of 1.4.1 release and earlier; tip release supports arrays. I was going to add support for this myself but found that someone already did.

This is to say that in 1.4.1 and earlier, [M]T cannot be generated; whereas only unhinted slices can be []T. We’ll return to this topic in a later section!

Channels

Channels cannot be generated by quick.Value. I suspect that this is an artificial limitation as opposed to a substantive one, since reflect.MakeChan exists!

Interfaces

As best as I can tell, this is a legitimate limitation:

  • Barring having a facility like the Go Oracle built into the runtime, we would have to sweep across a registry of available concrete Types to find one that implements the interface (the runtime does not offer this publicly, by the way). Irrespective of whether such a runtime type information registry exists, it would not be obvious (without static analysis) which concrete type to pick from if two or more types fulfill the interface’s expectations.
  • There is no facility for a new Type to be declared at runtime. All types must be declared and known at compile-time. Non-top-level types can be created, but their method sets are fixed and cannot be extended since receivers (MethodDecl) may only be declared on the top-level.
  • Further, the zero value for an interface is nil, which precludes both reflect.New and reflect.Zero!

Recursive Types

Yet another implementation detail, but this one may be fixable but perhaps better not to be. This is to say that if you have a type like a linked list node …

type Node struct {
  V int
  Next *Node
}

… you are out of luck since the generator does not take a size hint for depth. What’s more? The default generator cannot take into account requirements related to the graph’s connectivity if a certain topology is desired or required by invariant: Cycles, Completeness, etc.

Unsafe Pointers

I hope this candidate is pretty obvious to you; if not, let me offer conjecture:

Untrusted Code

This “watch out” category has a large penumbra, so read closely. The core gists of it are …

  • know concretely what types you are working with—from the root to the leaves, even; and
  • have confidence the types will not unexpectedly change out from under you!

Namely the concern with untrusted code is that one of its types that you plan on using in quick.Value violates one of the rules listed above. I’ll list a few categories to be on the lookout for.

Cgo or SWIG-Wrapped Types

quick.Value may be used with Cgo- and SWIG-wrapped types, but I would not recommend it. You would need to validate that all of the aforementioned rules are respected by the generated/wrapped code, which could be a tall order.

For instance, the moment a void pointer appears in one of your original structs, all bets are off.

struct Node {
  void *Value;
  struct Node *Next;
};

You would need to hand-wrap the problematic types and then proxy by-hand the values to the C types. When in doubt, consult this handy C type to Go type appendix.

Code Generated Types

You may use quick.Value on types generated from third-party mechanism (e.g., from IDL or go generate), but you should be careful that you understand to-the-dot how the types are generated and what they contain.

I have a special section below dedicated to Google Protocol Buffers.

Untrusted Types from Foreign Code

This is the most likely class of untrusted code you could run into: external imports. Using some sort of vendoring strategy is advisable. Pick a simple but comprehensible approach or the hotness du jour.

Generating Records for Tests

Let’s take it a step further: could we generate arbitrary data for tests and other purposes? Yes!

General Case Record Generation

Suppose we are interested in orchestrating a scenario that involves three-dimensional coordinates. Using quick.Value, it may look something like this:

package main

import (
 "fmt"
 "math"
 "math/rand"
 "reflect"
 "testing/quick"
)

type Coord struct { X, Y, Z int }

func generateCoord() Coord {
 var c Coord
 val, _ := quick.Value(reflect.TypeOf(c), rnd)  // Special Sauce
 return val.Interface().(Coord)
}

func distance(from, to Coord) float64 {
 return math.Sqrt(math.Pow(float64(to.X-from.X), 2) +
  math.Pow(float64(to.Y-from.Y), 2) +
  math.Pow(float64(to.Z-from.Z), 2))
}

func main() {
 for i := 0; i < 5; i++ {
  from, to := generateCoord(), generateCoord()
  fmt.Println(distance(from, to)) // Use for test.
 }
}

var rnd = rand.New(rand.NewSource(42)) // Don't actually do this part.
(Source)

It outputs:

2.9129966910814753e+09
1.9525569808225183e+09
1.804755364719007e+09
1.9810662459937294e+09
2.949487780488521e+09

It’s that simple!

Using One-Off Non-Top-Level Types for Corner Cases

That previous example is nice and all, but perhaps it is perhaps too contrived for real-world use. Suppose, instead, for whatever reason, the coordinates axes have a de facto value domain of [0, 256], and that due to public API and convenience you prefer to have the type used in Coord for the axis to be an int. What’s the best solution for this? There are several.

Here is one where we leverage the type system to do the right thing:

package main

import (
 "fmt"
 "math"
 "math/rand"
 "reflect"
 "testing/quick"
)

type Coord struct { X, Y, Z int }

func generateCoord() Coord {
 // Rely on type system to generate the values within the constraints.
 type coord struct{ X, Y, Z int8 } // Must be formal type due to subsequent type switch!
 val, _ := quick.Value(reflect.TypeOf(coord{}), rnd)  // Special Sauce
 c := val.Interface().(coord)
 return Coord{int(c.X), int(c.Y), int(c.Z)}
}

func distance(from, to Coord) float64 {
 return math.Sqrt(math.Pow(float64(to.X-from.X), 2) +
  math.Pow(float64(to.Y-from.Y), 2) +
  math.Pow(float64(to.Z-from.Z), 2))
}

func main() {
 for i := 0; i < 5; i++ {
  from, to := generateCoord(), generateCoord()
  fmt.Println(distance(from, to)) // Use for test.
 }
}

var rnd = rand.New(rand.NewSource(42)) // Don't actually do this part.
(Source)

It outputs:

126.49505919204908
217.32464195300082
185.01891795165164
209.49701668520245
139.35924798878617

Note how we defined a non-top-level type called coord within generateCoord to capture the requirement that the axes are within the constraints of int8. We had to manually copy the values from c to the output struct with the respective integer types undergoing type conversion.

Go’s type system is powerful and can help you out, so don’t discount its applicability in these situations.

Generating Complicated or Noncompliant Types

“So, great, you relied on the type system to do the hard work for you,” you say. What happens if the axes must be along the interval of [0, 256] and divisible by two or have some other more complicated constraint? Well, it turns out that quick.Value employs the quick.Generator interface if the type implements it:

package main

import (
 "fmt"
 "math"
 "math/rand"
 "reflect"
 "testing/quick"
)

type Coord struct{ X, Y, Z int }

func (Coord) Generate(r *rand.Rand, unusedSzHint int) reflect.Value {
 val := func() int { return 2 * r.Intn(129) }
 return reflect.ValueOf(
  Coord{val(), val(), val()})
}

// generateCoord is purely for convenience.
func generateCoord() Coord {
 val, _ := quick.Value(reflect.TypeOf(Coord{}), rnd) // Special Sauce
 return val.Interface().(Coord)
}

func distance(from, to Coord) float64 {
 return math.Sqrt(math.Pow(float64(to.X-from.X), 2) +
  math.Pow(float64(to.Y-from.Y), 2) +
  math.Pow(float64(to.Z-from.Z), 2))
}

func main() {
 for i := 0; i < 5; i++ {
  from, to := generateCoord(), generateCoord()
  fmt.Println(distance(from, to)) // Use for test.
 }
}

var rnd = rand.New(rand.NewSource(42)) // Don't actually do this part.
(Source)

It outputs:

125.61846997953764
150.2664300500947
194.7408534437497
180.89776118017602
114.69960767151734

So where else may quick.Generator come in handy? Recall that huge list of limitations for quick.Value from above under Caveats and Limitations of Value Generation? Almost all of the concerns I mention can be mediated or mitigated through the semi-manual creation of a type’s value. Recall how recursive types (historically) do not play well with quick.Value? Let’s implement a generator of singly-linked list that produces list that must contain at least one node:

package main

import (
 "fmt"
 "math/rand"
 "reflect"
 "testing/quick"
)

type Node struct {
 Value int
 Next  *Node
}

func (*Node) Generate(r *rand.Rand, size int) reflect.Value {
 var (
  root, cur *Node
  i         int
 )
 for {
  switch {
  case root == nil:
   root = &Node{Value: r.Int()}
  case i == size:
   return reflect.ValueOf(root)
  case cur == nil:
   root.Next = &Node{Value: r.Int()}
   cur = root.Next
  default:
   cur.Next = &Node{Value: r.Int()}
   cur = cur.Next
  }
  i++
 }
}

// generateList is purely for convenience: at least on
func generateList() *Node {
 val, _ := quick.Value(reflect.TypeOf(&Node{}), rnd) // Special Sauce
 return val.Interface().(*Node)
}

func main() {
 for n := generateList(); n != nil; n = n.Next {
  fmt.Println(n.Value)
 }
}

var rnd = rand.New(rand.NewSource(42)) // Don't actually do this part.
(Source)

It outputs:

377457747
532387611
341549032
886688009
1386077449
457340493
990748575
1276434112
1623923332
1631520856
1948103682
517640161
26113102
1289973693
1054271550
1424681756
739958035
1434224546
186355399
1369932315
1016085741
107116634
1189285101
1239466875
213843454
1306132705
129379433
1903172636
1522675164
1584409057
36053961
1696020590
2101840201
381801486
1869305827
1085961353
831674064
2102236728
1718829907
518803415
1847834394
735023202
87173066
651061184
1177267156
243446367
400486752
2033621664
1843611159
1334554281

Limitations of quick.Generator

Not so fast, though; quick.Generator may not be the panacea you are looking for.

Cumbersome for Modeling Divergent Behaviors for Same Type

Note that quick.Generator is an interface. In order for a type to implement an interface, its method set must contain all of those defined in that interface. Here’s the rub: only types that can define methods on themselves as receivers are eligible to fulfill non-empty method sets, and the Go Programming Language Specification restricts method declaration (MethodDecl) to the top-level (TopLevelDecl) of a file’s body (SourceFile).

Why harp on this point? Suppose you have type Person

type Person struct {
  GivenName, Surname string
  Age uint8
}

that you wish to have generated in divergent ways for separate tests. What do you do then?

You could generate a value sequence of the type in the test body each time. You’re putting a fair bit of noise in your test cases, then. Alternatively you could define type aliases, as we have done to model the corrupt relationship between Mayor Bob Jones and the town’s liquor control board:

package main

import (
 "fmt"
 "math/rand"
 "reflect"
 "testing/quick"
)

type Person struct {
 GivenName, Surname string
 Age                uint8
}

type jonesPerson Person

func (jonesPerson) Generate(r *rand.Rand, _ int) reflect.Value {
 type p struct {
  GivenName string
  Age       uint8
 }
 v, _ := quick.Value(reflect.TypeOf(p{}), r)
 pers := v.Interface().(p)
 return reflect.ValueOf(jonesPerson{pers.GivenName, "Jones", pers.Age})

}

func generateJonesPerson() Person {
 val, _ := quick.Value(reflect.TypeOf(jonesPerson{}), rnd) // Special Sauce
 return Person(val.Interface().(jonesPerson))
}

func generatePerson() Person {
 val, _ := quick.Value(reflect.TypeOf(Person{}), rnd) // Special Sauce
 return val.Interface().(Person)
}

type LiquorStoreClerk struct{}

func (LiquorStoreClerk) MayPurchase(p Person) bool {
 // The mayor is Bob Jones, and he gives/gets kickbacks, so skirting the law is
 // just a natural consequence for his family.
 if p.Surname == "Jones" {
  return true
 }
 return p.Age >= 21
}

func main() {
 var clerk LiquorStoreClerk
 fmt.Println("Major Bob Jone's Family:")
 for i := 0; i < 5; i++ {
  p := generateJonesPerson()
  fmt.Println(p, clerk.MayPurchase(p)) // Test to Ensure Always True
 }
 fmt.Println("Poor Saps (Citizenry):")
 for i := 0; i < 5; i++ {
  p := generatePerson()
  fmt.Println(p, clerk.MayPurchase(generatePerson()))
 }
}

var rnd = rand.New(rand.NewSource(42)) // Don't actually do this part.
(Source)

It outputs:

Major Bob Jone's Family:
{𻓊񯜐򉏐񽠳򧵃 Jones 159} true
{񙹒󷪰􁾂𘐓𪆠򻯅󵒋򩋯𝍡󽓜𛗆𹅟䒐򹦥񯗬󕤐󢳹󸡽񰟔򉟡󧞿󴂽𒹒𡗽򔮫񤚄 Jones 227} true
{򢁃󊔸𯹀跔򗫣𤢜󿍋򜁕󴗍񆇖󈧊󒒃񚩾󋪗񗚂񬔄򑧅񄪸祣󡺎򚃻񣮎񦹕𞙒񉟩𫷊㓶󉸔󳌫󥃅򶋴򴎉򴻞󇳥񡃅򬠕򕷠𦞫񾅭򑞂񄹕󤎕񒸽𲗻𤶢𑦕𦾋􁻮 Jones 45} true
{񠨁񄙇򦯇򅈜 Jones 105} true
{񵯒󚪬򐻇󠷒󖮸𞫺𝨭񷣾򃄨󶝔񫾆𱉛󙰂􂌢򽶪񂆍򖄛񄄍󌫀񉄒󩸵�򩉀󼤇𢧔󃰝󪤽񮚁񡘏񀝳𒂃񠉃􃼎򇨕󤹸񜭗񀞙񟣾󤶠񌢱񵻻򃄙򱓱񶢫񅰠򽉪󲢜𹳙 Jones 21} true
Poor Saps (Citizenry):
{ôŒ¼†ó²¼ºó–´³䶽󍽅 䘋𴆣ñ¬ºªð¶”™ñ¢­ð­‘¤ò½ž¥ò ¥°ò¦½†ð´¢‡ñŽ¡¶ó£œ€ô…‡¹ñ·»ç‹¥ñ¹°ñŽ ‡òŒ¤ƒñ¥¤œð¯²†ò¬¸¦ôŽ´™òº´· 252} true
{󛌁ñ¦»¯ò¤½›ó¥©ñ±‘¶ð§…’ñ¹¿“ó°°°ñ¿…›ñ¾„Ÿð²ô‡²¸ô…¦¤ð¶™žð°‚‡ñ™­Ÿò¢„ñ›¦€ô¨ñµ¢‹ñ–§’󃪱󖟉ñŽ¢–ñœ³´ð±‚–󾾂ñ®Ÿµó²‹¾� ┴òŠ…ñ¾››ñ­—¯òº“¢ó©¼ò¶Žò‰š¼ð·ª’ñ˜Š˜ò•„Žð¨©³ó…†‘ôŠ´®ò—¤‚ò£¢£òŠ«µó§‹ºð¶œó¨®ó·½ð©¥½ñ‹²¢ñ©½Šð¶ Œòƒ†¼ó¸»¢ð¢¼ºñ¦©¾ó‹¢‚ñ¸•©ò©šŸñ¼¸óŒ˜óƒ«®ó€†’𦃅𣍈ñ·¼€óŽˆ£ 236} true
{򑫉󥰦󪀎𶨰󞘸􂣆𲼊򉝸󴲬󇱧򚮀򋃁񥛽򭃵񓆆 􌂱󓅖񢫗󒂍򑕎 32} true
{򍅍񨸷򛓓𱢶򬜸󤎡򃼓򇙔󽱽󽺐񱛐񌞃햮񯚬񈮂񽤡񦄻􋁦󱪪򙭱򻉖󌚃𝒛򹴭򥼷󯳔𿀉󌺤𪨹񶦇 򖛤𬅳𙰵󷋉𼂋𧛃񊌛񗨽񼱷񖗗򟦔񰟗񒏯󩉏󧋂񩐕󄢳󞕗񺸏򣓒󶪃񨧇񎻗􆭸딘񮪑 172} true
{򵖪򹬆󚂘ѥ񅸨󰝚󾹂񎴪𯋱𸎮񃢂񎺟󊻡𹊇𺝞󪇖𹷚󳀬􀗘󬐰򂬇񟁝𾙟𷅳𗶸򝧾򫤆𫗿򼔌򆶥񲩢񩇌𭒛񜭟󮏙􂈪󯼄񮲘 華򃳨󥙫𺀋테𔱈󴓭󩲏􀈺󝴗񡾮򯓁𺃗񛩢𹕪󳽤򜳢𞄲􇭍񟜈󃿉񋛝񉰈񞝇𔫽򣠓𪡼󑟃🨄 63} true

… and there you go; you have a mechanism to create two different kinds of Person.

Boilerplate Heavy

If you read the top section, it becomes abundantly clear how boilerplate-heavy the generation mechanism is for divergent behavior. Keep that in mind, and favor simplicity. The above example is contrived but mentioned solely to show how messy things could become.

Potential Pollution of Public API

Even worse, in my opinion, is this: providing custom value generation can unintentionally pollute a public API. What does this mean? If your type is exported, an extra Generate(*rand.Rand, int) reflect.Value appears in the type’s method set manifest, much like Buffer.Next appears in the go doc manifest in the bytes package. This is problematic for public library code.

You should ask yourself the following questions:

  • Is this fundamentally desirable?
  • Do users really need a generic generation mechanism for this type; and would they always want yours?
  • Recall that go doc documentation is one of the first things that Go developers look at when studying a prospective API.
    • Ask yourself whether having the Generate method included in the list aids or inhibits the user’s comprehension of the type.
    • There should be a high burden to prove that an expansion to the public API benefits the user.

Generating Protocol Buffer Types

Many of you may work with Protocol Buffers (e.g., through first-party goprotobuf). If you do, I would like to share some special findings with you, because the generation of Protocol Buffers is a little tricky. Let’s turn to a generated type from Prometheus, a project of mine, called Counter:

message Counter {
  optional double value = 1;
}

When protoc generates the Go data transfer objects, it includes—at this time of writing—extra fields in the generated struct. Behold:

type Counter struct {
 Value            *float64 `protobuf:"fixed64,1,opt,name=value" json:"value,omitempty"`
 XXX_unrecognized []byte   `json:"-"`
}

It turns out that top-level Protocol Buffer Messages and their child fields that are backed by Messages can contain additional private metadata about the message:

These can trip up the conventional mechanism for comparing Protocol Buffer equality: proto.Equal, which you may want to perform in tests, particularly if you are interested in performing an invariant test to ensure Input → [Transformation] → [Reverse Transformation] == Input, as I needed with my record-delimited streaming Protocol Buffer support library: pbutil.

So why go into this minutiae about Protocol Buffers? If you are using quick.Generate, the generator function has no way of knowing whether to populate the XXX_ fields or what data to put in them.

What I did was create a helper called pbtest.SanitizeGenerated that recursively visits and unsets any of these internal fields, since I was not using the advanced feature set that necessitated these in the first place. It may be helpful for you. You can see it in action in this test here: pbutil.TestFuzz. The call to pbtest.SanitizeGenerated occurs here, so you can see how and why we care about it.

Pillar II: Fuzz Testing

Congratulations if you have made it this far! I hope the value generation section did not put you to sleep! It’s important to know, because you’ll use everything it mentioned above! You should be familiar with table-driven tests, an interesting phenomenon of the Go ecosystem. These techniques described below should not replace table or traditional tests but rather augment them side-by-side in the right circumstances.

The crux of this super section is the invariant. It can be summarized as follows: for a given background context and set of actions, a set of inviolable truths must be upheld as a consequence. For instance, if my teeth are dirty and I brush my teeth, my teeth will be clean afterwards—assuming we ignore stupid details like whether I know how to brush my teeth, my teeth are not chronically damaged, stained, etc.

Background Context Set of Actions Set of Inviolable Truths
My Dirty Teeth Brushing my Teeth Teeth are Clean

It’s that simple. If you look at my example above with the Protocol Buffer, you can classify it as this:

Background Context Set of Actions Set of Inviolable Truths
Empty Binary Message Stream Writing a Message (input); Extracting a Message (output) Input Is Exactly Equal to Output

The astute reader may ask, "isn’t this fundamentally in the same family as standard tests (esp. table)?" Yes. Here’s where the difference lies: the goal of fuzz testing is to ensure that a set of invariants are upheld for a numerous body of inputs. To contrast, standard tests, especially table-driven ones, usually seek to ensure that a set of invariants are upheld for a well-known bundle of background contexts and actions.

This type of testing may be useful if you are interested in …

  • seeing what happens when a monkey interacts with your code;
  • quick and lazy smoke tests that you generally want to ensure to pass for a wide selection of cases;
  • validating of formal proofs; and
  • testing certain classes of pure functions.

Equality Testing

Let’s start with equality testing. What is it? If you took a formal logic course at any time in your life, we can distill it like this:

∀(x0, x1, … xn) f(x0, x1, … xn) == g(x0, x1, … xn)

For each parameter x0, x1, … xn and each possible value of them, functions f and g are equivalent. There is only one invariant involved, which can describe as such, and it is validated on the completion of both f and g’s execution:

Background Context Set of Actions Set of Inviolable Truths
Two Functions with Initialized to Same State with Identical Signatures Execution of Each Function with Identical Input Output of f() and g() Are Equivalent

Where might you want to use this? You may wish to ensure that a change candidate for some legacy code performs exactly equivalently the previous when prototyping or refactoring.

Let’s dissect what is involved. We …

  • build two functions that have identical signatures;
  • choose types in the parameter signature that are generatable subject to the rules above.

Now how do we do this? testing.CheckEqual is the function we are looking for. I am going to cite is docstring below, and I think it’ll become apparent to you why I wrote this whole article now:

CheckEqual looks for an input on which f and g return different results. It calls f and g repeatedly with arbitrary values for each argument. If f and g return different answers, CheckEqual returns a *CheckEqualError describing the input and the outputs.

Here’s what this does, using everything we have discussed up to now with the same terminology: CheckEqual uses value generation to create identical argument lists for functions f() and g() (it does this through runtime reflection by invoking quick.Value on each argument: x0, x1, … xn), executes each function independently, and reports whether the invariant that their return values are equivalent is met. The part that tripped me up when reading the description four years ago was the “arbitrary values” bit. I suppose if I were smarter, I would have known "arbitrary values" to have meant, "we use reflection to actually generate values for you on-demand."

Here is what the simplest example could look like:

package ex

import (
 "testing"
 "testing/quick"
)

func f() bool { return true }
func g() bool { return true }

func TestEquivalence(t *testing.T) {
 // execute f(…) and g(…) M times, and compare the output after each execution.
 if err := quick.CheckEqual(f, g, /* configuration */ nil); err != nil {
  t.Error(err)
 }
}
(Source)

It does nothing, but you now see how it is put together now from a boilerplate perspective, which now lets you focus on the meat of the task at hand. Let’s turn to a contrived but illustrative example: joining strings by some delimiter.

package ex

import (
 "strings"
 "testing"
 "testing/quick"
)

// Don’t do this at home, kids.
func MyJoin(parts []string, delim string) (out string) {
 for i, part := range parts {
  if i != 0 {
   out = out + delim + part
  } else {
   out = part
  }
 }
 return out
}

func TestEquivalence(t *testing.T) {
 // execute f(…) and g(…) M times, and compare the output after each execution.
 f, g := strings.Join, MyJoin
 if err := quick.CheckEqual(f, g /* configuration */, nil); err != nil {
  t.Error(err)
 }
}
(Source)

… and that’s how you do it. The underlying mechanism used to compare the output of f and g is reflect.DeepEqual. This works well for the majority of cases but breaks down in others. I encourage you to read the Cases When Not to Use reflect.DeepEqual section of another article to understand when it may break down.

Another hint worth being aware of is this: if your function has an argument in its signature that is ungenerateable, you can always wrap it in an inner anonymous function that offloads the heavy lifting to quick.Value:

package ex

import (
 "testing"
 "testing/quick"
)

func TestTransform(t *testing.T) {
 f := func(input []string) {
  ch := make(chan string, 1) // ch is "ungenerateable"
  existingTransform(input, ch)
 }
 g := func(input []string) {
  ch := make(chan string, 1) // ch is "ungenerateable"
  newTransform(input, ch)
 }
 if err := quick.CheckEqual(f, g, nil); err != nil {
  t.Fatal(err)
 }

}

// Prototype Stubs
func existingTransform(input []string, dest chan *string) {} // Old API
func proposedTransform(input []string, dest chan *string) {} // New API

quick.Check: Probabilism and Multi-Invariant Testing

The second package function worth being aware of is quick.Check, as it is immensely more powerful. It is useful in the following cases:

  • testing something that lacks a reference implementation to compare something to;

    This is to say that you have a decent idea of what the results (i.e., the inviolable truths) from the operation should be for any input.

  • testing probabilistic systems (like my friend Sean Treadway’s quantile package); and

    Estimations and approximations are great candidates for this type of testing, especially if they have concrete parameters to say, for instance, error rate may be at most ε under any circumstance.

  • performing non-equality or multiple validations for a single operation.

    For as powerful as quick.CheckEqual is, you can see how quickly its limitations appear.

Here is what its skeleton looks like:

package ex

import (
 "testing"
 "testing/quick"
)

func TestSkeleton(t *testing.T) {
 scenario := func( /* inputs */ ) bool {
  // true: all invariants pass
  // false: at least one invariant was violated
  return true
 }
 if err := quick.Check(scenario /* configuration */, nil); err != nil {
  t.Fatal(err)
 }
}
(Source)

Like quick.CheckEqual, the function signature’s parameters are automatically generated using quick.Value under the hood. The core difference to note: return signature of the function must always be unary and return a bool, which indicates whether all invariants pass. Here is a more realistic example, one where we pretend that strings.Repeat does not exist and we naively implement it ourselves:

package ex

import (
 "bytes"
 "fmt"
 "testing"
 "testing/quick"
)

func Repeat(fragment string, n int) string {
 var buf bytes.Buffer
 for i := 0; i < n; i++ {
  fmt.Fprint(&buf, fragment)
 }
 return buf.String()
}

func TestRepeat(t *testing.T) {
 scenario := func(fragment string, n uint8) bool {
  var (
   times = int(n) // use the typesystem to coerce positive integer range.
   out   = Repeat(fragment, times)
  )
  // first invariant
  if got, want := len(out), len(fragment)*times; got != want {
   return false
  }
  // second invariant
  for i := 0; i < len(out); i += len(fragment) {
   if got, want := out[i:i+len(fragment)], fragment; got != want {
    return false
   }
  }
  return true
 }
 if err := (quick.Check(scenario, nil)); err != nil {
  t.Fatal(err)
 }
}
(Source)

Pretty cool!

Best Practice: Ensuring that Errors are Comprehensible

One of the best investments you can make when authoring a test scenario for quick.Check is liberally sprinkling testing.T.Error and testing.T.Fatal where you perform your invariant tests and a case fails. Here is what a vanilla failure looks like without any annotation from you:

=== RUN TestOrdering
--- FAIL: TestOrdering (0.00s)
 ex_test.go:48: #1: failed on input []int{-2352281900722994752, 1438442655375607923, -4110452567888190110, -1221292455668011702, -1941700286034261841, -2836753127140407751, 1432686216250034552, 3663244026151507025, -3068113732684750258, -1949953187327444488, 3713374280993588804, 3226153669854871355, -2093273755080502606, 1006087192578600616, -2272122301622271655, 2533238229511593671, -4450454445568858273, 2647789901083530435, 2761419461769776844, -1324397441074946198, -680758138988210958, 94468846694902125, -2394093124890745254, -2682139311758778198}
FAIL
exit status 1
FAIL _/tmp 0.011s

That does not look super useful! Let’s compare it with what a useful test scenario looks like. I’ve used my friend Ric Szopa’s skiplist implementation as an example for this, since it has opportunities for multiple invariants and it is probabilistic.

package ex

import (
 "sort"
 "testing"
 "testing/quick"

 "github.com/ryszard/goskiplist/skiplist"
)

func TestOrdering(t *testing.T) {
 scenario := func(in []int) bool {
  var (
   found     int
   no        = len(in)
   reference = make([]int, no)
   skiplist  = skiplist.NewIntSet()
  )
  copy(reference, in)
  sort.Ints(reference)

  for _, v := range in {
   skiplist.Add(v)
  }
  for it := skiplist.Iterator(); it.Next(); {
   got := it.Key().(int)
   // first invariant
   if found > no {
    t.Fatal("skiplist contains more items than were given as input")
    return false
   }
   // second invariant
   if want := reference[found]; got != want {
    t.Fatalf("skiplist at %d got %d, want %d", found, got, want)
    return false
   }
   found++
  }
  // third invariant
  if found < no {
   t.Fatalf("skiplist had insufficient elements: got %d, want %d", found, no)
   return false
  }

  return true
 }
 if err := quick.Check(scenario /* configuration */, nil); err != nil {
  t.Fatal(err)
 }
}
(Source)

Should a failure occur, it is abundantly easy to determine where the violated invariant occurs! Maintainers and your future self will thank you!

Configuration

Most users should be able to live with the default quick.Config by setting it nil for quick.Check and quick.CheckEqual.

Test Sizes and Iteration Cycles

The astute reader has probably started to wonder: with all of this value generation stuff, shouldn’t I care about how many values are being generated for unsized types along with how many test iterations are performed? I left the configuration nil, since the default suffices for explanatory cases. It’s up to you, though, but here’s the breakdown of facts as of the time of writing this article:

Type quick.Config is used to configure the behavior of quick.Check and quick.CheckEqual only, which means you are on your own for size hints if you are using quick.Value on its own. If you want to control the number of iterations per test, you have two choices: explicitly set quick.Config.MaxCount per test or set the quickchecks command line flag on your test.

Sadly sizes for maps and slices are unhintable.

Best Practice: Test Repeatability and Randomization

If you are using quick.Check and quick.CheckEqual, you will be pleased to know that testing/quick uses an isolated the random generator for each test case’s execution (after setting it to a known state). You will want to do this same if you decide to use quick.Value directly without these two functions. This touches on the isolation topic mentioned below. The reason why you should care: if you have a singular test failure, you want to be able to run the individual test case that is responsible for the failure and not the entire kitchen sink. If your test relied on a side-effect of an external test (e.g., modifying state of a random number generator), your replayability is destroyed!

Traps and Pitfalls

Making tests useful is an important topic, one to which countless text has been written. Suffice it to say, there are a few topics to be especially mindful of with testing/quick.

Ensuring Isolation between Executions and Test Cases

If the functions under test are not pure, you must ensure that none of their side-effects bleed over into other test cases or subsequent iterations! This is especially true for quick.CheckEqual, because it does not duplicate the generated argument list when calling function f and g, meaning either’s mutation to any value in that argument list (or external state) would result in inconsistencies! Let’s make an example by implementing our own, clever whiz-bang sorting algorithm and have it compete against sort.Sort:

package ex

import (
 "sort"
 "testing"
 "testing/quick"
)

func BubbleSort(data sort.Interface) {
outer:
 for {
  var swapped bool
  for i := 1; i < data.Len(); i++ {
   if data.Less(i, i-1) {
    data.Swap(i-1, i)
    swapped = true
   }
  }
  if !swapped {
   break outer
  }
 }
}

func TestSort(t *testing.T) {
 if err := quick.CheckEqual(
  func(v sort.IntSlice) sort.IntSlice {
   out := make(sort.IntSlice, len(v)) // Defensive copy!
   copy(out, v)
   BubbleSort(out)
   return out
  },
  func(v sort.IntSlice) sort.IntSlice {
   out := make(sort.IntSlice, len(v)) // Defensive copy!
   copy(out, v)
   sort.Sort(out)
   return out
  },
  nil); err != nil {
  t.Fatal(err)
 }
}
(Source)

Like it or not, sort.Sort modifies sort.Interface in-place. Had we not taken the defensive copy, the function output could be incorrect due to corruption or give you false positives.

Cleaning Up after Execution

As a matter of stewardship (esp. for maintainers), your tests should gracefully return the test environment, system resources, external state, you-name-it back to their original condition. Not doing so, aside from hampering with the isolation mentioned above, creates massive overhead when trying to dig through to determine why a test failed (e.g., digging into /tmp directories littered with crufty artifacts).

Consider using a pattern like this

func f() {
  defer func() { /* Clean up f. */ }()

  // Perform work.
}

func g() {
  defer func() { /* Clean up g. */ }()

  // Perform work.
}

for quick.CheckEqual and

func scenario() {
 defer func() { /* Clean up scenario. */ }()

  // Perform work.
}

for quick.Check.

Closing Thoughts

testing/quick is an awesome facility: both for blackbox testing and other value generation purposes. Don’t abuse it, but treat it as an extra tool in your belt. Several reflections from using this facility quite extensively:

  • It is a great way to augment existing tests to find unanticipated interactions or corner cases.
  • Its documentation and examples deserve a refresh to make them friendlier for other users. Considering the package’s age, it is a travesty that I had to write an article on it at all!
  • Its API has a few rough edges that I am not in love with: outright panics and false return values for failure conditions as opposed to reporting usable error strings to the user. I guess this will need to wait until a new major Go release to preserve API compatibility.

In a followup article, I would like to focus on the topics of using the type system to better generate blackbox tests along with authoring monkey tests (writing that almost made me shudder from my days of using Visual Test to write automated tests for hospital software that draws directly to the Windows GDI, which was a real slog to test because of old Microsoft Window’s non-deterministic font rendering). Hopefully that can happen sooner versus later, since this article had sat as a draft in Google Drive for the better part of ten months!

Comments, questions, and feedback welcome!

Also available as a Go Presentation: http://mtp-present.appspot.com/2015/testingquick.slide.

follow us in feedly

2015-05-21

Bootstrapping the OpenBSD Bootloader with GRUB 2.0

This posting is mostly for posterity and to help out others should they be interested in dual-booting
OpenBSD by way of GRUB.

I recently purchased an Intel NUC NUC5i7RYH as a lightweight development machine. I do most of my development and work in OpenBSD, but occasionally I need to use something that unfortunately won't run on OpenBSD, so I have placed Linux as a secondary boot option on machine. My understanding of the OpenBSD bootloader, in spite of how simple it is, is rather limited.

So how does one get GRUB 2.0 to boot OpenBSD? Let's first look at my desk setup.

Command (m for help): p

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x000868f2

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1292916735   646457344   a6  OpenBSD
/dev/sda2   *  1292916736  1953519615   330301440   83  Linux

I have one large disk, which I have done lazy job of partitioning. The vast majority of the disk is consumed by the first partition sda1; this is where OpenBSD resides. Linux as is quite obvious is on sda2. This disk is setup with the traditional master boot record partition table, so let's talk about the glue.


GRUB is installed on the master boot record, which resides at the beginning of the sda block device. The OpenBSD boot loader is installed at the beginning of sda1.


Let's turn to the GRUB configuration now. My Linux distribution is CentOS, so that may result in some variance when applying this technique to another distribution. The file /etc/grub.d/40_custom is the standard location for for installing site-specific modifications to GRUB:


$ cat /etc/grub.d/40_custom 
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry "OpenBSD" {
  set root=(hd0,msdos1)
  kopenbsd /bsd -r sd0a
}

Unlike previous versions of GRUB, using the chainloader no longer works. It turns out GRUB 2.0 offers some ready-made facilities for certain runtime types, like the BSD family. The crux of this is the kopenbsd directive, which tells GRUB to directly load the kernel image for my OpenBSD installation. You can also use this facility, I believe, to bootstrap the OpenBSD installation ramdisk. After this modification is made, the GRUB configuration is rebuilt using some run-parts incantation (as I get older, I like these things less and less):


# grub2-mkconfig -o /boot/grub2/grub.cfg 

Et voila!


follow us in feedly

2015-04-04

Appendix of Cgo and Go Type Mappings

I was working Cgo and wanted an equivalence chart for the C primitive types for reference. Since none exists, this is what I came up with for amd64 Linux.

There are a few caveats to be aware of:

  • Some of the aliases have sequence numbers associated with them to deduplicate their type names within the package namespace in which they were generated. This is to say, carefully understand what is expressed here before using verbatim.
  • If you are reading this page, you are presumed to understand struct alignment and padding.
  • This list was generated mostly through an automated means, with some by-hand reconcilation in places. Please do not hesitate to report any errors or typos (even though I have done a once-over already)!

The source that powers this listing is found here at github.com/matttproud/cgotypes. It is worth running the go tool cgo tool to get the intermediate output contained in the _obj directory:

$ go tool cgo main.go
$ ls _obj

Namely _obj/_cgo_gotypes.go will be of interest!


C TypeGo KindCgo AliasGo Fundamental
charint8_Ctype_charint8
signed charint8_Ctype_scharint8
unsigned charuint8_Ctype_unsignedcharuint8
shortint16_Ctype_shortint16
short intint16_Ctype_shortint16
signed shortint16_Ctype_shortint16
signed short intint16_Ctype_shortint16
unsigned shortuint16_Ctype_unsignedshortuint16
unsigned short intuint16_Ctype_unsignedshortuint16
intint32_Ctype_intint32
signed intint32_Ctype_intint32
unsigneduint32_Ctype_uintuint32
unsigned intuint32_Ctype_uintuint32
longint64_Ctype_longint64
long intint64_Ctype_longint64
signed longint64_Ctype_longint64
signed long intint64_Ctype_longint64
unsigned longuint64_Ctype_ulonguint64
unsigned long intuint64_Ctype_ulonguint64
long longint64_Ctype_longlongint64
long long intint64_Ctype_longlongint64
signed long longint64_Ctype_longlongint64
signed long long intint64_Ctype_longlongint64
floatfloat32_Ctype_floatfloat32
doublefloat64_Ctype_doublefloat64
long doublen/an/aunconvertable
boolbool_Ctype__Boolbool
unionarray[8]uint8the size directly reflects the underlying contents: [8]byte
structstruct_Ctype_struct___3struct {int; int64}
char foo[1]array[1]_Ctype_char[1]int8
signed char foo[1]array[1]_Ctype_schar[1]int8
unsigned char foo[1]array[1]_Ctype_unsignedchar[1]uint8
short foo[1]array[1]_Ctype_short[1]int16
short int foo[1]array[1]_Ctype_short[1]int16
signed short foo[1]array[1]_Ctype_short[1]int16
signed short int foo[1]array[1]_Ctype_short[1]int16
unsigned short foo[1]array[1]_Ctype_unsignedshort[1]uint16
unsigned short int foo[1]array[1]_Ctype_unsignedshort[1]uint16
int foo[1]array[1]_Ctype_int[1]int32
signed int foo[1]array[1]_Ctype_int[1]int32
unsigned foo[1]array[1]_Ctype_uint[1]uint32
unsigned int foo[1]array[1]_Ctype_uint[1]uint32
long foo[1]array[1]_Ctype_long[1]int64
long int foo[1]array[1]_Ctype_long[1]int64
signed long foo[1]array[1]_Ctype_long[1]int64
signed long int foo[1]array[1]_Ctype_long[1]int64
unsigned long foo[1]array[1]_Ctype_ulong[1]uint64
unsigned long int foo[1]array[1]_Ctype_ulong[1]uint64
long long foo[1]array[1]_Ctype_longlong[1]int64
long long int foo[1]array[1]_Ctype_longlong[1]int64
signed long long foo[1]array[1]_Ctype_longlong[1]int64
signed long long int foo[1]array[1]_Ctype_longlong[1]int64
float foo[1]array[1]_Ctype_float[1]float32
double foo[1]array[1]_Ctype_double[1]double
long double foo[1]n/an/aunconvertable
bool foo[1]array[1]_Ctype__Bool[1]bool
union foo[1]array[1][8]uint8the size directly reflects the underlying contents: [1][8]byte
struct foo[1]array[1]_Ctype_struct___5[1]struct{int32; int64}
char *fooptr*_Ctype_char*int8
signed char *fooptr*_Ctype_schar*int8
unsigned char *fooptr*_Ctype_unsignedchar*uint8
short *fooptr*_Ctype_short*int16
short int *fooptr*_Ctype_short*int16
signed short *fooptr*_Ctype_short*int16
signed short int *fooptr*_Ctype_short*int16
unsigned short *fooptr*Ctype_unsignedshort*uint16
unsigned short int *fooptr*Ctype_unsignedshort*uint16
int *fooptr*_Ctype_int*int32
signed int *fooptr*_Ctype_int*int32
unsigned *fooptr*_Ctype_uint*uint32
unsigned int *fooptr*_Ctype_uint*uint32
long *fooptr*_Ctype_long*int64
long int *fooptr*_Ctype_long*int64
signed long *fooptr*_Ctype_long*int64
signed long int *fooptr*_Ctype_long*int64
unsigned long *fooptr*_Ctype_ulong*uint64
unsigned long int *fooptr*_Ctype_ulong*uint64
long long *fooptr*_Ctype_longlong*int64
long long int *fooptr*_Ctype_longlong*int64
signed long long *fooptr*_Ctype_longlong*int64
signed long long int *fooptr*_Ctype_longlong*int64
float *fooptr*_Ctype_float*float32
double *fooptr*_Ctype_double*float64
long double *foon/an/aunconvertable
bool *fooptr*_Ctype__Bool*bool
union *fooptr*[8]uint8the size directly reflects the underlying contents: *[8]byte
struct *fooptr*_Ctype_struct___7*struct{int32; int64}
func *fooptr*[0]uint8*[0]byte
void ptrunsafe.Pointerunsafe.Pointerunsafe.Pointer
char foo[]array_Ctype_ary_char[0]int8
signed_char foo[]array_Ctype_ary_signed_char[0]int8
unsigned_char foo[]array_Ctype_ary_unsigned_char[0]uint8
short foo[]array_Ctype_ary_short[0]int16
short int foo[]array_Ctype_ary_short_int[0]int16
signed short foo[]array_Ctype_ary_signed_short[0]int16
signed short int foo[]array_Ctype_ary_signed_short_int[0]int16
unsigned short foo[]array_Ctype_ary_unsigned_short[0]uint16
unsigned short int foo[]array_Ctype_ary_unsigned_short_int[0]uint16
int foo[]array_Ctype_ary_int[0]int32
signed int foo[]array_Ctype_ary_signed_int[0]int32
unsigned foo[]array_Ctype_ary_unsigned[0]uint32
unsigned int foo[]array_Ctype_ary_unsigned_int[0]uint32
long foo[]array_Ctype_ary_long[0]int64
long int foo[]array_Ctype_ary_long_int[0]int64
signed long foo[]array_Ctype_ary_signed_long[0]int64
signed long int foo[]array_Ctype_ary_signed_long_int[0]int64
unsigned long foo[]array_Ctype_ary_unsigned_long[0]uint64
unsigned long int foo[]array_Ctype_ary_unsigned_long_int[0]uint64
long long foo[]array_Ctype_ary_long_long[0]int64
long long int foo[]array_Ctype_ary_long_long_int[0]int64
signed long long foo[]array_Ctype_ary_signed_long_long[0]int64
signed long long int foo[]array_Ctype_ary_signed_long_long_int[0]int64
float foo[]array_Ctype_ary_float[0]float32
double foo[]array_Ctype_ary_double[0]float64
long double[]n/an/aunconvertable
bool foo[]array_Ctype_ary_bool[0]bool
union foo[]array_Ctype_ary_unionthe size directly reflects the underlying contents: [0][8]byte
struct foo[]array_Ctype_ary_struct[0]struct{int32; int64}

follow us in feedly

2015-02-25

Exploring Go's Runtime - How a Process Bootstraps Itself - Part I

I’ve spent a considerable portion of my professional life dealing with Java's HotSpot virtual machine and decided after acquiring a strong affinity for Go that I should attempt to build as deep of an understanding of the Go runtime as well. Much to my surprise, there is little formal documentation—outside of the source, so the opportunity seems ripe to consolidate this in an accessible digest.


Let me speak briefly about the format of this post:

  • This will be a multi-part series with individual posts dedicated to discrete topics. It is unclear to me how many parts there will be, but I anticipate between two to three.
  • For the purposes of this series, I will stick to the source found in Go 1.4.1. The runtime variant under examination is GNU Linux on amd64/x86-64 virtual machine.
  • Where possible, I reference source code in Go's runtime as well as a custom-built sandbox to demonstrate the concepts. The sandbox lives here but is not strictly necessary to follow along. This first post does not use the sandbox heavily, but other posts may!

Without further ado, let’s begin Part I of this series: How a Process Bootstraps Itself. We’ll start out with the entry point of a Go binary. This task is critical for tracing the flow of execution to discover how a Go binary bootstraps itself. If you want, you can clone the sandbox and create a local copy of it for self-study:


$ git clone https://github.com/matttproud/golang_runtime_exploration
$ cd golang_runtime_exploration
$ make  # output elided

We turn ourselves to a newly-generated file called entrypoint/entrypoint.disassembled I generated from a vanilla Go program entrypoint/entrypoint.go using the objdump tool, which states


start address 0x0000000000421760

in the prologue. This is the entry point into the generated Go binary. What does this mean, and where does it come from? Since we're running on Linux, and Linux uses the ELF executable format these days, Go’s linker is obligated to define an entrypoint offset in its executable file emissions. Specifically, Go’s linker defines this offset in the entry member of the Elf32_Ehdr structure. A call inside the linker to entryvalue() populates this member from the static INITVALUE variable, which is set by interpolating a format string _rt0_%s_%s with GOARCH and GOOS constants. Now that we know where and how this entrypoint comes from, let’s return to what it means. Thusly the entrypoint.disassembled objdump at offset 421760 looks something like this (again: from objdump):


0000000000421760 <_rt0_amd64_linux>:
  421760:       48 8d 74 24 08          lea    0x8(%rsp),%rsi
  421765:       48 8b 3c 24             mov    (%rsp),%rdi
  421769:       b8 70 17 42 00          mov    $0x421770,%eax
  42176e:       ff e0                   jmpq   *%rax

This is the body of machine code that is executed when the execve call starts the binary.


Great, now how do I read this, you might ask? I won't pull your leg by saying that I'm a wonk of Assembler, but I will say that we’re in luck with mature, documented prior art: standardized process bootstrapping mechanisms, like the one that glibc defines (cf., i386 implementation). The Ksplice team at Oracle has even produced a nice writeup of their mechanics if you are so curious. The good news is instead of reading raw disassembled output, we can consult Go’s intermediate Assembler format and the canonical source in upstream since the format string above tells us to look at _rt0_amd64_linux. Bingo! The _rt0_amd64_linux symbol is defined in rt0_linux_amd64.s. Let’s trace out the bootstrapping process, focusing on the unique particulars of Go. Here’s that entrypoint again, but this time in Go’s Assembler format:


TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
 LEAQ 8(SP), SI // argv
 MOVQ 0(SP), DI // argc
 MOVQ $main(SB), AX
 JMP AX

What it is doing here is assigning argv and argc from the stack pointer to the local registers SI and DI respectively before invoking the main procedure. Note: this procedure is not the main function in package main we defined! We’ll get to why in a later post. Rather, this main is defined in the same file as follows:


TEXT main(SB),NOSPLIT,$-8
 MOVQ $runtime·rt0_go(SB), AX
 JMP AX

The definition is pretty straight forward: execute the rt0_go procedure found in package runtime. You may ask, why the boilerplate of having the entrypoint just invoke an simple shell of a pain procedure, which, in turn, just delegates itself to the rt0_go procedure? The answer is that each GOARCH and GOOS has different initial bootstrapping dependencies and requirements (cf., Linux on ARM). At this point, rt0_go is assumed to be safely generic for all GOOS variants since it is defined in asm_amd64.s. Let’s continue the tracing by looking at this procedure closely:


// copy arguments forward on an even stack
 MOVQ DI, AX  // argc
 MOVQ SI, BX  // argv
 SUBQ $(4*8+7), SP  // 2args 2auto
 ANDQ $~15, SP
 MOVQ AX, 16(SP)
 MOVQ BX, 24(SP)

This is responsible for setting up the stack, namely program arguments once again.


Before we look at the next segment, let’s take a quick diversion into terminology. In the world of Go’s scheduler, we have a cast of acronyms G, M, and P that require explanation. To borrow the words of Daniel Morsing (from his nice blog post on the Go scheduler ca. 1.1 release):


  • G corresponds to a Goroutine (struct G).
  • M corresponds to a Machine, which can be effectively substituted with a thread of execution within the operating system (struct M). G are executed on M.
  • P corresponds to a Processor, which can be thought of as a resource in which a M runs a G (struct P).

Now that we have that out of the way, we come to this gem:


// create istack out of the given (operating system) stack.
 // _cgo_init may update stackguard.
 MOVQ $runtime·g0(SB), DI
 LEAQ (-64*1024+104)(SP), BX
 MOVQ BX, g_stackguard0(DI)
 MOVQ BX, g_stackguard1(DI)
 MOVQ BX, (g_stack+stack_lo)(DI)
 MOVQ SP, (g_stack+stack_hi)(DI)

Using what we have learned above, we can infer that g0 is the 0th Goroutine of the system and possibly performs various runtime management functions. It is defined by the struct G and has some interesting fields, a few listed here:


Stack stack; // offset known to runtime/cgo 
uintptr stackguard0; // offset known to liblink
uintptr stackguard1; // offset known to liblink

Given the Go Assembler guide, the assembly listing, and the struct G excerpts, we can interpret the istack assembly to mean the following:


  • Assign the address of g0 to DI, which will be used to scope the subsequent references.
  • Assign the address literal of -0xff98 (-64*1024+104 in hex) from the stack pointer’s origin to BX. If someone has a good idea on where this value originates, clarification would be appreciated.
  • Assign BX to g0.stackguard0 and g0.stackguard1. In the context of Go, stackguard relates to the logic behind the stack resizing methodology and when more memory is allocated. This may become a topic for a future post but is described to some detail in the Contiguous Stacks Design Document.
  • Assign BX to g0.stack.lo and the stack pointer to g0.stack.hi.

… and there we have the geometry of g0’s stack. I presume a methodology similar to this is used in creation of additional Goroutines once the scheduler has started, but I haven’t verified yet.


Continuing, we hit the CPU type and capabilities detection procedure:


// find out information about the processor we're on
 MOVQ $0, AX
 CPUID
 CMPQ AX, $0
 JE nocpuinfo
 MOVQ $1, AX
 CPUID
 MOVL CX, runtime·cpuid_ecx(SB)
 MOVL DX, runtime·cpuid_edx(SB)

If the information is available, the feature set is recorded in cpuid_ecx and cpuid_edx. A quick survey reports that the ECX is used in selecting the fundamental hashing algorithm that the runtime uses internally:


if (cpuid_ecx&(1<<25)) != 0 && // aes (aesenc)
  (cpuid_ecx&(1<<9)) != 0 && // sse3 (pshufb)
  (cpuid_ecx&(1<<19)) != 0 { // sse4.1 (pinsr{d,q})
  useAeshash = true
  algarray[alg_MEM].hash = aeshash
  algarray[alg_MEM8].hash = aeshash
  algarray[alg_MEM16].hash = aeshash
  algarray[alg_MEM32].hash = aeshash32
  algarray[alg_MEM64].hash = aeshash64
  algarray[alg_MEM128].hash = aeshash
  algarray[alg_STRING].hash = aeshashstr

These are used, for instance, when indexing a map type, which is to say that the runtime attempts to select the most optimal hashing methodology given the detected architecture, which could be AES. As for EDX, its value informs whether SSE2 is present and thusly the mechanism used for the runtime’s memclr and memmove procedures.

You can see from the hashing algorithms discussion above that a number of Go’s core runtime features and behaviors are determined by function pointer. The same is the case with Cgo, which immediately follows the CPU capabilities detection.

_cgo_init is a function pointer, defined as such:

// Filled in by dynamic linker when Cgo is available.
void (*_cgo_init)(void);

If Cgo is bundled into the build artifact, the function pointer is set:

extern void x_cgo_init(G*);
void (*_cgo_init)(G*) = x_cgo_init;

x_cgo_init is defined as follows:

void
x_cgo_init(G* g, void (*setg)(void*))
{
 pthread_attr_t attr;
 size_t size;

 setg_gcc = setg;
 pthread_attr_init(&attr);
 pthread_attr_getstacksize(&attr, &size);
 g->stacklo = (uintptr)&attr - size + 4096;
 pthread_attr_destroy(&attr);
}

So that leaves us to an assembly block, which handles the Cgo bootstrapping:

// if there is an _cgo_init, call it.
 MOVQ _cgo_init(SB), AX
 TESTQ AX, AX
 JZ needtls
 // g0 already in DI
 MOVQ DI, CX // Win64 uses CX for first parameter
 MOVQ $setg_gcc<>(SB), SI
 CALL AX
 // update stackguard after _cgo_init
 MOVQ $runtime·g0(SB), CX
 MOVQ (g_stack+stack_lo)(CX), AX
 ADDQ $const_StackGuard, AX
 MOVQ AX, g_stackguard0(CX)
 MOVQ AX, g_stackguard1(CX)

So, if _cgo_init is non-null (TESTQ AX, AX), we prepare our function call, which means g0 as the first parameter and again another function pointer: setg_gcc. Remember: the Go assembler guide will help you read the peculiarities in here! setg_gcc on amd64 is defined as such:

// void setg_gcc(G*); set g called from gcc.
TEXT setg_gcc<>(SB),NOSPLIT,$0
 get_tls(AX)
 MOVQ DI, g(AX)
 RET

Let’s put this rat nest together. If _cgo_init is called, …

All of this is done for the system g0 Goroutine.

Irrespective of Cgo, the runtime now sets up thread local storage (TLS), which is used to scope stateful data to a given thread, or a M per the scheduler terminology. I will admit that this is entering murky territory for me.

LEAQ runtime·tls0(SB), DI
 CALL runtime·settls(SB)

tls0 refers to a byte array and is passed as the first argument to the settls procedure, which, in turn, invokes system call 158 (sys_arch_prctl) to instruct the kernel to set g0’s thread local storage to that array.

Assuming this has been successful, the runtime performs a blackbox test of TLS capability:

// store through it, to make sure it works
 get_tls(BX)
 MOVQ $0x123, g(BX)
 MOVQ runtime·tls0(SB), AX
 CMPQ AX, $0x123
 JEQ 2(PC)
 MOVL AX, 0 // abort

This merely passes the 0x123 integer literal into g0 TLS, fetches the value out of TLS, and then compares for end-to-end correctness. The Go distribution’s compiler bootstrapper (different from the process we are talking about) defines a set of macros related to TLS that are worth knowing:

"#define get_tls(r) MOVQ TLS, r\n"
  "#define g(r) 0(r)(TLS*1)\n"

If you have made it this far, I congratulate you! It’s been an arduous trek.

At this point, the runtime mutually binds g0 to m0 (m0 definition):

// set the per-goroutine and per-mach "registers"
 get_tls(BX)
 LEAQ runtime·g0(SB), CX
 MOVQ CX, g(BX)
 LEAQ runtime·m0(SB), AX
 // save m->g0 = g0
 MOVQ CX, m_g0(AX)
 // save m0 to g0->m
 MOVQ AX, g_m(CX)

One question that I have not answered yet is whether g0 and m0 are eternally bound. My assumption—to be explicitly validated later on—is whether this binding is done just to bootstrap the scheduler which, I presume to run on g0.

A set of mundane invariant checks follow, which are used to test environmental sanity:

Eventually argc and argv are persisted for later use in the process:

static int32 argc;

#pragma dataflag NOPTR /* argv not a heap pointer */
static uint8** argv;

void
runtime·args(int32 c, uint8 **v)
{
 argc = c;
 argv = v;
 if(runtime·sysargs != nil)
  runtime·sysargs(c, v);
}

The sysargs function pointer is responsible for platform-specific initializations given the arguments, and we’ll see just that. On AMD64 Linux, linux_setup_vdso is responsible for this:

void (*runtime·sysargs)(int32, uint8**) = runtime·linux_setup_vdso;

VDSO stands for Virtual Dynamic Shared Object. Go gets access to this by using the auxiliary vectors. My friend and colleague Manu Garg has a nice writeup about them; I encourage you to read it if you are unfamiliar.

The reason is to acquire initial random data samples:

if(elf_auxv[i].a_type == AT_RANDOM) {
          runtime·startup_random_data = (byte*)elf_auxv[i].a_un.a_val;
          runtime·startup_random_data_len = 16;
   continue;
  }

Who uses these samples, and where? If we trace this out, …

get_random_data piggybacks off of this body. It has only one use in the runtime, and that is in the runtime package’s init function if AES support is detected:

var rnd unsafe.Pointer
  var n int32
  get_random_data(&rnd, &n)
  if n > hashRandomBytes {
   n = hashRandomBytes
  }
  memmove(unsafe.Pointer(&aeskeysched[0]), rnd, uintptr(n))
  if n < hashRandomBytes {
   // Not very random, but better than nothing.
   for t := nanotime(); n < hashRandomBytes; n++ {
    aeskeysched[n] = byte(t >> uint(8*(n%8)))
   }
  }

The side-effects are left in aeskeysched, which is used in the AES hashing procedures aeshashbody, aeshash32, and aeshash64—the latter being provided as reference:

TEXT runtime·aeshash64(SB),NOSPLIT,$0-32
 MOVQ p+0(FP), AX // ptr to data
 // s+8(FP) is ignored, it is always sizeof(int64)
 MOVQ h+16(FP), X0 // seed
 PINSRQ $1, (AX), X0 // data
 AESENC runtime·aeskeysched+0(SB), X0
 AESENC runtime·aeskeysched+16(SB), X0
 AESENC runtime·aeskeysched+0(SB), X0
 MOVQ X0, ret+24(FP)
 RET

The AES instruction set reference may be helpful. But back to the original question of who uses this, let’s discuss that:

The answer lies in the index expressions specification for map types found in the Go Language Specification. Why? Because maps use hashes for bucketing KeyType-s. I had originally planned on writing a separate post on this topic—might still—but the cat is out of the bag now! One takeaway, as the blog post Go maps in action indicates, is that the user should not expect stable map iteration order! Even if the iteration orders appeared to be the same across runs on different processes on the same host, there is no guarantee that the order would ever be consistent on another host, especially if it possessed a different GOARCH or GOOS!

The next discussion is the last one, for the article has gotten long—and there is a dearth of more topics to cover!

The runtime then calls the operating system initialization routine. On Linux, it is trivial:

void
runtime·osinit(void)
{
 runtime·ncpu = getproccount();
}

This ncpu value is used in a few places:

I will now close out this part of the article here and save the remaining topics for a followup post or two. The big things to follow are the scheduler, memory manager, and the developer’s entrypoint of the Go programmer’s binary.

Don’t worry: I will link the posts together! Even if it takes a little bit of time for the following edition to come out (most of the research is done; just needs prosaic conversion), I expect the general ethos findings to remain true in spite of the partial rewrite of the Go runtime from C and .goc to Go. Stay tuned!

For extracurricular reading, my friend Jeremie Le Hen suggested a quick discussion on the following (included in the sandbox):

I think it would be worth showing at this point some differences between Go-generated binaries and standard binaries, which are kind of bloated nowadays:
$ ldd entrypoint
 not a dynamic executable
$ cat entrypoint.c 
int main(int argc, char *argv[])
{
  // do nothing
  return 0;
}
$ gcc -o entrypoint.c.bin -static entrypoint.c 
$ ls -l entrypoint.c.bin entrypoint
-rwxr-x--- 1 jlehen jlehen 581704 Feb 25 09:44 entrypoint
-rwxr-x--- 1 jlehen jlehen 876940 Feb 25 09:59 entrypoint.c.bin
Also the C binary has a plethora of ELF sections, whereas the Go executable has only 7 if you exclude debug sections, and 3 of them are Go-specific. This means the Go compiler is really independent from the classic C runtime, and doesn’t even use the C runtime objects (crtX.o stuff). The libc doesn’t seem to be used at all either, which is not surprising as there are not dependencies between it and the C runtime objects. This can be verified with objdump -t on the binaries:
  • on entrypoint.c.bin there is a whole lot of misc mostly-internal symbols, which originate in the standard lib;
  • on entrypoint, you see a nicely named collection of symbols, mainly divided in two big sections gc and runtime.

Now that we really are finished, I would like to say a big thank-you to Manu Garg and Jeremie Le Hen for their editorial review and contributions! Big round of applause!

follow us in feedly
 

None of the content contained herein represents the views of my employer nor should it be construed in such a manner. . All content is copyright Matt T. Proud and may not be reproduced in whole or part without expressed permission.