2014-08-31

Go Testing Patterns: Common Rendezvous Point for Concurrency

As much as I try to design interfaces that avoid exposing concurrency directly to the user, I periodically find myself testing concurrent code (usually non-exported implementation). I'd like to outline a couple of practices that have worked for me:


Common Rendezvous Point

What the common rendezvous does is ensure that all participants have reached a common place in their execution and wait there until being told to resume. Suppose we have a system under test (SUT) that involves Routine A, Routine B, and Routine C, and the SUT expects all of them to reach a common point P before continuing to the core activity that produces the side effects we want to check.



How do we best achieve this in Go? The pkg/sync provides a great place to start, specifically the WaitGroup type, which provides a barrier similar to the Java Standard Library's CountDownLatch. Let's get started by modeling the SUT and the preparation steps for the routines!


package rendezvous

import (
        "fmt"
        "sync"
        "testing"
)

func TestSystem(t *testing.T) {
        var prep sync.WaitGroup // The preparation rendezvous point.
        prep.Add(3)         // We have three participants.

        go func() {
                fmt.Println("Routine A starting ...")
                fmt.Println("Routine A preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine A finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine A received signal to proceed.")
                // TODO: Flesh out the rest.
        }()
        go func() {
                fmt.Println("Routine B starting ...")
                fmt.Println("Routine B preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine B finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine B received signal to proceed.")
                // TODO: Flesh out the rest.
        }()
        go func() {
                fmt.Println("Routine C starting ...")
                fmt.Println("Routine C preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine C finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine C received signal to proceed.")
                // TODO: Flesh out the rest.
        }()

        select {}  // Wait infinitely.  This aborts the runtime.  See later steps.
}
(On Go Playground)

It's time to reflect on what's going on under the hood. Let's start with some obvious questions.

Why did we use a WaitGroup as opposed to a chan? We could have created three separate channels, each shared by all routines but closed by only one, similar to this fragment:

// Outer scope
routA := make(chan struct{})
routB := make(chan struct{})
routC := make(chan struct{})

// In Routine A
close(routA)
<-routB
<-routC

// In Routine B
close(routB)
<-routA
<-routC

// In Routine C
close(routC)
<-routA
<-routB

Sure, this fragment works, but look at how needlessly verbose and fragile it is. One misimplemented handler, and boom: the whole design fails—sometimes silently!

Why do we declare the prep WaitGroup as var prep sync.WaitGroup as opposed to simply prep := sync.WaitGroup{}? We are using the zero value for the underlying struct, which is readily usable for us with this type (this topic is worthy of a blog post of its own). Using the VarDecl declaration style, we can chain subsequent VarSpec in the same definition for the same type. Suppose we add a second WaitGroup. Which of the following is cleaner and easier to read: first, second := sync.WaitGroup{}, sync.WaitGroup{} or var first, second sync.WaitGroup? We'll return to this later.
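As a small illustration of the zero-value point, here is a minimal sketch. The function name zeroValueReady is hypothetical and exists only for this example; the only assumption is that each WaitGroup tracks a single goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// zeroValueReady is a hypothetical helper for illustration. It declares two
// WaitGroups in a single VarDecl, uses both without any explicit
// initialization, and returns how many goroutines finished.
func zeroValueReady() int {
	var first, second sync.WaitGroup // Zero values, ready to use.
	var mu sync.Mutex                // Guards done.
	done := 0

	for _, wg := range []*sync.WaitGroup{&first, &second} {
		wg.Add(1)
		go func(wg *sync.WaitGroup) {
			defer wg.Done()
			mu.Lock()
			done++
			mu.Unlock()
		}(wg)
	}

	first.Wait()
	second.Wait()
	return done
}

func main() {
	fmt.Println(zeroValueReady()) // Prints 2.
}
```

Neither WaitGroup needs a constructor or a composite literal, which is what makes the var form read so cleanly.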

Let's continue.

package rendezvous

import (
        "fmt"
        "sync"
        "testing"
)

func TestSystem(t *testing.T) {
        var prep, fin sync.WaitGroup // The preparation and completion rendezvous points.
        prep.Add(3)                  // We have three participants.
        fin.Add(3)                   // We have three participants.

        go func() {
                fmt.Println("Routine A starting ...")
                fmt.Println("Routine A preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine A finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine A received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine A exited.")
        }()
        go func() {
                fmt.Println("Routine B starting ...")
                fmt.Println("Routine B preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine B finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine B received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine B exited.")
        }()
        go func() {
                fmt.Println("Routine C starting ...")
                fmt.Println("Routine C preparing ...")
                // Do expensive preparation.
                fmt.Println("Routine C finished preparation.")
                prep.Done() // Signal completion.
                prep.Wait() // Block until all participants have arrived.
                fmt.Println("Routine C received signal to proceed.")
                // Perform real work here.
                fin.Done()
                fmt.Println("Routine C exited.")
        }()

        fin.Wait()
        // Validate side effects.
        fmt.Println("All routines exited.")
}
(On Go Playground)

Recall the point I made above about adding a second WaitGroup? Here it is in fin, which each goroutine uses to signal its completion of its respective work. We also replaced select {} with fin.Wait().

So what does this effectively buy us? Well, if we need to be sure for purposes of testing that all participants have reached one or more common rendezvous points before either running the SUT or validating side effects, we have a graceful protocol. This ensures both correctness and determinism!
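The protocol with two rendezvous points could be folded into a reusable helper. What follows is only a sketch under assumptions: the names rendezvous and allPreparedBeforeWork are hypothetical, and a real SUT would supply its own preparation and work steps. Note the call to prep.Wait right after prep.Done, which is what actually holds each participant at the common point.

```go
package main

import (
	"fmt"
	"sync"
)

// rendezvous is a hypothetical helper. It starts n participants; each one
// runs prepare, waits at the common point until all n have prepared, and
// then runs work. It returns once every participant has finished work.
func rendezvous(n int, prepare, work func(id int)) {
	var prep, fin sync.WaitGroup
	prep.Add(n)
	fin.Add(n)

	for id := 0; id < n; id++ {
		go func(id int) {
			defer fin.Done() // Signal that this participant has exited.
			prepare(id)
			prep.Done() // Signal that preparation is complete.
			prep.Wait() // Block until all participants have arrived.
			work(id)
		}(id)
	}

	fin.Wait()
}

// allPreparedBeforeWork reports whether every work step observed that all n
// preparation steps had already run. It exists only for this illustration.
func allPreparedBeforeWork(n int) bool {
	var mu sync.Mutex
	prepared := 0
	ok := true

	rendezvous(n,
		func(id int) {
			mu.Lock()
			prepared++
			mu.Unlock()
		},
		func(id int) {
			mu.Lock()
			if prepared != n {
				ok = false
			}
			mu.Unlock()
		})
	return ok
}

func main() {
	fmt.Println(allPreparedBeforeWork(3)) // Prints true.
}
```

The work step can only start after prep.Wait returns, so it always sees the full count of prepared participants.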

Is the example missing anything? That depends on the SUT and its needs. One candidate could be cancellation on failure or timeout. Implementing cancellation correctly depends strongly on the underlying SUT, as I implied. There is no one-size-fits-all cancellation policy for every API in Go, so I leave that as an exercise to the reader. If you're interested, this article discusses how the Java API handles this problem at the thread level. Very generally, we could create a timeout policy as such:

import "time"

// rest elided

finished := make(chan struct{})

// The goroutine closes the channel, so it needs a send-only view of it.
go func(sig chan<- struct{}) {
  fin.Wait()
  close(sig)
}(finished)

select {
case <-finished:
  // All routines finished in time.
case <-time.After(5 * time.Second):
  t.Fatal("SUT timed out.")
}

Do note that this does not include proper cancellation of your routines! Creating a timeout policy as such is not strictly needed, as go test applies a configurable timeout (the -timeout flag) to the entire test run.
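The timeout policy above can also be packaged as a small function. This is a sketch, not a definitive implementation: waitTimeout and demo are hypothetical names, and the 500 millisecond limit is an arbitrary value picked for the illustration. As with the fragment above, nothing is cancelled; on timeout the watcher goroutine stays blocked in Wait until the group finishes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitTimeout is a hypothetical helper. It reports whether wg finished
// within d. It does not cancel anything: on timeout, the watcher goroutine
// keeps waiting on wg in the background.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	finished := make(chan struct{})

	go func() {
		wg.Wait()
		close(finished)
	}()

	select {
	case <-finished:
		return true
	case <-time.After(d):
		return false
	}
}

// demo builds a WaitGroup with one participant, optionally lets that
// participant finish, and reports what waitTimeout returns. It exists only
// for this illustration.
func demo(finish bool) bool {
	var fin sync.WaitGroup
	fin.Add(1)
	if finish {
		go fin.Done()
	}
	return waitTimeout(&fin, 500*time.Millisecond)
}

func main() {
	fmt.Println(demo(true))  // Prints true: the participant finished.
	fmt.Println(demo(false)) // Prints false: the participant never finished.
}
```

In a test, a false result would typically lead to t.Fatal, as in the fragment above.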

Also note that this posting does not seek to enable overly complicated API design nor Rube Goldberg Machine-style tests. Only with experience and wisdom will you be able to discern when this technique is necessary.

In the next segment, we'll talk about concurrency level in tests.


2014-08-24

Plan 9's Acid Debugger and Go

Background

I've been putting together some research on the Go Language's runtime to help folks better understand its memory manager and what impact it has on the servers they develop.  In the course of this, I found that Gdb was of limited use for working with the low-level runtime (I'll post a follow-up about Gdb in a few days).


It dawned on me that perhaps Plan 9's debugger called Acid may be the ticket home.  This supposition arose due to direct integration of Acid support in the golang cc compiler as well as perennial mentions of Acid on golang-nuts.  Here are the results from my trial, in case anyone is interested in running with this.

The Trial

$ pacman -S plan9port

$ make entrypoint

$ 9 acid entrypoint
entrypoint: linux amd64 executable
/usr/lib/plan9/acid/port
/usr/lib/plan9/acid/amd64

That'll get Acid loaded.  Let's see what's in the symbol table that contains the substring "Init".

acid; symbols("Init")
runtime.MCentral_Init T 0x00404ce0 entrypoint runtime.MCentral_Init
runtime.FixAlloc_Init T 0x00405cc0 entrypoint runtime.FixAlloc_Init
runtime.MHeap_Init T 0x0040af50 entrypoint runtime.MHeap_Init
runtime.MSpan_Init T 0x0040c520 entrypoint runtime.MSpan_Init
runtime.MSpanList_Init T 0x0040c5a0 entrypoint runtime.MSpanList_Init
runtime.InitSizes T 0x0040cfe0 entrypoint runtime.InitSizes

Cool.  At this point, I started to learn what is awesome about Acid: how extensible it is.  To wit, we can program functions in Acid's own language as well as ask Acid to describe available functions:

acid; whatis source
defn source() {
 local l;
 l = srcpath;
 while l do
 {
  print(head l,"\n");
  l = tail l;
 }
 l = srcfiles;
 while l do
 {
  print("\t",head l,"\n");
  l = tail l;
 }
}

Pretty neat?  Let's now try to do something useful.  I've been interested in watching the TCMalloc-based size classes get built.  This is done in runtime.InitSizes at address 0x0040cfe0 per the listing above.

Let's first try to disassemble that function.

acid; asm(runtime.InitSizes)
<stdin>:7: (error) runtime used but not set

Hmm.  No dice.  Maybe if I quote it?

acid; asm("runtime.InitSizes")
<stdin>:9: (error) fnbound(addr): arg type

Still a nope.  Am I doing it wrong?  Let's try another symbol.  How about main?  This isn't the "func main() {}" that you write in your own Go servers; that one is called main.main.

acid; symbols("main")
main.main T 0x00400c00 entrypoint main.main
main.init T 0x00400c10 entrypoint main.init
runtime.main T 0x00410150 entrypoint runtime.main
main T 0x00422510 entrypoint main
runtime.main.f D 0x00436c48 entrypoint runtime.main.f
main.initdone. D 0x00456f60 entrypoint main.initdone.

acid; asm(main)
main 0x00422510 MOVL $_rt0_go(SB),AX
main+0x5 0x00422515 JMP* AX
main+0x7 0x00422517 ADDB AL,0(AX)
main+0x9 0x00422519 ADDB AL,0(AX)
main+0xb 0x0042251b ADDB AL,0(AX)
main+0xd 0x0042251d ADDB AL,0(AX)
main+0xf 0x0042251f ADDB CL,b808247c(BX)

Cool!  There's our output, and no quoting required.  What was happening above?  Well, many of the core runtime functions in Go are defined with Unicode · in their names, which the compilation process replaces with a period in the emitted C and Assembly code.  In effect, runtime·InitSizes maps to "runtime.InitSizes" in the output we see above.  My suspicion is that Acid doesn't support function names containing "." and presumes the period is used for struct field member referencing.  We can confirm this by substituting the address "0x0040cfe0" for the name "runtime.InitSizes":

acid; asm(0x0040cfe0)
runtime.InitSizes 0x0040cfe0 DECL AX
runtime.InitSizes+0x2 0x0040cfe2 MOVL fffffff0,CX
runtime.InitSizes+0x9 0x0040cfe9 DECL AX
runtime.InitSizes+0xa 0x0040cfea CMPL 0(CX),SP
runtime.InitSizes+0xc 0x0040cfec JHI runtime.InitSizes+0x15(SB)
// rest elided

Sure enough, it is a problem of naming.  Under the covers, Acid maps these names symbolically to these addresses.  Let's turn to the task of debugging a running process and setting a breakpoint for runtime.InitSizes to get that pesky size classes table:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; bpset(0x0040cfe0)
Waiting...

Hmm, we're waiting.  For what—Christmas?

According to the process tree, both Acid and entrypoint are running:

root     23650  0.0  0.0  13192  3288 pts/0    S+   20:21   0:00 acid entrypoint
root     23653  0.0  0.0    608     4 pts/0    t    20:21   0:00 entrypoint

Rather specifically, entrypoint is in a state of inspection and just waiting.  I think I've had enough of this for now.  I'll give this another whirl, this time without a breakpoint:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; cont()
open /proc/23679/stat: No such file or directory
<stdin>:4: (error) procnotes pid=23679: No such file or directory

Everything appears to run (I rebuilt the original binary with some logging), but these two messages about an unmapped address and a missing procfs entry are disconcerting.  Maybe stepping through it will work instead of breakpoints?  Let's try it out:

acid; new()
<stdin>:2: (error) indir: address 0xffffffffffffffff not mapped
acid; step
0x0041f3a0
acid; func()
cannot locate text symbol
acid; gpr()
AX 0000000000000000 BX 0000000000000000 CX <stdin>:9: (error) reading CX: ptrace: Input/output error

The call to "func()" should have stepped "next" until the current call exited, and "gpr()" ought to have emitted register status.  Cue up the sad-trombone.

Reflections

On the surface, Acid looks pretty amazing: it feels like it fits into a similar vein as the Acme text editor in terms of extensibility and simplicity.

So, why didn't this all work with the breakpoint at the end?  It dawns on me that I was being a big dummy: I should have known better about running Plan 9 from Userspace in Linux.  Acid is probably expecting the process control semantics of Plan 9 and Plan 9 from Userspace cannot get what it needs from Linux.  Turns out that this is correct:

$ 9 man intro
          # previous elided
          The debuggers acid(1) and db(1) and the debugging library
          mach(3) are works in progress.  They are platform-
          independent, so that x86 Linux core dumps can be inspected
          on PowerPC Mac OS X machines, but they are also fairly
          incomplete.  The x86 target is the most mature; initial Pow-
          erPC support exists; and other targets are unimplemented.
          The debuggers can only inspect, not manipulate, target pro-
          cesses.  Support for operating system threads and for 64-bit
          architectures needs to be rethought.  On x86 Linux systems,
          acid and db can be relied upon to produce reasonable stack
          traces (often in cases when GNU gdb cannot) and dump data
          structures, but that it is the extent to which they have
          been developed and exercised.
          # rest elided

When I have some more time down the road, I will spin up a virtual machine with Plan 9 on it and re-try this same experiment on it.  This will be a true test of character for the question of Acid versus Gdb and which debugger fares better with Go.  In any case, the planned overhaul of the compiler and linker with a rewrite from C to pure Go may invalidate parts of this document, or all of it.

Needless to say, this has whetted my interest in Plan 9.

Stay tuned for the next parts of this series, where we examine the runtime with Gdb, tour the bootstrapping process, the memory allocator and manager, the garbage collector, etc!




 

None of the content contained herein represents the views of my employer nor should it be construed in such a manner. All content is copyright Matt T. Proud and may not be reproduced in whole or part without express permission.