Test parallelization in Go: Understanding the t.Parallel() method

* This article is a translation of the Japanese article written on August 24, 2020.

This article is for day 6 of Merpay Tech Openness Month 2020.
Hello, everyone. I’m Yoshiki Shibata (@yoshiki_shibata), a backend engineer at Merpay. In this article, I discuss the parallelization function provided in the testing package of the Go programming language (Golang).

Golang provides a package called testing, which is used to create test code. As you develop a piece of software and its scale grows, the amount of test code written also increases. This can increase the time it takes for all testing to complete. This is especially the case when testing access to a database, where communicating with the database accounts for much of the testing time. In this case, running test code in parallel rather than sequentially can reduce testing time. (The correct term is “concurrent” rather than “parallel,” but because I’m covering the t.Parallel() method here, I’ll be using “parallel” throughout the article.)
I’ll be explaining the Parallel() method in *testing.T here.

Executing tests from multiple packages in parallel

By default, execution of test code using the testing package will be done sequentially. However, note that it is only the tests within a given package that run sequentially.

If tests from multiple packages are specified, the tests will be run in parallel at the package level. For example, imagine there are two packages, package a and package b. The test code in package a will be run sequentially, and the test code in package b will also be run sequentially. However, the tests for package a and package b will be run in parallel. Let’s look more closely into how these will be run in parallel.

If multiple packages are specified (or if all packages are specified with ./...), the number of packages whose tests run in parallel is controlled by the -p flag of the go test command (strictly speaking, -p is a build flag). The description of the -p flag provided by go help build is shown below.

   -p n
        the number of programs, such as build commands or
        test binaries, that can be run in parallel.
        The default is the number of CPUs available.

In other words, for tests, the value given to the -p flag is the maximum number of test binaries that can run as parallel processes. If the -p flag is omitted, the maximum is the number of CPUs. Packages to be tested are assigned to processes automatically, and each process runs the tests for a single package sequentially. What happens if we specify -p=1? Only one test process runs at a time, so all tests run sequentially, one package at a time.

Note: If you specify a value greater than 1 using the -p flag, specify multiple packages (or specify ./...), run the tests, and then execute the ps command from another terminal while the tests are running, you’ll see that test binaries are being created for each package while tests are being run.

Specifying a large value for the -p flag allows up to that many test processes to run at once, which improves parallelism. However, keep in mind that this only means that tests from multiple packages will be run in parallel. It does not mean that tests within individual packages will be run in parallel. In order to improve parallelism for tests within a package, we need to use the t.Parallel() method.

t.Parallel() method

*testing.T contains a method called Parallel(). Using the t.Parallel() method can be tricky, and it’s important to have a good understanding of how to use it properly.

The description of the Parallel() method is as follows.

func (t *T) Parallel()
    Parallel signals that this test is to be run in parallel with (and only
    with) other parallel tests. When a test is run multiple times due to use of
    -test.count or -test.cpu, multiple instances of a single test never run in
    parallel with each other.

Let’s look at a simple example.

Imagine we have some test code using the testing package. Within this test code is a top-level test function with the func TestXXX(t *testing.T) signature. Within this top-level test function is a subtest function written using t.Run(). Let’s start by seeing what happens when the t.Parallel() method is called only for a top-level function.

Take a look at the following code.

package main

import (
    "fmt"
    "testing"
)

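// trace prints a message on entry and returns a function that prints
// a message on return; use it as: defer trace("name")()
func trace(name string) func() {
    fmt.Printf("%s entered\n", name)
    return func() {
        fmt.Printf("%s returned\n", name)
    }
}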

func Test_Func1(t *testing.T) {
    defer trace("Test_Func1")()

    // ...
}

func Test_Func2(t *testing.T) {
    defer trace("Test_Func2")()
    t.Parallel()

    // ...
}

func Test_Func3(t *testing.T) {
    defer trace("Test_Func3")()

    // ...
}

func Test_Func4(t *testing.T) {
    defer trace("Test_Func4")()
    t.Parallel()

    // ...
}

func Test_Func5(t *testing.T) {
    defer trace("Test_Func5")()

    // ...
}

There are five test functions. Test_Func1, Test_Func3, and Test_Func5 are normal test functions. Test_Func2 and Test_Func4 call the t.Parallel() method. If we run this using the go test command, the following occurs.

  1. Test_Func1 is executed and finishes processing.
  2. Next, the program moves on to running Test_Func2. However, it pauses once the t.Parallel() method is called.
  3. With Test_Func2 execution paused, Test_Func3 is run and finishes processing.
  4. Next, the program moves on to running Test_Func4. However, it pauses once the t.Parallel() method is called.
  5. With Test_Func4 execution paused, Test_Func5 is run and finishes processing.

Once the functions that do not call the t.Parallel() method (Test_Func1, Test_Func3, and Test_Func5) are all run in order, processing of the functions that do call the t.Parallel() method (Test_Func2 and Test_Func4) is resumed in parallel, and then finishes.

The results are shown below.

=== RUN   Test_Func1
Test_Func1 entered
Test_Func1 returned                <- 1 (completed)
--- PASS: Test_Func1 (0.00s)
=== RUN   Test_Func2
Test_Func2 entered
=== PAUSE Test_Func2               <- 2 (paused)
=== RUN   Test_Func3
Test_Func3 entered
Test_Func3 returned                <- 3 (completed)
--- PASS: Test_Func3 (0.00s)
=== RUN   Test_Func4
Test_Func4 entered
=== PAUSE Test_Func4               <- 4 (paused)
=== RUN   Test_Func5
Test_Func5 entered
Test_Func5 returned                <- 5 (completed)
--- PASS: Test_Func5 (0.00s)
=== CONT  Test_Func2               <- processing resumes
Test_Func2 returned                <- completed
=== CONT  Test_Func4               <- processing resumes
Test_Func4 returned                <- completed
--- PASS: Test_Func2 (0.00s)
--- PASS: Test_Func4 (0.00s)
PASS

Pay special attention to how, in the results above, calling the t.Parallel() method makes the function pause and then resume. When a pause occurs, it is indicated with === PAUSE. When processing resumes, it is indicated with === CONT.

The condition for resuming processing for a test paused after calling the t.Parallel() method is described below as Operation 1.

Operation 1: Once all the top-level test functions (within a package) that do not call the t.Parallel() method have completed, processing of top-level test functions calling the t.Parallel() method is resumed and runs in parallel.
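
As a rough illustration, Operation 1 can be written down as a two-phase schedule. The sketch below is a toy model in plain Go, not the testing package's implementation: the testCase type and the schedule function are hypothetical names introduced here, and the model lists resumed tests in declaration order, whereas real parallel tests resume and finish in no guaranteed order.

```go
package main

import (
	"fmt"
	"strings"
)

// testCase is a hypothetical stand-in for a top-level test function.
type testCase struct {
	name     string
	parallel bool // true if the function calls t.Parallel()
}

// schedule returns the events that a package's top-level tests would
// produce under Operation 1. It is a simplified model: paused tests are
// resumed and reported in declaration order.
func schedule(tests []testCase) []string {
	var events []string
	var paused []string

	// Phase 1: walk the tests in order. Sequential tests run to completion;
	// parallel tests pause as soon as they call t.Parallel().
	for _, tc := range tests {
		events = append(events, "RUN "+tc.name)
		if tc.parallel {
			events = append(events, "PAUSE "+tc.name)
			paused = append(paused, tc.name)
			continue
		}
		events = append(events, "PASS "+tc.name)
	}

	// Phase 2: once every sequential test has finished, the paused tests
	// resume and then complete.
	for _, name := range paused {
		events = append(events, "CONT "+name)
	}
	for _, name := range paused {
		events = append(events, "PASS "+name)
	}
	return events
}

func main() {
	tests := []testCase{
		{"Test_Func1", false},
		{"Test_Func2", true},
		{"Test_Func3", false},
		{"Test_Func4", true},
		{"Test_Func5", false},
	}
	fmt.Println(strings.Join(schedule(tests), "\n"))
}
```

With the five functions from the earlier example, the model yields the same sequence of RUN, PAUSE, CONT, and PASS events as the go test output shown above.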

Operation 1 means that, if a top-level test function does not call the t.Parallel() method, the program will not move on to the next top-level test function until all of its subtest functions have completed, even if a subtest function run with t.Run() calls the t.Parallel() method.

For example, let’s rewrite Test_Func1 as follows.

func Test_Func1(t *testing.T) {
    defer trace("Test_Func1")()

    t.Run("Func1_Sub1", func(t *testing.T) {
        defer trace("Func1_Sub1")()
        t.Parallel()

        // ...
    })

    t.Run("Func1_Sub2", func(t *testing.T) {
        defer trace("Func1_Sub2")()
        t.Parallel()

        // ...
    })

    // ...
}

We’ve added two subtest functions that both call the t.Parallel() method.

The results of running this are shown below.

=== RUN   Test_Func1
Test_Func1 entered
=== RUN   Test_Func1/Func1_Sub1
Func1_Sub1 entered                          <- Func1_Sub1 starts
=== PAUSE Test_Func1/Func1_Sub1             <- Func1_Sub1 pauses
=== RUN   Test_Func1/Func1_Sub2
Func1_Sub2 entered                          <- Func1_Sub2 starts
=== PAUSE Test_Func1/Func1_Sub2             <- Func1_Sub2 pauses
Test_Func1 returned                         <- Test_Func1 call returns(*)
=== CONT  Test_Func1/Func1_Sub1             <- Func1_Sub1 resumes
Func1_Sub1 returned                         <- Func1_Sub1 completes
=== CONT  Test_Func1/Func1_Sub2             <- Func1_Sub2 resumes
Func1_Sub2 returned                         <- Func1_Sub2 completes
--- PASS: Test_Func1 (0.00s)                <- Test_Func1 results displayed
    --- PASS: Test_Func1/Func1_Sub1 (0.00s)
    --- PASS: Test_Func1/Func1_Sub2 (0.00s)
=== RUN   Test_Func2                        <- Test_Func2 is not run until this point
Test_Func2 entered
=== PAUSE Test_Func2
=== RUN   Test_Func3
Test_Func3 entered
Test_Func3 returned
--- PASS: Test_Func3 (0.00s)
=== RUN   Test_Func4
Test_Func4 entered
=== PAUSE Test_Func4
=== RUN   Test_Func5
Test_Func5 entered
Test_Func5 returned
--- PASS: Test_Func5 (0.00s)
=== CONT  Test_Func2
Test_Func2 returned
=== CONT  Test_Func4
Test_Func4 returned
--- PASS: Test_Func4 (0.00s)
--- PASS: Test_Func2 (0.00s)
PASS

As shown in the results above, Test_Func1 does not call the t.Parallel() method, so the program does not start the subsequent Test_Func2 until all of the subtests within Test_Func1 have completed. In other words, if no top-level test function calls the t.Parallel() method, the top-level test functions in the package run sequentially, one at a time. Of course, if subtest functions run with t.Run() inside a top-level test function call the t.Parallel() method, those subtest functions run in parallel with each other.

There’s something else to note in the results.

Operation 2: If a subtest function using t.Run() calls the t.Parallel() method, the subtest function will pause once the t.Parallel() method is called, and remain paused until its parent top-level test function completes and returns. (This behavior is the same whether the parent top-level test function called the t.Parallel() method or not.)

In other words, we can express Operation 2 as follows.

Operation 2 (expressed differently): If a subtest function using t.Run() calls the t.Parallel() method and is paused by the t.Parallel() method, the subtest function will resume after the parent top-level test function completes and returns.

We can combine Operation 1 and Operation 2 to state that, “in order to improve parallelism as much as possible, the t.Parallel() method must be called by both the top-level test function and its subtest functions.” Doing so would mean that all subtest functions inside the package calling the t.Parallel() method would operate in parallel at once.

Parallel level

I just said that all test functions would operate in parallel at once, but in reality, the number of test functions that can operate simultaneously is limited. The number of test functions that will operate in parallel is specified using the -parallel flag.

    -parallel n
        Allow parallel execution of test functions that call t.Parallel.
        The value of this flag is the maximum number of tests to run
        simultaneously; by default, it is set to the value of GOMAXPROCS.
        Note that -parallel only applies within a single test binary.
        The 'go test' command may run tests for different packages
        in parallel as well, according to the setting of the -p flag
        (see 'go help build').

If the -parallel flag is not explicitly specified, the value of GOMAXPROCS is used. If GOMAXPROCS is not explicitly set, it is equal to the number of (apparent) CPUs.

If many of the tests, including subtests, will be accessing a database, the parallel level should be explicitly set to a number larger than the number of CPUs. Otherwise, the tests will often be waiting for communication. In contrast, specifying a large value will not improve performance if tests will be performing calculations requiring heavy CPU processing.
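
To make the effect of the parallel level more concrete, here is a small sketch that caps concurrency with a counting semaphore, which is conceptually what the -parallel limit does. It is a model built from plain goroutines, not the testing package's code. The runWithLimit function, the 20 ms sleep standing in for a database round trip, and the task count of 16 are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runWithLimit runs n copies of work, allowing at most limit of them to
// run at the same time, and returns the highest concurrency it observed.
// limit must be at least 1.
func runWithLimit(limit, n int, work func()) int {
	sem := make(chan struct{}, limit) // counting semaphore with limit slots
	var (
		mu      sync.Mutex
		running int
		peak    int
		wg      sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot, blocking if none is free
			defer func() { <-sem }() // release the slot when the work is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			work()

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// Simulated I/O wait, such as a database round trip (duration is arbitrary).
	ioBound := func() { time.Sleep(20 * time.Millisecond) }

	for _, limit := range []int{1, 4, 16} {
		start := time.Now()
		peak := runWithLimit(limit, 16, ioBound)
		fmt.Printf("limit=%d peak=%d elapsed=%v\n",
			limit, peak, time.Since(start).Round(time.Millisecond))
	}
}
```

Because the simulated work only waits, the elapsed time should drop as the limit rises, even when the limit exceeds the number of CPUs. CPU-bound work would not be expected to show the same gain.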

defer statement and t.Cleanup() method

Using the defer statement or t.Cleanup() method requires some caution when running post-processing once a test is complete. Basic considerations for top-level test functions are listed below.

  • If a top-level test function does not contain subtest functions using the t.Run() method, either the defer statement or the t.Cleanup() method may be used to write the post-process.
  • If a top-level test function contains subtest functions using the t.Run() method but none of these subtest functions calls the t.Parallel() method, either the defer statement or the t.Cleanup() method may be used to write the post-process.
  • If a top-level test function contains subtest functions using the t.Run() method and at least one of these subtest functions calls the t.Parallel() method, use the t.Cleanup() method to write the post-process.

A deferred function is called when the function containing the defer statement returns. Review the previous example of executing code. The Test_Func1 function returns before the subtest functions Func1_Sub1 and Func1_Sub2 complete (Operation 2). Therefore, any function that Test_Func1 defers will be called before the paused Func1_Sub1 and Func1_Sub2 subtest functions resume (note where "Test_Func1 returned" appears in the execution results above).

For example, imagine a post-process where table records created by a subtest function are deleted. If the top-level test function registers this post-process with a defer statement and the subtest function calls the t.Parallel() method, the deferred post-process is called before the rest of the subtest function runs. In this case, instead of using the defer statement, you should use the t.Cleanup() method to write the post-process.

The description of the t.Cleanup() method is as follows.

func (c *T) Cleanup(f func())
    Cleanup registers a function to be called when the test and all its subtests
    complete. Cleanup functions will be called in last added, first called
    order.

The description indicates that the function registered to the t.Cleanup() method will be called once all subtests complete.
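
The difference in timing between the two approaches can be sketched without the testing package at all. The following is a simplified model using plain goroutines: a "parent" function starts one "subtest" that stays parked until the parent returns (as in Operation 2), and the post-process is either deferred in the parent or run after the subtest has finished, which is when functions registered with t.Cleanup() are called. The recorder type, the runParent function, and the event names are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// recorder collects event names in the order they happen.
type recorder struct {
	mu     sync.Mutex
	events []string
}

func (r *recorder) add(e string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events = append(r.events, e)
}

// runParent models a parent test with one parallel subtest. If useDefer is
// true, the post-process is deferred inside the parent function; otherwise
// it runs only after the subtest has completed (cleanup-style timing).
func runParent(useDefer bool) []string {
	rec := &recorder{}
	parentReturned := make(chan struct{})
	var subtests sync.WaitGroup

	parent := func() {
		if useDefer {
			defer rec.add("post-process") // runs when parent returns
		}
		subtests.Add(1)
		go func() { // stands in for the paused parallel subtest
			defer subtests.Done()
			<-parentReturned // resume only after the parent has returned
			rec.add("subtest body")
		}()
		rec.add("parent body done")
	}

	parent()              // any deferred call runs here, as parent returns
	close(parentReturned) // the paused subtest resumes
	subtests.Wait()
	if !useDefer {
		rec.add("post-process") // after all subtests have completed
	}
	return rec.events
}

func main() {
	fmt.Println("defer:  ", strings.Join(runParent(true), " -> "))
	fmt.Println("cleanup:", strings.Join(runParent(false), " -> "))
}
```

In the defer variant, the post-process is recorded before the subtest body, so the subtest would find its records already deleted. In the cleanup variant, the post-process is recorded last.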

So, what will happen to a post-process within a subtest function written using the t.Run() method? It will look a lot like the three considerations mentioned above, just with the subjects of each statement changed. Let’s take a look.

  • If a subtest function does not contain any nested sub-subtest functions using the t.Run() method, either the defer statement or the t.Cleanup() method may be used to write the post-process.
  • If a subtest function contains nested sub-subtest functions using the t.Run() method but none of these sub-subtest functions calls the t.Parallel() method, either the defer statement or the t.Cleanup() method may be used to write the post-process.
  • If a subtest function contains nested sub-subtest functions using the t.Run() method and at least one of these sub-subtest functions calls the t.Parallel() method, use the t.Cleanup() method to write the post-process.

If you’d rather not memorize these six considerations for top-level test functions and subtest functions, you could instead adopt a project-wide rule of always using t.Cleanup() to write post-processes in any test code that uses the t.Parallel() method.

Summary

You might think that parallel execution would be performed properly as long as the t.Parallel() method is called. However, you need to keep several points in mind, as discussed in this article.

I’ll summarize the key points below.

  • The -p flag is used to specify that tests from multiple packages should be run in parallel as separate processes. -p=1 would cause packages to be run one at a time.
  • Calling the t.Parallel() method will cause top-level test functions or subtest functions in a package to run in parallel.
  • A test function (top-level or subtest) that is paused by calling the t.Parallel() method does not resume until its parent test function returns.
  • By default, the parallel level of the t.Parallel() method is the value of GOMAXPROCS. To explicitly change this, either specify the value using the -parallel flag, or set the GOMAXPROCS environment variable.
  • Determine whether to use the t.Cleanup() method or the defer statement for post-processes within test functions, based on whether or not included subtest functions call the t.Parallel() method.
  • Even if the t.Parallel() method is used, tests from multiple packages will not be run within a single test process at the same time.

This article summarized some points I’ve noticed in my work maximizing parallelization for tests in Merpay (microservice) packages. I was able to significantly reduce test times for these packages (to 10% or lower). I didn’t cover any specific points about parallel programming in this article, but I would like to if I get the chance.