How to Build a Go Program without Using go build

Is it possible to build a Go program without using go build?
Indeed, it is!

This article explains how the official go build works and how to reproduce it on your own.

One day this question came to my mind, and I decided to write my own go build as a bash script. After 2 weeks, I reached the stage where I could build the kubectl binary, a Kubernetes client program that depends on more than 800 packages.

You can check out the script here:
https://github.com/DQNEO/go-build-bash

It can build kubectl, uber-go/zap, spf13/cobra, golang/protobuf, and other renowned modules. It also supports some level of cross-compilation (4 patterns, limited to amd64 CPUs):

  • Mac → Mac
  • Mac → Linux
  • Linux → Mac
  • Linux → Linux

I also succeeded in building my own Go compiler (https://github.com/DQNEO/babygo) and assembler (https://github.com/DQNEO/goas) using this go-build-bash. Seeing it function was incredibly thrilling.

Admittedly, it is slow (a full build of kubectl is 4 times slower than with the official go build). However, because I aimed to keep the code as simple as possible while writing it in bash, even people who are unfamiliar with Go can follow it. I also made sure that the build log is highly readable.

Here is the log of the hello world build. It gives a clear view of what happens during the build process:
https://gist.github.com/DQNEO/7b0710b08baa4eb2fc6fb8bde8c432e1

Through this experience I gained a basic understanding of how the official go build works, and I am going to explain it in the following chapters.
(I have tried to be as accurate as possible to the best of my understanding. However, it may not be completely accurate. If you see any inconsistencies, please message me at https://twitter.com/DQNEO )

What does the official go build do?

The overall process of go build can be broken down as follows:

  • It inspects the source files of the specified package for import declarations, followed by a recursive examination of the source files for the packages that need to be imported. As a result, a dependency graph/tree is formed.
  • Packages are sorted by the number of packages they depend on, from fewest to most (e.g., runtime -> reflect -> fmt -> main).
  • It compiles the Go code of a package and places it into an archive file.
  • If a package includes assembly files, these are also assembled and added to the archive file.
  • Finally, the archive files of all packages are linked together to create a binary executable file.

On taking a closer look at this process, you will find some key points:

These characteristics facilitate parallelization (between and within packages) and simplify cache management, thereby reducing build times.
Given that the development of the Go language was initially intended to reduce build times, it’s natural for such innovations to be incorporated within its syntax or language specification. (refer to the Go language announcement in 2009 https://youtu.be/rKnDgT73v8s?t=839 )
One instance of this is that the compiler reports an error if an imported package is not used, which helps to reduce build time. Another example is the requirement that import declarations come immediately after the package declaration, which simplifies the task for the builder: there is no need to parse the entire file to construct the dependency graph.
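To illustrate the second point, here is a minimal sketch using the standard go/parser package. Its ImportsOnly mode stops parsing after the import declarations, which is all a builder needs to collect dependencies. The function name listImports and the sample source are my own, for illustration only; this is not how cmd/go itself is structured.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// listImports returns the import paths declared in a Go source text.
// parser.ImportsOnly tells the parser to stop after the import declarations.
func listImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// spec.Path.Value is a quoted literal such as "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := `package main

import (
	"fmt"
	"os"
)

func main() { fmt.Fprintln(os.Stdout, "hello world") }
`
	paths, err := listImports("main.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // prints [fmt os]
}
```

Applying this to every file of a package, and then recursively to every imported package, yields the dependency graph.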

Interestingly, the unsafe package doesn’t show up in the build log. One would expect it to appear there; after all, it should be just another package. In reality, unsafe does not appear in the build log because it is what is known as a "pseudo-package": it is actually a feature of the compiler. (https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/cmd/compile/internal/gc/main.go;l=90-91 )

By following the build steps below, you can see these facts for yourself.

Building Hello world and Tracking the process

Let’s actually use the official go build to monitor the process.

First, create the necessary files (main.go and go.mod).

$ cat > main.go <<EOF
package main
import "fmt"
func main() {fmt.Println("hello world")}
EOF
$ go mod init example.com/hello

Ensure that it can be built and run.

$ go build
$ ./hello
hello world

Output execution log

You can view the logs by adding the -x option to go build:

$ go build -x
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build2336838040

Unfortunately, there is only one line in the log.
This is because the cache is in effect. Since this is the second build of hello, it utilizes the result of the first build.

The -a option disables all caching, and all packages, including standard libraries, are built from source:

$ go build -x -a
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276
mkdir -p $WORK/b005/
mkdir -p $WORK/b012/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276/b005/importcfg << 'EOF' # internal
# import config
EOF
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276/b012/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=4 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b012/_pkg_.a -trimpath "$WORK/b012=>" -p internal/coverage/rtcov -std -+ -complete -buildid mI6xNmP8pxnOcrWlN_qn/mI6xNmP8pxnOcrWlN_qn -goversion go1.20.4 -c=4 -nolocalimports -importcfg $WORK/b012/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/coverage/rtcov/rtcov.go
mkdir -p $WORK/b014/

…

When executed, it outputs a long log. It is messy and somewhat unreadable. This is because multiple package builds are running in parallel.

The -p 1 option restricts the number of parallel processes to 1.

$ go build -x -a -p 1
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493
mkdir -p $WORK/b005/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493/b005/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/buildid -w $WORK/b005/_pkg_.a # internal
cp $WORK/b005/_pkg_.a /Users/DQNEO/Library/Caches/go-build/79/799f3b0680ae6929fbd8bc4eea9aa74868623c9e216293baf43e5e1a3c85aa84-d # internal
mkdir -p $WORK/b006/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493/b006/importcfg << 'EOF' # internal
# import config

The build flow now appears as a single stream and is much easier to follow.
Interestingly, the log is an executable shell script. Let’s save the log to a file and run it as a bash script.

$ go build -x -a -p 1 2> buildx.sh
$ bash < buildx.sh
$ ./hello
hello world

It runs perfectly.
Here’s an additional trick: if you pass the -n option instead of -x, go build only prints the commands without executing them (known as a dry run), which is super fast. The log also comes with comments, making it easier to read. This is helpful when you want to investigate the build process.
(Note that -n automatically applies -p 1, so -p is not necessary in this case.)

$ go build -n -a

#
# internal/goarch
#

mkdir -p $WORK/b005/
cat >$WORK/b005/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/buildid -w $WORK/b005/_pkg_.a # internal

…

Here is the full log

However, there is one caveat: the -n logs are not executable in a shell as they appear. Some modifications are required to make it executable, namely:

  • Set the variable $WORK.
  • Remove the quotes around 'EOF' so that $WORK is expanded inside the here-documents.

$ go build -n -a 2> buildn.sh
$ cat buildn.sh | sed -e "s/'EOF'.*$/EOF/g" | WORK=/tmp/go-build bash

Now it’s executable.

I recommend refactoring this buildn.sh script (e.g., combining iterations into for statements) for better understanding. Actually, my go-build-bash, introduced at the beginning of this article, is the ultimate result of such refactoring.

Inference of hidden logic from execution logs

Unfortunately, it is not possible to understand the build only by reading the log. There is some hidden logic that does not show up in the log.

  • Where to find the source code for the package
  • How to select files to compile
  • How to determine compilation options
  • How to determine the order of packages to build
  • How to embed files when embed tags are present

For example, we can see the internal/goarch package being compiled at the start of the hello build log:

/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go

How does the build know internal/goarch should be compiled first?
How does it know the source files are in /usr/local/Cellar/go/1.20.4/libexec/src?

Regarding the list of files sent to compile, only three files goarch.go goarch_amd64.go zgoarch_amd64.go are visible in the log. However, a look at the source directory in internal/goarch reveals 39 .go files:

$ ls /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch
gengoarch.go     goarch_arm.go      goarch_mips64.go    goarch_ppc64le.go  zgoarch_386.go    zgoarch_arm64be.go  zgoarch_mips64.go       zgoarch_mipsle.go   zgoarch_riscv.go    zgoarch_sparc.go
goarch.go        goarch_arm64.go    goarch_mips64le.go  goarch_riscv64.go  zgoarch_amd64.go  zgoarch_armbe.go    zgoarch_mips64le.go     zgoarch_ppc.go      zgoarch_riscv64.go  zgoarch_sparc64.go
goarch_386.go    goarch_loong64.go  goarch_mipsle.go    goarch_s390x.go    zgoarch_arm.go    zgoarch_loong64.go  zgoarch_mips64p32.go    zgoarch_ppc64.go    zgoarch_s390.go     zgoarch_wasm.go
goarch_amd64.go  goarch_mips.go     goarch_ppc64.go     goarch_wasm.go     zgoarch_arm64.go  zgoarch_mips.go     zgoarch_mips64p32le.go  zgoarch_ppc64le.go  zgoarch_s390x.go

What is the logic behind selecting 3 out of 39?

Some packages have the compile option -complete or -+, while others don’t. What are the criteria for this?

If a package has assembly files, the process changes significantly. If you build a larger program, like kubectl, you’ll notice special handling for embed. There are many hidden mechanics like this.

If you intend to create your own builder, you’ll need to reproduce these processes.
As a reverse engineering enthusiast, I inferred the process from the logs in order to create my own go build.

Reproduce the build process details

Finding the source directory of the package

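Generally, the source directory of a package is resolved as follows:

  • Standard library packages are in $(go env GOROOT)/src
  • Packages in your own module are under your module’s root directory (where go.mod is located)
  • Otherwise, from the vendor directory (when dependencies are not vendored, the official go build reads them from the module cache instead)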
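The three rules above can be sketched as a small lookup function. This is only an illustration under simplifying assumptions: the locator type and its fields are hypothetical names, the "no dot in the first path element" test for standard library packages is a heuristic, and the sample paths are placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// locator holds the inputs needed to map an import path to a directory.
// The type and field names are hypothetical; they are not taken from cmd/go.
type locator struct {
	goroot     string // output of `go env GOROOT`
	modulePath string // module path declared in go.mod
	moduleRoot string // directory that contains go.mod
}

// isStd is a heuristic: standard library import paths have no dot in
// their first path element (e.g. "fmt", "internal/goarch").
func isStd(importPath string) bool {
	first, _, _ := strings.Cut(importPath, "/")
	return !strings.Contains(first, ".")
}

// dir resolves an import path to the directory holding its source files.
func (l locator) dir(importPath string) string {
	switch {
	case importPath == l.modulePath:
		return l.moduleRoot
	case strings.HasPrefix(importPath, l.modulePath+"/"):
		rel := strings.TrimPrefix(importPath, l.modulePath+"/")
		return filepath.Join(l.moduleRoot, filepath.FromSlash(rel))
	case isStd(importPath):
		return filepath.Join(l.goroot, "src", filepath.FromSlash(importPath))
	default:
		// Everything else is assumed to be vendored.
		return filepath.Join(l.moduleRoot, "vendor", filepath.FromSlash(importPath))
	}
}

func main() {
	l := locator{
		goroot:     "/usr/local/go", // placeholder GOROOT
		modulePath: "example.com/hello",
		moduleRoot: "/tmp/birudo",
	}
	for _, p := range []string{"fmt", "example.com/hello/sub", "github.com/spf13/cobra"} {
		fmt.Printf("%s => %s\n", p, l.dir(p))
	}
}
```

The module's own packages are checked first so that a module path without a dot is not mistaken for a standard library package.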

Determining the order of packages to build

The dependency graph for the build is obtained by following the import declarations in the source code recursively. We can use an algorithm called Topological sort to establish the build order of the packages.

Very roughly, the procedure is:

  • Cut off the terminal nodes (the "leaf" elements) of the tree
  • Then some of the remaining branches become new terminal nodes
  • Cut them off
  • Repeat this process until the tree is empty
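The leaf-cutting procedure above can be sketched in Go as follows. This is a simplified illustration, not the code of go-build-bash or cmd/go: the function name sortPackages is my own, and the dependency graph in main is a toy graph (the real fmt and reflect packages import many more packages).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// sortPackages returns a build order in which every package appears after
// all of the packages it imports. deps maps a package to its direct imports.
func sortPackages(deps map[string][]string) ([]string, error) {
	done := make(map[string]bool)
	var order []string
	for len(order) < len(deps) {
		// Collect the current leaves: packages whose imports are all done.
		var leaves []string
		for pkg, imports := range deps {
			if done[pkg] {
				continue
			}
			ready := true
			for _, imp := range imports {
				if !done[imp] {
					ready = false
					break
				}
			}
			if ready {
				leaves = append(leaves, pkg)
			}
		}
		if len(leaves) == 0 {
			return nil, errors.New("import cycle or missing package in graph")
		}
		sort.Strings(leaves) // map iteration order is random; keep output stable
		// Cut the leaves off; their importers may become leaves next round.
		for _, pkg := range leaves {
			done[pkg] = true
		}
		order = append(order, leaves...)
	}
	return order, nil
}

func main() {
	// Toy graph for illustration only.
	deps := map[string][]string{
		"main":    {"fmt"},
		"fmt":     {"reflect", "runtime"},
		"reflect": {"runtime"},
		"runtime": {},
	}
	order, err := sortPackages(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // prints [runtime reflect fmt main]
}
```

Packages cut off in the same round do not depend on each other, which is also why they can be built in parallel.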

In my build tool, you can view the state before and after the sort:
(https://gist.github.com/DQNEO/7b0710b08baa4eb2fc6fb8bde8c432e1#file-build_hello-log-L681-L769 )

Selecting files to compile

The logic for selecting files to be compiled from the package source directory is as follows:

  • Exclude the *_test.go files
  • For files with _{OS}.*, _{CPU}.*, or _{OS}_{CPU}.* suffixes, exclude those that do not match the build target ($GOOS, $GOARCH)
  • For the remaining files, parse the build tags (e.g. //go:build windows || (linux && amd64)) and exclude those that do not match the result of logical operations

The remaining files that are not excluded are passed to the compiler.

For example, when building the math package for a machine with an Intel (amd64) CPU, "exp_amd64.go" is selected by the filename suffix rule, and "exp_asm.go" is selected by its build tag ("amd64 || arm64 || s390x"), so that machine-specific binary code is generated.
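The filename rules can be sketched as below. Note the assumptions: matchFilename is a hypothetical helper, it only handles .go files, and the knownOS and knownArch tables are deliberately partial (the real toolchain recognizes many more operating systems and CPUs), so names outside these tables are treated as unconstrained here.

```go
package main

import (
	"fmt"
	"strings"
)

// Partial tables for illustration only; the real lists are much longer.
var knownOS = map[string]bool{"darwin": true, "linux": true, "windows": true}
var knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true, "s390x": true}

// matchFilename reports whether a .go file passes the name-based rules
// for the given build target.
func matchFilename(name, goos, goarch string) bool {
	if !strings.HasSuffix(name, ".go") || strings.HasSuffix(name, "_test.go") {
		return false
	}
	parts := strings.Split(strings.TrimSuffix(name, ".go"), "_")
	n := len(parts)
	// name_{OS}_{CPU}.go: both must match the target.
	if n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
		return parts[n-2] == goos && parts[n-1] == goarch
	}
	// name_{OS}.go or name_{CPU}.go: the suffix must match the target.
	if n >= 2 {
		last := parts[n-1]
		if knownOS[last] {
			return last == goos
		}
		if knownArch[last] {
			return last == goarch
		}
	}
	// No recognized suffix: the name imposes no constraint.
	return true
}

func main() {
	files := []string{
		"goarch.go", "goarch_amd64.go", "goarch_arm64.go",
		"zgoarch_amd64.go", "zgoarch_386.go",
	}
	for _, f := range files {
		fmt.Printf("%-18s %v\n", f, matchFilename(f, "darwin", "amd64"))
	}
}
```

With a darwin/amd64 target, the three files seen in the internal/goarch build log pass, while the arm64 and 386 variants are excluded. Files that pass this step still have to pass the build tag check.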

It is wonderful that such a simple mechanism is able to achieve cross-compilation.

Luckily for me, the logical operators used in build tags (!, &&, ||, etc.) can be evaluated as they are in bash, so porting was easy.
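In Go itself, the same evaluation can be done with the standard go/build/constraint package, as in the sketch below. The helper name matchConstraint and the tag set are my own assumptions; a real build enables more tags than just the OS and CPU (for example, release tags such as go1.20).

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matchConstraint evaluates a //go:build line against a set of enabled tags.
func matchConstraint(line string, tags map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	// Eval calls the function once per tag and applies !, && and ||.
	return expr.Eval(func(tag string) bool { return tags[tag] }), nil
}

func main() {
	// Tags assumed for a darwin/amd64 target (simplified).
	tags := map[string]bool{"darwin": true, "amd64": true}
	lines := []string{
		"//go:build amd64 || arm64 || s390x",
		"//go:build windows || (linux && amd64)",
		"//go:build !windows",
	}
	for _, line := range lines {
		ok, err := matchConstraint(line, tags)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-42s %v\n", line, ok)
	}
}
```

For this tag set, the first and third lines evaluate to true and the second to false.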

Determining compilation options

Some package attributes lead to different compile options.

  • -std: compiling standard library
  • -complete: compiling complete package (no C or assembly)
  • -symabis: read symbol ABIs from file
  • -embedcfg: read go:embed configuration from file

-std must be added when compiling standard library packages.
-complete can be added when you want to reject function declarations without a body. go build adds it by default and omits it only in special cases (packages with assembly files, and a few packages that declare functions without a body). Note that the language specification allows function declarations without a body.
-symabis must be added when the package contains assembly files (see below).
-embedcfg passes a configuration file that makes go:embed work (see below).
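As a rough sketch, the rules above can be written as a function that derives the flags from a package's attributes. The pkgInfo type and its fields are hypothetical names for this illustration, and only the four options discussed here (plus -p for the import path) are covered; the real go build passes more.

```go
package main

import (
	"fmt"
	"strings"
)

// pkgInfo describes the attributes that influence the compile flags.
// The type and its fields are hypothetical names for this sketch.
type pkgInfo struct {
	importPath  string
	isStd       bool   // part of the standard library
	hasAsm      bool   // directory contains assembly (.s) files
	hasBodyless bool   // declares functions without a body
	symabis     string // path to the symabis file, if any
	embedcfg    string // path to the embedcfg JSON file, if any
}

// compileFlags returns the subset of compiler flags discussed in this section.
func compileFlags(p pkgInfo) []string {
	flags := []string{"-p", p.importPath}
	if p.isStd {
		flags = append(flags, "-std")
	}
	// -complete is the default; drop it for the special cases.
	if !p.hasAsm && !p.hasBodyless {
		flags = append(flags, "-complete")
	}
	if p.symabis != "" {
		flags = append(flags, "-symabis", p.symabis)
	}
	if p.embedcfg != "" {
		flags = append(flags, "-embedcfg", p.embedcfg)
	}
	return flags
}

func main() {
	pkgs := []pkgInfo{
		{importPath: "internal/goarch", isStd: true},
		{importPath: "internal/cpu", isStd: true, hasAsm: true, symabis: "$WORK/b011/symabis"},
	}
	for _, p := range pkgs {
		fmt.Println(strings.Join(compileFlags(p), " "))
	}
}
```

For internal/goarch this yields "-p internal/goarch -std -complete", which matches the corresponding part of the log shown earlier.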

Handling assembly files

If the package directory contains assembly files, the following operations are needed:

Create symabis file

It is used to tell the compiler which assembly function conforms to which ABI (Application Binary Interface).

You do not need to be aware of the contents of the file, as they are automatically generated by asm -gensymabis.

/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/asm -p internal/cpu -trimpath "$WORK/b011=>" -I $WORK/b011/ -I /usr/local/Cellar/go/1.20.4/libexec/pkg/include -D GOOS_darwin -D GOARCH_amd64 -D GOAMD64_v1 -gensymabis -o $WORK/b011/symabis ./cpu.s ./cpu_x86.s

Assemble

This is the assembly process in a narrow sense. It converts the assembly source to an object file. There is a one-to-one correspondence between input and output files.

/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/asm -p internal/cpu -trimpath "$WORK/b011=>" -I $WORK/b011/ -I /usr/local/Cellar/go/1.20.4/libexec/pkg/include -D GOOS_darwin -D GOARCH_amd64 -D GOAMD64_v1 -o $WORK/b011/cpu.o ./cpu.s

Add object file to archive

You can use the pack r command to append object files to the archive.

/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/pack r $WORK/b012/_pkg_.a $WORK/b012/cpu.o $WORK/b012/cpu_x86.o # internal

If you are curious about the contents of the archive file (_pkg_.a), you can list the object files in it with pack t.

$ go tool pack t _pkg_.a
__.PKGDEF
_go_.o
cpu.o
cpu_x86.o
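Putting the three operations together, a builder could generate the command lines like this. It is only a sketch: asmSteps and its parameters are hypothetical, the flags are limited to those visible in the logs above (-trimpath is omitted for brevity), and the compile step, which consumes the symabis file, runs between steps 1 and 2 but is not shown.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// asmSteps returns the tool invocations for a package with assembly files:
// symabis generation, one assemble step per file, and the final pack step.
func asmSteps(toolDir, workDir, includeDir, pkgPath string, defines, sFiles []string) [][]string {
	common := []string{filepath.Join(toolDir, "asm"), "-p", pkgPath, "-I", workDir, "-I", includeDir}
	for _, d := range defines {
		common = append(common, "-D", d)
	}
	// 1. Generate the symabis file from all assembly files at once.
	symabis := append([]string{}, common...)
	symabis = append(symabis, "-gensymabis", "-o", filepath.Join(workDir, "symabis"))
	symabis = append(symabis, sFiles...)
	steps := [][]string{symabis}
	// 2. Assemble each .s file into its own .o file (one-to-one).
	var objs []string
	for _, s := range sFiles {
		obj := filepath.Join(workDir, strings.TrimSuffix(filepath.Base(s), ".s")+".o")
		objs = append(objs, obj)
		step := append([]string{}, common...)
		steps = append(steps, append(step, "-o", obj, s))
	}
	// 3. Append the object files to the package archive.
	pack := []string{filepath.Join(toolDir, "pack"), "r", filepath.Join(workDir, "_pkg_.a")}
	return append(steps, append(pack, objs...))
}

func main() {
	steps := asmSteps(
		"$GOROOT/pkg/tool/darwin_amd64", "$WORK/b011", "$GOROOT/pkg/include",
		"internal/cpu",
		[]string{"GOOS_darwin", "GOARCH_amd64", "GOAMD64_v1"},
		[]string{"./cpu.s", "./cpu_x86.s"},
	)
	for _, s := range steps {
		fmt.Println(strings.Join(s, " "))
	}
}
```

The commands are returned as argument slices rather than executed, so the result can be printed as a script, much like the output of go build -n.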

Embedding files when embed tags are present

If a go:embed tag is present in the source code, the builder must explore the filesystem and write the resulting mapping information as JSON, which is passed to the compiler. go:embed actually has multiple modes of operation, including embedding a single file, embedding a directory, and globbing by matching file names.
Covering all of them would be long, so let me only introduce how the single file mode works.

//go:embed p256_asm_table.bin
var p256PrecomputedEmbed string

The absolute path of the specified file is resolved and written into the JSON.

{
    "Patterns": {
        "p256_asm_table.bin": [
            "p256_asm_table.bin"
        ]
    },
    "Files": {
        "p256_asm_table.bin": "/usr/local/Cellar/go/1.20.4/libexec/src/crypto/internal/nistec/p256_asm_table.bin"
    }
}

If you are curious about other modes, please take a look at my bash implementation.

Save this JSON in a file and pass it to the compiler with the -embedcfg option; the compiler then uses it to embed the listed files into the object file.

compile -embedcfg $WORK/b050/embedcfg ...

This is how go:embed works at the builder’s layer. The actual work of embedding files is done by the compiler.

After applying all of this logic (finding the source directories, selecting files, determining compile options, sorting packages, and embedding files), you finally get a binary that works.

Conclusion

Now you can build large programs such as kubectl.
The details that were not mentioned in this article can be found in the build log and go-build-bash code. You can also read the official go build source. (https://github.com/golang/go/blob/e827d41c0a2ea392c117a790cdfed0022e419424/src/cmd/go/internal/work/build.go#L447 )

You can build your program by yourself!

(This article is translated from my Japanese version: https://zenn.dev/dqneo/articles/ce9459676a3303 )
