Is it possible to build a Go program without using go build?
Indeed, it is!
This article explains how the official go build works and how to reproduce it on your own.
One day this question came to my mind, and I decided to write my own go build as a bash script. After 2 weeks, I reached the stage where I could build the kubectl binary, a Kubernetes client program that depends on more than 800 packages.
You can check out the script here:
https://github.com/DQNEO/go-build-bash
It’s able to build kubectl, uber-go/zap, spf13/cobra, golang/protobuf and other renowned modules. Additionally, it supports some level of cross-compilation (4 patterns, limited to amd64 CPUs):
- Mac → Mac
- Mac → Linux
- Linux → Mac
- Linux → Linux
I also succeeded in building my own Go compiler (https://github.com/DQNEO/babygo) and assembler (https://github.com/DQNEO/goas) using this go-build-bash. Seeing it function was incredibly thrilling.
Admittedly, its build speed is slow (a full build of kubectl is 4 times slower than with the official go build). However, as I aimed to keep the code as simple as possible while writing in bash, even people who are unfamiliar with Go can comprehend it. I also ensured that the build log is highly readable.
Here is the log of the hello world build. It gives a clear view of what happens during the build process:
https://gist.github.com/DQNEO/7b0710b08baa4eb2fc6fb8bde8c432e1
Through this experience I gained a basic understanding of how the official go build works, and I am going to explain it in the following chapters.
(I have tried to be as accurate as possible to the best of my understanding. However, it may not be completely accurate. If you see any inconsistencies, please message me at https://twitter.com/DQNEO )
What does the official go build do?
The overall process of go build can be broken down as follows:
- It inspects the source files of the specified package for import declarations, then recursively examines the source files of the packages that need to be imported. As a result, a dependency graph/tree is formed.
- Packages are sorted in dependency order, so that each package is built after the packages it depends on (e.g., runtime -> reflect -> fmt -> main).
- It compiles the Go code of a package and places it into an archive file.
- If a package includes assembly files, these are also assembled and added to the archive file.
- Finally, all of the archive files for each package are linked together to create a binary executable file.
On taking a closer look at this process, you will find some key points:
- The fundamental concept is "Work on a per-package basis".
- During the compilation of a package, only the directly imported packages are referenced.
- Cross-compilation essentially involves the selection of source files that match the target OS and architecture.
- Multiple files within a package are passed to the compiler together and processed concurrently (you can observe the compiler parsing multiple files at once: https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/cmd/compile/internal/noder/noder.go;l=43-60 )
These characteristics facilitate parallelization (between and within packages) and simplify cache management, thereby reducing build times.
Given that the development of the Go language was initially intended to reduce build times, it’s natural for such innovations to be incorporated within its syntax or language specification. (refer to the Go language announcement in 2009 https://youtu.be/rKnDgT73v8s?t=839 )
One instance of this is that the compiler reports an error if an imported package is not used, which helps to reduce build time. Another example is the requirement that import declarations come immediately after the package clause, which simplifies the task for the builder, as there’s no need to parse the entire file to construct the dependency graph.
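To illustrate the point about import declarations, here is a minimal sketch that extracts the imports of a source file without parsing the rest of it. It relies on the ImportsOnly mode of the standard go/parser package; the helper name listImports and the sample source are made up for this example, and this is not necessarily how go build itself does it.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// listImports returns the import paths declared in src.
// parser.ImportsOnly tells the parser to stop after the import declarations,
// so the function bodies that follow are never examined.
func listImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "main.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// spec.Path.Value is a quoted string literal such as "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := `package main

import (
	"fmt"
	"os"
)

func main() { fmt.Println(os.Args) }
`
	paths, err := listImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

Applying this to every file of a package, and then to every package it finds, yields the edges of the dependency graph.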
Interestingly, the unsafe package doesn’t show up in the build log. One would expect it to appear there, since it should be just another package. It does not, because unsafe is what is known as a "pseudo-package": it is actually part of the compiler features. (https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/cmd/compile/internal/gc/main.go;l=90-91 )
By following the build operations below, you can see these facts for yourself.
Building Hello world and tracking the process
Let’s actually use the official go build to monitor the process.
First, create the necessary files (main.go and go.mod).
$ cat > main.go <<EOF
package main
import "fmt"
func main() {fmt.Println("hello world")}
EOF
$ go mod init example.com/hello
Ensure that it can be built and run.
$ go build
$ ./hello
hello world
Output execution log
You can view the logs by adding the -x option to go build:
$ go build -x
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build2336838040
Unfortunately, there is only one line in the log.
This is because the cache is in effect. Since this is the second build of hello, it reuses the result of the first build.
The -a option forces all packages, including standard libraries, to be rebuilt from source instead of being taken from the cache:
$ go build -x -a
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276
mkdir -p $WORK/b005/
mkdir -p $WORK/b012/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276/b005/importcfg << 'EOF' # internal
# import config
EOF
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build4274470276/b012/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=4 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b012/_pkg_.a -trimpath "$WORK/b012=>" -p internal/coverage/rtcov -std -+ -complete -buildid mI6xNmP8pxnOcrWlN_qn/mI6xNmP8pxnOcrWlN_qn -goversion go1.20.4 -c=4 -nolocalimports -importcfg $WORK/b012/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/coverage/rtcov/rtcov.go
mkdir -p $WORK/b014/
…
When executed, it outputs a long log. It is messy and somewhat unreadable. This is because multiple package builds are running in parallel.
The -p 1 option restricts the number of parallel processes to 1.
$ go build -x -a -p 1
WORK=/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493
mkdir -p $WORK/b005/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493/b005/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/buildid -w $WORK/b005/_pkg_.a # internal
cp $WORK/b005/_pkg_.a /Users/DQNEO/Library/Caches/go-build/79/799f3b0680ae6929fbd8bc4eea9aa74868623c9e216293baf43e5e1a3c85aa84-d # internal
mkdir -p $WORK/b006/
cat >/var/folders/bq/2mhmkrcn59dd9t7pq5_6hbw80000gp/T/go-build3299870493/b006/importcfg << 'EOF' # internal
# import config
The build flow now appears as a single stream and is much easier to follow.
Interestingly, the log is an executable shell script. Let’s save the log to a file and run it as a bash script.
$ go build -x -a -p 1 2> buildx.sh
$ bash < buildx.sh
$ ./hello
hello world
It runs perfectly.
Here’s an additional trick: if you pass the -n option instead of -x, the build is not executed and only the log is generated, which is super fast (this is known as a dry run). The log also comes with comments, making it easier to read. This is helpful when you want to investigate the build process.
(Note that -n automatically applies -p 1, so -p is not necessary in this case.)
$ go build -n -a
#
# internal/goarch
#
mkdir -p $WORK/b005/
cat >$WORK/b005/importcfg << 'EOF' # internal
# import config
EOF
cd /tmp/birudo
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/buildid -w $WORK/b005/_pkg_.a # internal
…
However, there is one caveat: the -n logs are not executable in a shell as they appear. Some modifications are required to make them executable, namely:
- Set the variable $WORK.
- Remove the quotes around 'EOF'.
$ go build -n -a 2> buildn.sh
$ cat buildn.sh | sed -e "s/'EOF'.*$/EOF/g" | WORK=/tmp/go-build bash
Now it’s executable.
I recommend refactoring this buildn.sh script (e.g., combining repeated steps into for statements) for better understanding. Actually, my go-build-bash, introduced at the beginning of this article, is the ultimate result of such refactoring.
Inference of hidden logic from execution logs
Unfortunately, it is not possible to understand the build only by reading the log. There is some hidden logic that does not show up in the log.
- Where to find the source code for the package
- How to select files to compile
- How to determine compilation options
- How to determine the order of packages to build
- How to embed files when embed tags are present
For example, we can see the internal/goarch package being compiled at the start of the hello build log:
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/compile -o $WORK/b005/_pkg_.a -trimpath "$WORK/b005=>" -p internal/goarch -std -+ -complete -buildid NeMeTvvWBf8p5uHSGfak/NeMeTvvWBf8p5uHSGfak -goversion go1.20.4 -c=8 -nolocalimports -importcfg $WORK/b005/importcfg -pack /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/goarch_amd64.go /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch/zgoarch_amd64.go
How does the build know internal/goarch should be compiled first?
How does it know the source files are in /usr/local/Cellar/go/1.20.4/libexec/src?
Regarding the list of files passed to compile, only three files (goarch.go, goarch_amd64.go, zgoarch_amd64.go) are visible in the log. However, a look at the source directory of internal/goarch reveals 39 .go files:
$ ls /usr/local/Cellar/go/1.20.4/libexec/src/internal/goarch
gengoarch.go goarch_arm.go goarch_mips64.go goarch_ppc64le.go zgoarch_386.go zgoarch_arm64be.go zgoarch_mips64.go zgoarch_mipsle.go zgoarch_riscv.go zgoarch_sparc.go
goarch.go goarch_arm64.go goarch_mips64le.go goarch_riscv64.go zgoarch_amd64.go zgoarch_armbe.go zgoarch_mips64le.go zgoarch_ppc.go zgoarch_riscv64.go zgoarch_sparc64.go
goarch_386.go goarch_loong64.go goarch_mipsle.go goarch_s390x.go zgoarch_arm.go zgoarch_loong64.go zgoarch_mips64p32.go zgoarch_ppc64.go zgoarch_s390.go zgoarch_wasm.go
goarch_amd64.go goarch_mips.go goarch_ppc64.go goarch_wasm.go zgoarch_arm64.go zgoarch_mips.go zgoarch_mips64p32le.go zgoarch_ppc64le.go zgoarch_s390x.go
What is the logic behind selecting 3 out of 39?
Some packages have the compile option -complete or -+, while others don’t. What are the criteria for this?
If a package has assembly files, the process changes significantly. If you build a larger program, like kubectl, you’ll notice special handling for embed. There are many hidden mechanics like this.
If you intend to create your own builder, you’ll need to reproduce these processes.
As a reverse engineering enthusiast, I inferred what the process is by looking at the logs in order to create my own go build.
Reproduce the build process details
Finding the source directory of the package
Generally,
- Standard library packages are found under $(go env GOROOT)/src
- Packages in your own module are found under your module’s root directory (where go.mod is located)
- Otherwise, packages are found under the vendor directory
Determining the order of packages to build
The dependency graph for the build is obtained by following the import declarations in the source code recursively. We can use an algorithm called topological sort to establish the build order of the packages.
Very roughly, the procedure is:
- Cut off the terminal nodes (the "leaf" elements) of the tree
- Then some of the remaining branches become new terminal nodes
- Cut them off
- Repeat this process until the tree is empty
In my build tool, you can view the state before and after the sort:
(https://gist.github.com/DQNEO/7b0710b08baa4eb2fc6fb8bde8c432e1#file-build_hello-log-L681-L769 )
Selecting files to compile
The logic for selecting files to be compiled from the package source directory is as follows:
- Exclude the *_test.go files
- For files with _{OS}.*, _{CPU}.*, or _{OS}_{CPU}.* suffixes, exclude those that do not match the build target ($GOOS, $GOARCH)
- For the remaining files, parse the build tags (e.g. //go:build windows || (linux && amd64)) and exclude those that do not match the result of the logical operations
The remaining files that are not excluded are passed to the compiler.
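The first two rules (test files and filename suffixes) can be sketched as below. This is a simplified illustration: the function name matchFileName is hypothetical, and the knownOS and knownArch tables are deliberately tiny subsets, so names outside them (such as arm64be) are not recognized as constraints here.

```go
package main

import (
	"fmt"
	"strings"
)

// Small subsets of OS and CPU names, chosen only for this example.
var knownOS = map[string]bool{"darwin": true, "linux": true, "windows": true}
var knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true}

// matchFileName reports whether a file should be kept for the target
// goos/goarch, judging by its name only (build tags are not examined).
func matchFileName(name, goos, goarch string) bool {
	if strings.HasSuffix(name, "_test.go") {
		return false
	}
	// Drop the extension, then look at the "_"-separated suffix parts.
	if i := strings.Index(name, "."); i >= 0 {
		name = name[:i]
	}
	i := strings.Index(name, "_")
	if i < 0 {
		return true // no suffix at all, so no constraint
	}
	parts := strings.Split(name[i+1:], "_")
	n := len(parts)
	// _{OS}_{CPU} suffix: both must match the target.
	if n >= 2 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
		return parts[n-2] == goos && parts[n-1] == goarch
	}
	// _{OS} or _{CPU} suffix: the single element must match the target.
	last := parts[n-1]
	if knownOS[last] {
		return last == goos
	}
	if knownArch[last] {
		return last == goarch
	}
	return true
}

func main() {
	files := []string{
		"goarch.go", "goarch_386.go", "goarch_amd64.go",
		"goarch_arm64.go", "zgoarch_amd64.go", "goarch_test.go",
	}
	for _, f := range files {
		if matchFileName(f, "darwin", "amd64") {
			fmt.Println(f)
		}
	}
}
```

With darwin/amd64 as the target, the loop keeps goarch.go, goarch_amd64.go and zgoarch_amd64.go, the same three files that appear in the compile command of the build log.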
For example, when the math package is built for a machine with an Intel (amd64) CPU, exp_amd64.go is selected due to the filename suffix rule, and exp_asm.go is selected due to its build tag (amd64 || arm64 || s390x), so that machine-specific binary code is generated.
It is wonderful that such a simple mechanism is able to achieve cross-compilation.
Luckily for me, the logical operators in build tags (!, &&, ||, etc.) can be interpreted as is in bash, so porting was easy.
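In Go itself, the same evaluation can be sketched with the standard go/build/constraint package, which parses a //go:build line into an expression and evaluates it against a set of tags. The helper name tagMatches is made up for this example, and treating only GOOS and GOARCH as satisfied tags is a simplification (the real go command also knows about tags such as cgo or Go version tags).

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// tagMatches reports whether a "//go:build ..." line is satisfied for the
// given target. Only the OS and CPU names count as true tags in this sketch.
func tagMatches(line, goos, goarch string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	ok := expr.Eval(func(tag string) bool {
		return tag == goos || tag == goarch
	})
	return ok, nil
}

func main() {
	lines := []string{
		"//go:build windows || (linux && amd64)",
		"//go:build amd64 || arm64 || s390x",
		"//go:build !windows",
	}
	for _, line := range lines {
		ok, err := tagMatches(line, "darwin", "amd64")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-45s %v\n", line, ok)
	}
}
```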
Determining compilation options
Some package attributes lead to different compile options.
  -std        compiling standard library
  -complete   compiling complete package (no C or assembly)
  -symabis    read symbol ABIs from file
  -embedcfg   read go:embed configuration from file
-std must be added when compiling standard library packages.
-complete can be added when you want the compiler to reject function declarations without a body. The go build style is to add it by default and to remove it only in special cases (packages with assembly files and a few packages that declare functions without a body). Note that the language specification allows function declarations without a body.
-symabis must be added when the package contains assembly files (see below).
-embedcfg passes a configuration file that makes go:embed work (see below).
Handling assembly files
If the package directory contains assembly files, the following operations are needed:
Create symabis file
It is used to tell the compiler which assembly function conforms to which ABI (Application Binary Interface).
You do not need to be aware of the contents of the file, as it is automatically generated by asm -gensymabis.
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/asm -p internal/cpu -trimpath "$WORK/b011=>" -I $WORK/b011/ -I /usr/local/Cellar/go/1.20.4/libexec/pkg/include -D GOOS_darwin -D GOARCH_amd64 -D GOAMD64_v1 -gensymabis -o $WORK/b011/symabis ./cpu.s ./cpu_x86.s
Assemble
This is the assembly process in a narrow sense. It converts the assembly source to an object file. There is a one-to-one correspondence between input and output files.
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/asm -p internal/cpu -trimpath "$WORK/b011=>" -I $WORK/b011/ -I /usr/local/Cellar/go/1.20.4/libexec/pkg/include -D GOOS_darwin -D GOARCH_amd64 -D GOAMD64_v1 -o $WORK/b011/cpu.o ./cpu.s
Add object file to archive
You can use the pack r command to append object files to the archive.
/usr/local/Cellar/go/1.20.4/libexec/pkg/tool/darwin_amd64/pack r $WORK/b012/_pkg_.a $WORK/b012/cpu.o $WORK/b012/cpu_x86.o # internal
If you are curious about the contents of the archive file (_pkg_.a), you can list the object files in it with pack t.
$ go tool pack t _pkg_.a
__.PKGDEF
_go_.o
cpu.o
cpu_x86.o
Embedding files when embed tags are present
If a go:embed directive is present in the source code, the builder must explore the filesystem and write the resulting mapping information to a JSON file, which is passed to the compiler. go:embed actually has multiple modes of operation, including embedding a single file, embedding a directory, and globbing by file name pattern.
Covering all of them would take too long, so let me show how the single file mode works.
//go:embed p256_asm_table.bin
var p256PrecomputedEmbed string
The absolute path of the specified file is resolved and written in JSON.
{
"Patterns": {
"p256_asm_table.bin": [
"p256_asm_table.bin"
]
},
"Files": {
"p256_asm_table.bin": "/usr/local/Cellar/go/1.20.4/libexec/src/crypto/internal/nistec/p256_asm_table.bin"
}
}
If you are curious about other modes, please take a look at my bash implementation.
Save this JSON in a file and pass it to the compiler with the -embedcfg option; the compiler then uses it to embed the listed files into the object file.
compile -embedcfg $WORK/b050/embedcfg ...
This is how go:embed works at the builder’s layer. The actual work of embedding files is done by the compiler.
After applying all this logic to find the source directories, select files, determine compile options, sort packages and embed files, you can finally get a binary that works.
Conclusion
Now you can build large programs such as kubectl.
The details that were not mentioned in this article can be found in the build log and the go-build-bash code. You can also read the official go build source. (https://github.com/golang/go/blob/e827d41c0a2ea392c117a790cdfed0022e419424/src/cmd/go/internal/work/build.go#L447 )
You can build your program by yourself!
(This article is translated from my Japanese version: https://zenn.dev/dqneo/articles/ce9459676a3303 )