This article is a translation of the Japanese article published on December 19th, 2022.
My name is Ryo Aoyama. I have been working as the iOS lead architect for Mercari.
In this article, Thi Doãn, who works on our build systems, and I describe how we introduced Bazel during the rewrite of our iOS app to improve our builds.
Over many years, we have developed countless features for the Mercari app to provide a better listing and buying experience for our customers. This has continued even after the complete rewrite of our app, which was released in September 2022: more than 100 pull requests are merged every week, and the source code keeps growing.
As of this writing, the size of our source code looks like the following table. Although we were able to reduce the code size significantly, you can see that we still have a relatively large code base.
External source code size within Mercari app (measured excerpt via cloc)
Build time has naturally been increasing along with the source code size. The growing code base was a big factor in slowing down the entire development process, because each successive build took longer.
There was also the problem of build reliability. I’m sure you have also faced situations where certain problems were only seen on CI but not on local builds, or perhaps the build results did not quite match with other developers’ results.
Building software is one of the most important aspects in engineering. A typical developer builds software anywhere from tens to hundreds of times a day. As the number of developers increases in a team, so does the cost (and importance) of the building process. It was very important for our growing team of developers to adopt an advanced build system.
While Xcode would have sufficed in most cases, it was presenting us with the following problems:
- There was not much we could do to speed up the build process
- There was not much that we could do to force idempotent builds
- The build time was governed directly by the performance of the machine used
- There was not much support for reusing modular components, which in our case were divided into 500 to 1,000 separate parts
- Modular components introduced much overhead when dealing with dynamic linking, and it was also hard to use static linking
To tackle these problems and create a fast and reliable build system, we decided to adopt Bazel.
Bazel is an open source build tool developed by Google. It is most commonly used for applications based on Go or Java (among others), but it can also be used to build mobile applications for iOS and Android. The following are some of the merits of using Bazel over other tools such as Xcode:
- Multi-language support: Bazel is built on the assumption that it will be used in a monorepo implemented in multiple programming languages, and thus does not limit itself to a particular language’s features.
- Extensibility: Bazel can be configured and extended to work with languages that are not officially supported by the tool itself.
- Reproducibility: Artifacts do not contain information about the build environment, which avoids a build on CI differing from a build on a local machine.
- Configuration language: Bazel uses Starlark, a scripting language similar to Python. It is more than a DSL and can use variables and macros to implement advanced techniques for better reusability.
- Advanced caching: Redundant builds can be skipped entirely by creating a dependency graph from the build configurations. Remote caching and distributed building can be used to share artifacts between multiple machines.
- Task automation: Bazel can include more than just building in its process; for example, dynamic code generation.
The Build Cache
Caching build artifacts and test results is one of the most effective ways to increase productivity. It is common to use tools like Carthage to build and cache external dependencies, which are not updated as frequently, and improve subsequent build times.
It is also possible to share these pre-built dependencies among developers by storing them in a shared cache storage.
Most of the build time, however, is taken up by the compilation of our own source code. It is hard to take advantage of these tools to pre-build and cache our own code, as it is constantly being modified.
To achieve significant performance improvements, we would need advanced features such as analyzing the dependency graph between build targets, deciding which targets need to be rebuilt, checking whether they can be built in parallel, and computing the critical path for a given build. Bazel implements these features, which are required for incremental builds.
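To make these ideas concrete, here is a minimal Python sketch of two of them: finding the targets affected by a change, and computing the critical path of a build. The target names and durations are hypothetical, and this is not how Bazel is implemented internally; it only illustrates the kind of analysis a dependency graph makes possible.

```python
from collections import defaultdict

# Hypothetical target graph: each target maps to the targets it depends on.
DEPS = {
    "//App": ["//Features/Listing", "//Features/Search"],
    "//Features/Listing": ["//Libraries/Logger", "//Libraries/Networking"],
    "//Features/Search": ["//Libraries/Networking"],
    "//Libraries/Networking": ["//Libraries/Logger"],
    "//Libraries/Logger": [],
}

# Hypothetical per-target build durations in seconds.
DURATION = {
    "//App": 3,
    "//Features/Listing": 8,
    "//Features/Search": 4,
    "//Libraries/Networking": 5,
    "//Libraries/Logger": 2,
}


def affected_targets(deps, changed):
    """Return the changed targets plus everything that transitively depends on them."""
    reverse_deps = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            reverse_deps[dep].add(target)

    affected = set(changed)
    stack = list(changed)
    while stack:
        current = stack.pop()
        for dependent in reverse_deps[current]:
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


def critical_path_seconds(deps, duration, target, memo=None):
    """Return the duration of the longest dependency chain ending at `target`."""
    if memo is None:
        memo = {}
    if target not in memo:
        longest_dep = max(
            (critical_path_seconds(deps, duration, dep, memo) for dep in deps[target]),
            default=0,
        )
        memo[target] = duration[target] + longest_dep
    return memo[target]


# A change in Search only requires rebuilding Search and the app itself.
print(sorted(affected_targets(DEPS, {"//Features/Search"})))

# The critical path bounds the build time even with unlimited parallelism.
print(critical_path_seconds(DEPS, DURATION, "//App"), "of", sum(DURATION.values()), "seconds")
```

In this toy graph, a change to a leaf library such as the logger invalidates every target, while a change to a feature module invalidates only that module and the app. This is one reason fine-grained modularization pays off.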
Bazel is often described as a fast build tool, but at least for the specific case of iOS application development, this is not necessarily true. Bazel’s rules_apple and rules_swift actually work almost identically to Xcode in terms of compilation.
In fact, Xcode may be as fast as Bazel, or even faster in some cases, for local builds on your machine, as Xcode is a very well written tool for incremental builds. Bazel’s use of static linking also adds some overhead.
The differentiating factor is in Bazel’s flexibility and abundance of choices to improve performance, as compared to the relatively limited choices available to Xcode. For example, because Bazel is created on the assumption that components will be modularized into small pieces, the merits of improvements and optimizations provided by caching are greater – as long as the modularization is done correctly.
One extra advantage of Bazel is the fact that it can utilize remote caches. A remote cache provides the ability to host build artifacts and test results on a shared storage server and share them among developers and CI environments. For example, if the CI performed the build beforehand and this result is shared through a remote cache, all developers sharing the cache will be able to avoid rebuilding the same components. If a developer makes changes in their local environment, Bazel will analyze the targets that are affected and only build the minimum number of targets that are required. This in turn means that most of the time you will never have to perform a clean build from scratch on your local environment.
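The following Python sketch illustrates the idea behind sharing results through a cache keyed by content. It is a simplified model under stated assumptions: the module names and file contents are hypothetical, a plain dictionary stands in for the remote cache server, and Bazel’s real cache keys cover more than source contents (for example, the command line and toolchain).

```python
import copy
import hashlib


def action_key(target, sources, deps, dep_keys):
    """Hash a target's own sources together with the keys of its dependencies."""
    digest = hashlib.sha256()
    digest.update(target.encode())
    for path in sorted(sources[target]):
        digest.update(path.encode())
        digest.update(sources[target][path].encode())
    for dep in sorted(deps[target]):
        digest.update(dep_keys[dep].encode())
    return digest.hexdigest()


def build(order, sources, deps, cache):
    """Build targets in dependency order and return the ones that had to be rebuilt."""
    keys = {}
    rebuilt = []
    for target in order:
        keys[target] = action_key(target, sources, deps, keys)
        if keys[target] not in cache:
            # Cache miss: "compile" the target and store a stand-in artifact.
            cache[keys[target]] = f"artifact for {target}"
            rebuilt.append(target)
    return rebuilt


# Hypothetical modules, listed in dependency order.
DEPS = {"Logger": [], "Search": ["Logger"], "Listing": ["Logger"], "App": ["Search", "Listing"]}
ORDER = ["Logger", "Search", "Listing", "App"]
SOURCES = {
    "Logger": {"Logger.swift": "v1"},
    "Search": {"Search.swift": "v1"},
    "Listing": {"Listing.swift": "v1"},
    "App": {"App.swift": "v1"},
}

remote_cache = {}  # stand-in for a shared cache server

ci_rebuilt = build(ORDER, SOURCES, DEPS, remote_cache)       # CI populates the cache
clean_rebuilt = build(ORDER, SOURCES, DEPS, remote_cache)    # a fresh checkout hits the cache

dev_sources = copy.deepcopy(SOURCES)
dev_sources["Search"]["Search.swift"] = "v2"                 # a local edit
dev_rebuilt = build(ORDER, dev_sources, DEPS, remote_cache)

print(ci_rebuilt, clean_rebuilt, dev_rebuilt)
```

Because each key includes the keys of its dependencies, editing one module changes the keys of everything downstream of it and nothing else, so only those targets miss the cache.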
This works similarly for tests: Bazel caches test results for each module and shows the cached results, re-executing tests only when needed. Xcode would sometimes take over 1 hour to execute tests when there were many unit tests, so this brought a significant improvement in lead time.
The image below shows a sample build log from our CI environment. You can see that only 2 out of 217 tests total were actually executed for this run. All other test results were not affected by the pull request that triggered this run, and thus they were fetched from the remote cache. Tests marked as (cached) were not executed. As can be seen in this sample, test execution time was drastically reduced because we did not have to wait for tests whose results we already knew.
The remote cache unfortunately does not come for free: we need to manage our own remote cache backend. You can set up and maintain a server of your choice, or you can use Google Cloud Storage (GCS), which is already compatible with Bazel’s remote cache. Since Mercari uses Google Cloud Platform extensively, it was natural for us to select GCS as our initial remote cache backend. However, although GCS performed well, it did not support the garbage collection features required by Bazel’s Remote Builds without the Bytes feature.
Later on, we migrated the remote cache backend to BuildBuddy, which supports the Remote Builds without the Bytes feature.
Either way, the maintenance cost, in terms of the number of people required, was close to zero.
Distributed Builds and Tests
Remote Build Execution (RBE) is a Bazel feature closely related to the remote cache. RBE provides the ability to execute builds and tests on separate machines in a distributed manner. As with the remote cache, RBE’s outputs are idempotent, so we can share and reuse the results of executions on different machines.
We can significantly reduce the time it takes to build and test our software using distributed builds. For example, it may be possible to get better performance by executing CI on a separate machine when the main CI host does not provide enough horsepower.
We use RBE via BuildBuddy to build and test on a few hundred machines in Apple silicon M1 build farms. In our build configuration, all builds and tests can be executed on these farms by simply using the following command:
bazel test --config=RBE //...
Because of the aggressive modularization that we performed when rewriting our application, the execution speed of our builds is limited only by the number of available CPU cores.
One problem that we faced when starting to use RBE was that while our CI servers were provisioned using Intel-based Macs, the build farm consisted solely of Apple silicon Macs. Further complicating the matter, the developers themselves were working on a mix of Intel and M1 Macs, because this was around the time when Mercari started providing M1 Macs to its employees. This required us to monitor for a possible decrease in cache hit rates while supporting development. For details, please refer to the following blog article.
BuildBuddy makes it easy to analyze build execution with its visualization dashboard for build events. We also stream local build events to BuildBuddy’s dashboard, allowing everybody to share these logs. Since Mercari’s development team works in a hybrid remote/onsite environment, this made it possible to properly share the build status with all developers.
We only upload the module artifacts to BuildBuddy when changes are merged to the main branch of our repository, in order to achieve a balance between stability of the cache, costs incurred for the CI, and the cache hit rate. Each developer only needs to build artifacts that are affected by the local changes.
We have yet to determine whether we should allow using RBE for local builds as well; for the time being we have only enabled RBE for builds running on our CI servers, and developers can only view the build events from local development.
The following diagram shows the workflow when building iOS applications.
As a reference, I have included a sample benchmark executing builds on my machine, along with my execution environment.
- MacBook Pro, Apple M1 Pro, 32GB RAM
- Debug Build
- 3 runs
- Local caches have been invalidated prior to each run
- “Full Build”: Executed builds with no cache support.
- “Remote Cache”: Builds with remote caching enabled.
- “RBE”: Builds with remote cache and RBE enabled.
(ref.) Debug Build speed for the Mercari app
| Build method | 1st run | 2nd run | 3rd run |
|---|---|---|---|
| Full build | 256.092 s | 241.716 s | 247.491 s |
| Remote cache | 75.130 s | 74.969 s | 76.271 s |
| RBE | 36.814 s | 36.955 s | 46.060 s |
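As a quick way to read these numbers, the short Python snippet below averages the three runs for each method and compares them with the full build. It uses only the measurements from the table above; with just three runs on one machine, the ratios should be taken as a rough indication rather than a precise result.

```python
from statistics import mean

# Measurements in seconds, copied from the benchmark table.
RUNS = {
    "Full build": [256.092, 241.716, 247.491],
    "Remote cache": [75.130, 74.969, 76.271],
    "RBE": [36.814, 36.955, 46.060],
}

averages = {method: mean(times) for method, times in RUNS.items()}
speedups = {method: averages["Full build"] / avg for method, avg in averages.items()}

for method in RUNS:
    print(f"{method}: mean {averages[method]:.1f} s, {speedups[method]:.1f}x vs. full build")
```

On these three runs, the remote cache is roughly 3.3 times faster than a full build on average, and RBE is roughly 6.2 times faster.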
So far we have discussed the merits of using Bazel for our builds, but we also faced some issues. In the following sections we will discuss what these were and show how we overcame them.
Integration with Xcode
Integrating Bazel with Xcode was the biggest concern that we had prior to introducing Bazel. Xcode is notoriously hard to integrate with an external build system, as it is a peculiar IDE that is tightly coupled to its own build system. It was particularly difficult to get search indexing and LLDB debugging to work properly with it.
When we say “integration”, we mean reproducing Xcode’s features using Bazel’s outputs. For this, the following must be satisfied:
- Create Xcode projects using the Bazel build configuration files
- Execute Bazel instead of Xcode when building
- Place build artifacts into specific locations such that Xcode can use them for indexing
Only when the above are satisfied can a developer use Bazel instead of Xcode. Some of the other tools that integrate with Bazel are listed below.
One major difference between Xcode and Bazel is that while Xcode always uses absolute paths, Bazel tries to use relative paths as much as possible. The main reason for this is that Bazel needs to be able to reproduce the same results when it has been run on a different machine or directory.
However, LLDB can only attach to an application when the debug information holds absolute paths. To make this work, we would need to use a custom lldbinit file to map relative paths to absolute paths when attaching. The same can be said for indexing.
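As an illustration of the kind of remapping involved, the Python sketch below generates an lldbinit line using LLDB’s target.source-map setting. The workspace path and the "./" prefix are assumptions for this example; the actual prefix depends on how the debug paths were rewritten at compile time, and this is not the lldbinit file we used.

```python
from pathlib import PurePosixPath


def lldbinit_source_map(workspace_root, relative_prefix="./"):
    """Return an lldbinit line mapping a relative debug-info prefix to an absolute path."""
    root = PurePosixPath(workspace_root)
    if not root.is_absolute():
        raise ValueError("workspace_root must be an absolute path")
    return f"settings set target.source-map {relative_prefix} {root}/"


def to_absolute(relative_path, workspace_root):
    """Resolve a workspace-relative source path the way the mapping above would."""
    return str(PurePosixPath(workspace_root) / relative_path)


# Hypothetical checkout location on a developer's machine.
ROOT = "/Users/dev/mercari-ios"

print(lldbinit_source_map(ROOT))
print(to_absolute("./Projects/Libraries/Logger/Sources/Logger.swift", ROOT))
```

Because the absolute root differs per machine and per checkout, a line like this has to be generated locally rather than committed, which is part of why this integration needed custom scripting. Paths containing spaces would need quoting, which this sketch does not handle.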
In the early stages of the project we tried a few combinations of the previously mentioned tools and custom scripts, and decided to use a hybrid solution using Tulsi and index-import. A post-build step using index-import to remap the index was necessary because, while Tulsi already came with absolute path remapping, it did not handle indexing. We have observed that Tulsi seems to work well with Google’s internal projects, but it seems to falter when used with external projects, especially those that use multiple programming languages. Luckily, we were rewriting our application from scratch and our first-party source code was all written in Swift, which meant that we did not have to do much to make Tulsi work for us. We did, however, fork Tulsi and make some changes to make it easier for us to use.
Xcode has a setting called IDEIndexShowLog that enables indexing logs for analysis. It can be enabled by executing the following command:
defaults write com.apple.dt.Xcode IDEIndexShowLog -bool YES
This allowed us to check for errors when Tulsi, rules_apple, and rules_swift were updated. If there were no errors it meant that our indexing was working correctly.
This proved effective, but it was time consuming and prone to errors, so we kept working on improving this integration. Since the beginning of 2022, developers from BuildBuddy have been actively developing a new Xcode integration tool called rules_xcodeproj, with contributions from other companies including Mercari. It has everything you need to integrate Bazel with Xcode without any custom solutions or hacks, and any iOS project using Bazel can use it.
After rules_xcodeproj stabilized, we migrated to it from Tulsi, and we can say that our development environment is now very comfortable.
A common problem that other projects seem to face when migrating from Xcode to Bazel is that they need to support both tools. Some projects need to support Xcode even after the “migration”.
We were lucky in this regard as we were able to incorporate Bazel from the beginning because we were rewriting our application from scratch. Supporting Xcode while using Bazel as the main build system can hinder development for those developers who chose to work with Xcode. It’s also a relatively heavy burden for us to support both tools. For these reasons, we decided to support Bazel alone.
One of the downsides of increasing the level of modularization is that it slows the application startup.
This is not necessarily a problem of modularization itself, as it is often triggered by the increase in the number of Dynamic frameworks (.dylib + .bundle files). When an application contains Dynamic frameworks, iOS links them dynamically at launch, slowing down the startup time.
On the other hand, if you use static libraries (.a files), modules are linked at build time. This adds to the build time, but minimizes the effect on application startup time.
Recent improvements in dyld performance and iOS caching have made this startup time problem negligible in many cases. However, the same cannot be said for an application like ours that uses close to 1,000 modules. In such cases, using static libraries is almost a necessity to improve the user experience.
The standard way to build modules in Bazel is to build them as static libraries. Even when Dynamic frameworks are used, Bazel properly links them to avoid duplicate symbols and generally avoids bloating the application size. There is a chance for duplicate symbols to exist between an application and its App Extension, as they are bundled as independent binaries. In such cases, it is possible to use a Dynamic framework.
We have monitored the startup times of the rewritten application through Firebase Performance Monitoring, and we were able to keep the startup speed comparable to the previous version, even after the aggressive modularization that we performed.
We are satisfied with the results, as startup speed was something that we had been working on even before the rewrite. Considering the product size, we believe what we achieved is acceptable.
- Before the rewrite: 4.106.0 (136002)
- After the rewrite: 5.19.0 (207952)
Bazel does not come with a version-aware dependency manager. To be fair, Bazel is a build system, and not a dependency manager. Bazel has its own philosophy on dependency management, and it seems like it is an intentional choice.
CocoaPods, compared to Carthage and Swift Package Manager, has been one of the most dependable dependency managers within the iOS ecosystem. Initially we used CocoaPods as our dependency manager. For ease of use, we forked CocoaPods and made it generate Bazel’s BUILD file for fetching third party dependencies when the pod install command was invoked. This worked for a long time, but as we used fewer third party dependencies, the cost of maintaining it came to outweigh the benefits. As we were also already using Renovate to automatically update dependencies, we decided that we no longer needed a dependency manager and deleted CocoaPods from our project.
Bazel can easily use and cache external libraries and tools. It fetches these dependencies only when they are actually needed.
When an external library supports Bazel builds, you only need to declare the dependency in the WORKSPACE file at the project root. For example, SwiftLint supports Bazel.
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "SwiftLint",
    sha256 = "7c454ff4abeeecdd9513f6293238a6d9f803b587eb93de147f9aa1be0d8337c4",
    url = "https://github.com/realm/SwiftLint/releases/download/0.49.1/bazel.tar.gz",
)

load("@SwiftLint//bazel:repos.bzl", "swiftlint_repos")

swiftlint_repos()

load("@SwiftLint//bazel:deps.bzl", "swiftlint_deps")

swiftlint_deps()
Even when a dependency does not support Bazel, you can write local BUILD files to build them. We do not need to worry about anything when a particular dependency does not natively support Bazel.
# WORKSPACE
http_archive(
    name = "lottie-ios",
    build_file = "//Externals/ThirdParty:lottie-ios.BUILD",
    sha256 = "e168b05792d8af1830a73daee2f3b4f3a24b1ec512a949adf60fac6f0b6c99f5",
    strip_prefix = "lottie-ios-3.3.0",
    url = "https://github.com/airbnb/lottie-ios/archive/3.3.0.zip",
)

# Externals/ThirdParty/lottie-ios.BUILD
load("@build_bazel_rules_swift//swift:swift.bzl", "swift_library")

swift_library(
    name = "Lottie",
    srcs = glob(
        ["Sources/**/*.swift"],
        exclude = ["Sources/Public/MacOS/**"],
    ),
    visibility = ["//visibility:public"],
)
You would need to write these BUILD files for each package for most of the Swift / Objective-C packages as they do not support Bazel out of the box.
You may be able to leverage a tool to automatically convert podspecs from CocoaPods to Bazel build files, such as PodToBUILD, but in our case we manually define them as needed.
This is because we do not add external packages often, and because we judged that explicitly writing out how each dependency is built is a better tradeoff than having to maintain and learn another tool.
Bazel’s learning curve is one thing that we need to think about.
The Architect team can decide on the file structure within a project. For example, below is a sample of how files within a module should be placed.
Projects/Libraries/Logger/
├── BUILD
├── Sources
│   ├── Logger.swift
└── Tests
    └── LoggerTests.swift
As long as developers follow the above rule, they can simply reuse common Bazel macros, such as the library macro, which specifies the library target, and the unit_test macro, which specifies the test targets. This is very similar to how Package.swift works in Swift Package Manager, which eliminates the need for developers to learn anything new. In most cases, a developer only needs to think about the target name and its dependencies to create a new build target.
load(
    "//BazelExtensions:rules.bzl",
    "library",
    "unit_test",
)

library(
    name = "Logger",
    deps = [
        "//Projects/Libraries/FoundationPlus",
    ],
)

unit_test(target = ":Logger")
Giving back to the community
Bazel is not a first-party build system for iOS. As such, it takes time for new changes to trickle down to Bazel when Xcode adds new features. It’s easy to fork Bazel, or apply patches to build rules as necessary, but we have tried hard to avoid forking or applying hacks to Bazel. We have been working with the community to generalize solutions and give them back to the community. For example, here are some of the contributions that we have made in the past:
- The --@build_bazel_rules_swift//swift:universal_tools flag, to provide compatibility between Intel and Apple silicon Macs. Pull-Request
- apple_static_xcframework_import, to officially support XCFrameworks. Pull-Request
- Made it possible to pass arguments to test_arg to allow setting the language used during visual regression testing. Pull-Request
We were able to greatly improve the performance and stability of our builds by using Bazel. We intend to keep optimizing our productivity by eliminating bottlenecks and taking modularization even further.
We are looking for developers who want to help improve our build environment, and developers who want to help us grow Mercari!