At Mercari, we do our best to keep our tools and libraries up to date. This leads to a need to also upgrade our hardware. This year, many employees got their hands on a new M1 computer from Apple.
This new computer is no longer Intel-based: it uses an arm64 CPU.
This brings changes at many levels, but the most important one for us was compilation.
This article provides a quick overview of the compilation process and the architectures available in Xcode with one goal in mind: getting a better understanding of what it means to compile for the M1.
We had one goal to reach but along the way, it was also a chance to learn more about compilation for iOS.
The goal: compiling and running an iOS app on M1 computers and iOS 15 simulators.
After clearing out the technical differences that we encounter on M1 Computers, we will dive a little deeper into the Swift compilation process.
Compiling a project that can target multiple architectures is tricky. This document should give you a quick snapshot of the latest mechanisms related to compilation of an iOS/Mac OS X project with a main focus on the Swift language.
- M1 compilation for recent iOS simulators
- Swift compiler overview
- The linker
- Debug Information
- Mach-O files and Fat Header
- The iOS Architectures
- Rosetta 2
- Project compilation on M1
- Tools used by Xcode
M1 Compilation for recent iOS simulators
Why is it different?
- The M1 is using an arm64 architecture.
- New simulators (>iOS 13.7) also have an arm64 executable
Historically, Macs used PowerPC processors, which are based on a RISC architecture.
Then, up until the release of M1 computers, Apple used the Intel x86_64 architecture.
The M1 CPU architecture is also called “Apple Silicon”.
Up until Xcode 12 and iOS 14.x, all simulators ran as x86_64 binaries.
From that point on, only the new simulators (>iOS 13.7) have both the so-called x86_64-simulator and arm64-simulator architectures.
The arm64-simulator architecture is only used on M1 and later CPUs.
That means that an Intel CPU computer will keep compiling and executing x86_64 simulator binaries only.
That also means that M1 CPU computers will compile and execute x86_64 or arm64 simulator binaries, depending on the simulator version you are targeting.
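The rules above can be sketched as a small decision function. This is only an illustration of the prose, not how Xcode is implemented; the `HostCPU` and `SimulatorArch` types and the `simulatorArch` function are hypothetical names introduced for this example.

```swift
/// CPU family of the Mac running Xcode (hypothetical type for this sketch).
enum HostCPU { case intel, appleSilicon }

/// Architecture of the simulator binary that gets built and executed.
enum SimulatorArch { case x86_64, arm64 }

/// Picks the simulator slice following the rules described in the article.
func simulatorArch(host: HostCPU, iOSMajor: Int, iOSMinor: Int) -> SimulatorArch {
    // Intel Macs only ever build and run x86_64 simulator binaries.
    guard host == .appleSilicon else { return .x86_64 }
    // Only simulators newer than iOS 13.7 ship an arm64 executable.
    let isNewerThan13_7 = iOSMajor > 13 || (iOSMajor == 13 && iOSMinor > 7)
    return isNewerThan13_7 ? .arm64 : .x86_64
}
```

For instance, `simulatorArch(host: .appleSilicon, iOSMajor: 13, iOSMinor: 7)` returns `.x86_64`, which is the case where the simulator has to go through Rosetta.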
A word about the M1 CPU
One of the biggest new features pointed out by Apple is the asymmetric cores used by the M1.
Apple puts a lot of emphasis on choosing the right QoS (quality of service) for async tasks to improve the user experience.
The Apple Silicon architecture has 2 types of cores:
- P cores are called Performance cores (foreground tasks)
- E cores are called Efficiency cores (background tasks and energy efficient processing to optimize battery usage)
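As a minimal sketch of what "choosing the right QoS" looks like in code, the example below creates two Dispatch queues with different QoS classes. The queue labels and the `sumOfSquares` helper are hypothetical; which core actually runs the work is decided by the system, the QoS is only a hint about how urgent the work is.

```swift
import Dispatch

// Work the user is actively waiting for: a high QoS class.
let userFacing = DispatchQueue(label: "com.example.user-facing", qos: .userInitiated)

// Deferrable maintenance work: a low QoS class, which lets the system
// favour energy efficiency over speed.
let maintenance = DispatchQueue(label: "com.example.maintenance", qos: .background)

/// Computes 1² + 2² + … + n² asynchronously on the given queue.
func sumOfSquares(upTo n: Int, on queue: DispatchQueue, completion: @escaping (Int) -> Void) {
    queue.async {
        let total = (1...n).reduce(0) { $0 + $1 * $1 }
        completion(total)
    }
}
```

The same function can be called with either queue; only the urgency communicated to the scheduler changes.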
Swift Compiler Overview
In Xcode 9, each Swift file was compiled individually for each build because the compiler had to repeatedly parse all files to find declarations.
In Xcode 10, it groups files into common compilation process per code
- Shares work of parsing within a process
- Only repeat parsing across processes
Clang as a library
The Swift compiler embeds part of Clang as a library, so Swift code can call C and Objective-C APIs directly. This is a huge time saver because in other languages it’s often necessary to create a stub for each external class we want to interact with (that is, we would have to write a Swift stub for each Objective-C class or method we want to use).
Each target in the project generates a swiftmodule file.
A module includes all classes, attributes and modules. It also includes the names and types of private declarations (for debugging purposes).
It also includes the bodies of @inlinable functions.
The swiftmodule is somewhat similar to a header file.
A Test target will use the swiftmodule to check different types.
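To make the `@inlinable` point concrete, here is a small sketch of a public type as a library might declare it. The `Celsius` type is a hypothetical example; the only thing it is meant to show is which declarations expose their body through the swiftmodule and which expose only their signature.

```swift
public struct Celsius {
    public var degrees: Double

    public init(degrees: Double) {
        self.degrees = degrees
    }

    /// @inlinable: the body is recorded in the swiftmodule, so client
    /// modules are allowed to inline it.
    @inlinable
    public var fahrenheit: Double {
        degrees * 9 / 5 + 32
    }

    /// Not @inlinable: clients only see the declaration, like a
    /// prototype in a header file.
    public func describe() -> String {
        "\(degrees) °C"
    }
}
```

An `@inlinable` body may only reference public (or `@usableFromInline`) declarations, which is why `degrees` is public here.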
Project compilation flow
We tend to think about the compilation process as a series of steps.
We should think about it as a “dependencies directed graph”. This enables parallelism and optimisations.
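A minimal sketch of why a dependency graph enables parallelism: tasks are grouped into "waves", where each wave only depends on earlier waves, so everything inside one wave can run at the same time. The `buildWaves` function and the target names are hypothetical and this is not the actual Xcode build system algorithm.

```swift
/// Groups build tasks into waves. Every task in a wave depends only on
/// tasks from earlier waves, so the tasks of one wave can run in parallel.
/// Returns nil if the graph has a cycle or references an unknown task.
func buildWaves(dependencies: [String: Set<String>]) -> [[String]]? {
    var remaining = dependencies
    var done = Set<String>()
    var waves: [[String]] = []

    while !remaining.isEmpty {
        // A task is ready once all of its dependencies have been built.
        let ready = remaining.filter { $0.value.isSubset(of: done) }.keys.sorted()
        if ready.isEmpty { return nil }
        waves.append(ready)
        for task in ready {
            remaining[task] = nil
            done.insert(task)
        }
    }
    return waves
}

// Hypothetical project: two frameworks that both depend on Core.
let graph: [String: Set<String>] = [
    "Core": [],
    "Networking": ["Core"],
    "UI": ["Core"],
    "App": ["Networking", "UI"],
]
let waves = buildWaves(dependencies: graph)
// waves == [["Core"], ["Networking", "UI"], ["App"]]
```

A strictly sequential "series of steps" would build four targets one after the other; the graph view shows that `Networking` and `UI` can be built concurrently.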
The linker
Linking is the final task in building an executable Mach-O.
The linker combines the output of all compiler invocations into a single file:
- It moves and patches code generated by the compilers
- It does not create code
It takes two kinds of input files:
- Object files (.o)
- Libraries (.dylib, .tbd, .a)
What is an object file?
- It’s non-executable code (usually machine code plus extras such as symbol and relocation information)
What are libraries?
- They define symbols that are not built as part of your target (e.g. class symbols found in the Google Maps SDK)
- Dylibs: dynamic libraries (Mach-O files that expose code and data fragments that executables can use)
- TBDs: text-based dylib stubs (they only contain symbols, very simple)
- ‘.a’ files: static libraries
The linker looks at all the symbols of all .o files and resolves each reference to the actual code that defines it. The result is a Mach-O binary, which can be an executable or a library.
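Symbol resolution can be illustrated with a very simplified model. The `ObjectFile` and `Library` types and the `resolveSymbols` function below are hypothetical; a real linker also handles relocations, duplicate symbols, load order and much more.

```swift
/// Simplified view of an object file: the symbols it defines and the
/// symbols it references without defining them.
struct ObjectFile {
    let name: String
    let defined: Set<String>
    let undefined: Set<String>
}

/// Simplified view of a library: only the symbols it exports.
struct Library {
    let name: String
    let exported: Set<String>
}

/// Maps every undefined symbol to the input that provides it and
/// collects the symbols that nobody provides.
func resolveSymbols(objects: [ObjectFile], libraries: [Library])
    -> (resolved: [String: String], missing: Set<String>) {
    // First pass: record who defines what (object files win over libraries).
    var providers: [String: String] = [:]
    for object in objects {
        for symbol in object.defined { providers[symbol] = object.name }
    }
    for library in libraries {
        for symbol in library.exported where providers[symbol] == nil {
            providers[symbol] = library.name
        }
    }
    // Second pass: resolve every reference.
    var resolved: [String: String] = [:]
    var missing = Set<String>()
    for object in objects {
        for symbol in object.undefined {
            if let provider = providers[symbol] {
                resolved[symbol] = provider
            } else {
                missing.insert(symbol)
            }
        }
    }
    return (resolved, missing)
}
```

A non-empty `missing` set corresponds to the familiar "undefined symbols" linker error, which is also what happens when a library lacks the slice for the architecture being linked.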
Debug Information
dSYM bundles contain debugging information.
Debugging info formats
There are 3 main types of format:
- Function starts
  - Mainly used by the debugger
  - Only contains the addresses of functions
- Symbol tables (nlist)
  - Structs instead of bare addresses
  - Contain names and addresses
  - Contain symbol types
    - Direct: fully defined in the project and its libraries
    - Indirect: provided by dependencies (such as print())
- DWARF
  - Highly detailed
  - debug_info (raw debug data)
  - debug_abbrev (structure of the raw data)
  - debug_line (file names, line numbers)
  - Adds relationship information
  - Primarily found in dSYM bundles
We will be mostly interested in the DWARF format as it’s what is included in the dSYM bundle that we need to use to debug a production crash or a library crash.
Details about DWARF
- Static libraries and object files can also contain DWARF
- DWARF in a dSYM bundle are in binary format, not text
- DWARF is limited to 4GB per binary
Mach-O files and Fat Header
Mach-O is a data format. It can represent a set of multiple architecture binaries.
It can be a library or an executable file.
A simple representation of the structure of a Mach-O file
The magic number, or “magic”, declares that the file starts with a fat header rather than being a single-architecture Mach-O file.
More detail about the header:
- 32 bits for the magic (0xcafebabe)
- 32 bits for the number of architectures contained in the file
More detail about Mach-O
- For Xcode-generated files, the fat header values use big-endian encoding
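The two header fields described above can be read with a few lines of code. This is a minimal sketch operating on an in-memory byte array; the `FatHeader` type and the function names are hypothetical, and the per-architecture entries that follow the header are not parsed.

```swift
/// The first two fields of a fat header.
struct FatHeader: Equatable {
    let magic: UInt32
    let architectureCount: UInt32
}

/// Magic number announcing a fat (multi-architecture) file.
let fatMagic: UInt32 = 0xcafebabe

/// Reads a 32-bit big-endian value, or returns nil if out of bounds.
func readBigEndianUInt32(_ bytes: [UInt8], at offset: Int) -> UInt32? {
    guard offset >= 0, offset + 4 <= bytes.count else { return nil }
    var value: UInt32 = 0
    for byte in bytes[offset..<offset + 4] {
        value = (value << 8) | UInt32(byte)
    }
    return value
}

/// Returns the fat header, or nil if the data does not start with one
/// (for example a single-architecture Mach-O file).
func parseFatHeader(_ bytes: [UInt8]) -> FatHeader? {
    guard let magic = readBigEndianUInt32(bytes, at: 0),
          let count = readBigEndianUInt32(bytes, at: 4),
          magic == fatMagic else { return nil }
    return FatHeader(magic: magic, architectureCount: count)
}
```

Feeding it the bytes `ca fe ba be 00 00 00 04` yields a header with four architectures, while the first bytes of a single-architecture 64-bit Mach-O file (`cf fa ed fe ...`) yield nil.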
Problems related to Mach-O
If we have, for example, a Mach-O file containing:
- An arm64 slice for iOS
- An arm64 slice for the iOS simulator (Mac OS X arm64 on M1)
we end up with an error during the linker phase:
“Building for iOS Simulator, but the linked and embedded framework ‘MyFramework.framework’ was built for iOS + iOS Simulator.”
To solve this, we need to use XCFrameworks (this is necessary from Xcode 12.2 onwards).
XCFrameworks were introduced with Xcode 11.
Apple proposed them as a way to distribute pre-compiled libraries, as an alternative to Swift packages, which force developers to provide their source code.
They also fix the link-time issue encountered with fat Mach-O files.
An XCFramework is just a folder structure.
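The difference between a fat file and an XCFramework can be modelled in a few lines. This is a conceptual sketch with hypothetical types (`Slice`, `Platform`, `FatFileError`), assuming a fat file tells its slices apart by architecture only, while an XCFramework keeps one binary per platform.

```swift
enum Platform: String { case iOS, iOSSimulator }

/// One compiled binary for a given architecture and platform.
struct Slice: Hashable {
    let architecture: String
    let platform: Platform
}

enum FatFileError: Error, Equatable {
    case duplicateArchitecture(String)
}

/// Models a fat Mach-O file: at most one slice per architecture.
func makeFatFile(from slices: [Slice]) throws -> [String: Slice] {
    var file: [String: Slice] = [:]
    for slice in slices {
        if file[slice.architecture] != nil {
            throw FatFileError.duplicateArchitecture(slice.architecture)
        }
        file[slice.architecture] = slice
    }
    return file
}

/// Models an XCFramework: one fat file per platform, side by side.
func makeXCFramework(from slices: [Slice]) throws -> [Platform: [String: Slice]] {
    let byPlatform = Dictionary(grouping: slices, by: { $0.platform })
    return try byPlatform.mapValues { try makeFatFile(from: $0) }
}
```

With an arm64 device slice and an arm64 simulator slice, `makeFatFile` throws because both slices claim the same architecture, whereas `makeXCFramework` succeeds and produces one entry per platform.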
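How to create an XCFramework?
Use xcodebuild -create-xcframework, passing one -framework (or -library) argument per platform variant and an -output path for the resulting .xcframework.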
The iOS Architectures
For iOS devices we have:
- armv7
  - “An application build with armv7 will run on all current iOS devices”
  - This uses 32-bit instructions
- arm64
  - Available in iOS 7 or later
  - The iPhone 5S was the first iPhone to offer an arm64 architecture
  - This uses 64-bit instructions
For iOS simulation on Mac OS X we have:
- x86_64
  - Simulators targeting iOS 13.7 or earlier (on M1, Rosetta is used to run the x86_64 binaries on the arm64 architecture)
  - All more recent simulators running on Intel-based CPUs
- arm64
  - Simulators targeting >iOS 13.7 on M1
- i386
  - Old 32-bit Mac OS X machines
Mac Catalyst also needs x86_64 and arm64.
This can also be summarized like this:
    switch arch
      case armv7
      case arm64
        switch target
          case iOS
          case iOS Simulator
          case Catalyst
      case x86_64
        switch target
          case iOS Simulator
          case Catalyst
      case i386
Architecture and devices compatibility
One thing to notice: if you can’t build the arm64 version of your binary for some reason but can build the armv7 version, it will be able to run on more recent devices as well (as long as the OS still supports 32-bit apps, which iOS 11 and later do not).
Apple provides documentation about this topic and you can find compatibility tables:
Rosetta 2
Rosetta 2 enables a Mac with Apple silicon to use apps built for a Mac with an Intel processor.
- You can run software with Rosetta by opening its “Get Info” window and checking “Open using Rosetta”
- Using Rosetta runs your software as if you were using it on an Intel-based Mac
- Some software doesn’t run with Rosetta (e.g. virtualization software)
- More details: https://support.apple.com/en-us/HT211861
Project compilation on M1
Our goal being to compile the project on an M1 computer, we need to understand the requirements for this.
It can be summarized as follows:
- The project build architectures should be set to default (armv7, arm64)
- Xcode will automatically build an x86_64-sim or arm64-sim binary depending on the target simulator architecture
- All libraries should either be able to compile for ALL target architectures (including arm64 and x86_64 simulator binaries in our case)
- OR at least provide binaries for ALL target architectures
- Running Xcode under Rosetta on M1 is equivalent to compiling on an Intel-based computer and is slower than running natively.
Important build settings
It seems important to point out 2 build settings to really understand what they do and when to use them.
- Build Active Architecture Only
- Excluded Architectures
Note: Valid Architectures has been removed in Xcode 12 and should not be used anymore.
Build Active Architecture Only
This will sound obvious, but when this setting is set to Yes, only the architecture of the current run destination is built. For example, if we build the app for an iOS 14.5 simulator on the M1, the libraries are built for arm64-sim only.
What are the consequences?
- The build is faster
- You will have to clean your dependencies cache if you switch to an x86_64 simulator (e.g. an iOS 13.7 simulator); otherwise, you will get a compilation error saying that the architecture is missing.
On an M1 computer, we can safely say that it’s better to keep this setting on for debugging. However, if for some reason we need to switch between both architectures in our tests, or if the dependencies cache is not rebuilt between two UI test runs (one on 13.7 and one on 14.5, for example), it will trigger a compilation (linker) error.
For release, Apple ideally needs all binaries for all architectures. Then they can recompile, link again for any device target they want.
Excluded Architectures
This setting can be used to work around some compilation issues, as shown below, or to exclude a specific architecture when distributing a library, for example (it can be useful if your code is not ready for a specific architecture).
A use case, though not recommended, is to exclude the arm64 architecture when trying to run your application on an iOS 14.5 simulator on M1.
- This forces Xcode to generate an x86_64-sim executable
- The simulator will run under Rosetta
- The libraries will be compiled for the x86_64 architecture and not arm64, which can make the compilation errors go away
- This can be used as a temporary “fix” to keep working on a project, but the ultimate goal is to run all your code on arm64 only.
When code depends on the Architecture
Even though it’s not recommended for obvious reasons (this could be a source of unexpected behavior in the future), we can have specific code depending on the architectures and targets.
Some examples (Swift 4.1 and later):
    #if targetEnvironment(simulator)
    // your simulator code
    #else
    // your real device code
    #endif
More examples of what could be found before Swift 4.1:
Detect the watchOS simulator
    #if (arch(i386) || arch(x86_64)) && os(watchOS)
    ...
    #endif
Detect the tvOS simulator
    #if (arch(i386) || arch(x86_64)) && os(tvOS)
    ...
    #endif
Or, even, detect any simulator
    #if (arch(i386) || arch(x86_64)) && (os(iOS) || os(watchOS) || os(tvOS))
    ...
    #endif
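On an M1, the older `arch(...)` based checks no longer identify a simulator, because `arch(arm64)` is true both on a real device and in an arm64 simulator. The sketch below (with a hypothetical `buildDescription` function) keeps the two questions separate: the architecture comes from `arch(...)`, the environment from `targetEnvironment(...)`.

```swift
/// Describes the architecture and environment this code was compiled for.
func buildDescription() -> String {
    var parts: [String] = []

    // Which CPU architecture is this slice compiled for?
    #if arch(arm64)
    parts.append("arm64")
    #elseif arch(x86_64)
    parts.append("x86_64")
    #elseif arch(i386)
    parts.append("i386")
    #else
    parts.append("other architecture")
    #endif

    // Which environment is it compiled for? This is independent of the
    // architecture: arm64 can be a device or an M1 simulator.
    #if targetEnvironment(simulator)
    parts.append("simulator")
    #elseif targetEnvironment(macCatalyst)
    parts.append("Mac Catalyst")
    #else
    parts.append("device or desktop")
    #endif

    return parts.joined(separator: " / ")
}
```

Because these are compile-time conditions, each slice of a universal binary contains only the branches that apply to it.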
Check a library package
A quick example of a binary library that is missing the arm64 simulator architecture.
First of all, please note that the library doesn’t provide source code, only pre-compiled Mach-O files.
The example is the Google Maps SDK.
Simply use the lipo command to display the architectures embedded in the fat Mach-O library.
    lipo -detailed_info GoogleMapsBase
    Fat header in: GoogleMapsBase
    fat_magic 0xcafebabe
    nfat_arch 4
    architecture i386
        cputype CPU_TYPE_I386
        cpusubtype CPU_SUBTYPE_I386_ALL
        capabilities 0x0
        offset 4096
        size 4036788
        align 2^12 (4096)
    architecture x86_64
        cputype CPU_TYPE_X86_64
        cpusubtype CPU_SUBTYPE_X86_64_ALL
        capabilities 0x0
        offset 4042752
        size 4408596
        align 2^12 (4096)
    architecture armv7
        cputype CPU_TYPE_ARM
        cpusubtype CPU_SUBTYPE_ARM_V7
        capabilities 0x0
        offset 8454144
        size 15918048
        align 2^14 (16384)
    architecture arm64
        cputype CPU_TYPE_ARM64
        cpusubtype CPU_SUBTYPE_ARM64_ALL
        capabilities 0x0
        offset 24379392
        size 16693260
        align 2^14 (16384)
You can see that there are 4 architecture slices here: i386, x86_64, armv7, arm64
For some reason, they provide the 32-bit i386 binary, which can be used by old Intel processors.
Then they provide the x86_64 binary, which is used for simulators on Intel-based architectures.
Finally, to ensure that their framework works properly on the iPhone 5S and later devices, they provide binaries for armv7 and arm64.
What is missing?
- A second arm64 build for the M1 arm64 simulators.
Tools used by Xcode
Tools related to compilation and debugging:
- atos: takes an address as input and uses the information in the dSYM file to find the exact file and line number that caused the problem in our app
- nm: returns all the symbols from a library
- otool: returns different details from a library
- symbols: returns all kinds of info related to symbols
- dwarfdump: prints human-readable debug information
- dsymutil: manipulates archived DWARF debug symbol files
- vmmap: gives you an indication of the VM used by a process
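To give an intuition of what address-to-symbol tools do, here is a conceptual sketch of a lookup in a table of function start offsets. It is not how atos is implemented: the `Symbol` type, the `symbolicate` function and the addresses are hypothetical, and real tools use the much richer DWARF data to also recover file names and line numbers.

```swift
/// A function name and its offset from the start of the binary image.
struct Symbol {
    let offset: UInt64
    let name: String
}

/// Finds the function containing `address`.
/// `symbols` must be sorted by ascending offset; `loadAddress` is where
/// the binary image was loaded in memory.
func symbolicate(address: UInt64, loadAddress: UInt64, symbols: [Symbol]) -> String? {
    guard address >= loadAddress else { return nil }
    let offset = address - loadAddress

    // Binary search for the last symbol starting at or before the offset.
    var low = 0
    var high = symbols.count
    while low < high {
        let mid = (low + high) / 2
        if symbols[mid].offset <= offset {
            low = mid + 1
        } else {
            high = mid
        }
    }
    guard low > 0 else { return nil }
    let symbol = symbols[low - 1]
    return "\(symbol.name) + \(offset - symbol.offset)"
}
```

With functions starting at offsets 0x1000 and 0x1040 and an image loaded at 0x100000000, the address 0x100001048 falls 8 bytes into the second function.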
Behind the Scenes of the Xcode Build Process – WWDC18 – Videos – Apple Developer
Symbolication: Beyond the basics – WWDC21 – Videos – Apple Developer
App Startup Time: Past, Present, and Future – WWDC17 – Videos – Apple Developer
Universal Binaries: inside Fat Headers
Optimize for Apple Silicon with performance and efficiency cores
Xcode and XCFrameworks