This post is for Day 18 of Mercari Advent Calendar 2023, brought to you by Jon from the Mercari Web Platform team.
Tomorrow’s article will be by Andre, on how Mercari uses Large Language Models to improve Mercari products!
This article is about why the Mercari Web Platform team decided to invest in a frontend monorepo, what we achieved in a year of development, and some of the exciting challenges we’re tackling next.
Repositories in Mercari
Repository management in Mercari varies by team and project. In accordance with our core value “Be a Pro”, developers have autonomy when deciding how and where to write code. Some repositories are segmented by application, and some by team. There are large, single-app repositories that have contributions from dozens of developers spanning multiple teams. Other repositories are small, single packages that have one main contributor. Most repository variations have compelling arguments, some of which include non-code-related factors. Organizational structure, team size, preferred programming languages, developer bias, project scope, and time constraints can all influence a project’s repository design philosophy.
Open source frontend repositories on GitHub are usually scoped to a single package or application. This makes sense when building small to medium-sized, isolated components. However, when codebases grow and start to involve multiple teams with dozens of contributors, dependencies, and cross-cutting concerns, they become more and more difficult to manage.
One inevitable side effect of Mercari developers moving fast and shipping lots of code is that code ends up being duplicated and fragmented. It’s awesome to have developers from different teams contribute solutions, and we’ve found that having fewer repositories tends to nurture collaboration and discoverability. As Microsoft’s Rush team highlights, the emergent principle becomes "one Git repo per team", or even better, "as few Git repos as possible to get the job done".
Web Platform Monorepo
On the Mercari Web Platform team, we ideally want to share our code with as many frontend teams as possible. The nature of our team’s role implies reusable solutions and the need for discoverability, the latter being a particular pain point for us. After all, our products are meaningless if there are no consumers! While frontend applications, packages, and GitHub workflows are only part of our team’s deliverables, we identified some initial code that would benefit from living in a single repository.
Lighthouse CI Runner
One of our first projects was a tool that runs Google Lighthouse audits in CI/CD against pull-request deployments. This audit allows teams to understand how each code change affects client-side metrics and accessibility features, and where it might degrade the user experience. We wanted to share this across any repository in the organization without requiring consumers to deploy any infrastructure. We were able to modularize a lot of the code required for this tool into individual npm packages that each handle a small domain, such as reading file and directory data, writing to disk, accessing Google Cloud Storage, and posting messages to Slack.
Initial Benefits
The upfront cost of creating shared packages, figuring out how to version and publish them, and working out how to manage local and CI/CD environments in a monorepo was not negligible. While it would have been faster initially to create everything as a single app, we were soon able to reap the benefits of these shared packages when creating our subsequent code coverage, package statistics, and analytics tools.
Having our shared libraries inside of a single repository made it easy for new developers to quickly search and find existing code, without having to ask around in Slack if someone had already made something similar.
Using Yarn’s workspace:* syntax, developers could also quickly import libraries into new projects, edit library source code, and have it reflected in their app without having to manage linking and installing across repositories.
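As a rough illustration of what the workspace: protocol buys you, the TypeScript sketch below checks that a package declares its internal dependencies with a workspace: range rather than a registry version. It is not taken from the monorepo: the PackageManifest shape, the findNonWorkspaceRanges helper, and the @example/* package names are all assumptions made for the example.

```typescript
// Hypothetical shape of the parts of package.json this sketch reads.
interface PackageManifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the internal dependencies of `manifest` that are NOT declared with
// the `workspace:` protocol and would therefore resolve from the registry
// instead of the local source tree.
function findNonWorkspaceRanges(
  manifest: PackageManifest,
  internalNames: ReadonlySet<string>,
): string[] {
  const declaredRanges: Record<string, string> = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
  };
  return Object.entries(declaredRanges)
    .filter(([depName, range]) => internalNames.has(depName) && !range.startsWith("workspace:"))
    .map(([depName]) => depName)
    .sort();
}

// Example with made-up package names.
const internalPackages = new Set(["@example/fs-utils", "@example/slack-client"]);
const runnerManifest: PackageManifest = {
  name: "@example/lighthouse-runner",
  dependencies: {
    "@example/fs-utils": "workspace:*", // linked to the local workspace
    "@example/slack-client": "^1.2.0", // internal, but pinned to a registry range
    lodash: "^4.17.21", // third-party, ignored by this check
  },
};

const offenders = findNonWorkspaceRanges(runnerManifest, internalPackages);
console.log(offenders); // ["@example/slack-client"]
```

A check like this could run in CI so that edits to a shared library are always picked up by the apps that consume it.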
For third-party packages, Yarn’s preferReuse setting and Plug’n’Play (PnP) module resolution allowed us to reduce version mismatches across libraries, prevent new versions from being added incorrectly, and eliminate accidental phantom dependencies. Before adopting PnP, we struggled to enforce project encapsulation. It was easy for builds to pass by mistake if a dependency was included somewhere in the monorepo and hoisted to the root node_modules, which made it available to all packages in certain contexts.
We found Yarn, when combined with Turbo, to be exceptionally good at filtering workspaces (packages or applications) based on commit ranges, workspace name, or folder directory. This allowed us to keep pull-request-triggered CI/CD workflow times short while still maintaining decent test and build coverage.
After thousands of commits, a year of development, and constant iteration, we’ve successfully grown our monorepo to include over 30 npm packages, 5 Node.js applications, and 30 shareable GitHub Actions. We’ve landed on a TypeScript tech stack incorporating Yarn’s Plug’n’Play with Zero-Installs, Turborepo, Git LFS, Next.js, and Changesets.
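A phantom dependency is a package that code imports without declaring it in its own package.json. The sketch below shows the idea in TypeScript under simplifying assumptions: the list of import specifiers is passed in by hand rather than parsed from source files, and packageNameOf and findPhantomDependencies are hypothetical helpers, not part of Yarn. PnP enforces this at resolution time; the code only illustrates what is being enforced.

```typescript
// Extracts the package name from an import specifier, e.g.
// "lodash/merge" -> "lodash" and "@scope/pkg/sub" -> "@scope/pkg".
// Relative paths, absolute paths, and "node:" built-ins return null.
function packageNameOf(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null;
  }
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) {
    // Scoped packages need both the scope and the package segment.
    return parts.length >= 2 ? `${parts[0]}/${parts[1]}` : null;
  }
  return parts[0] ?? null;
}

// Returns every imported package that is missing from the declared dependencies.
function findPhantomDependencies(
  specifiers: readonly string[],
  declaredDependencies: ReadonlySet<string>,
): string[] {
  const phantoms = new Set<string>();
  for (const specifier of specifiers) {
    const pkg = packageNameOf(specifier);
    if (pkg !== null && !declaredDependencies.has(pkg)) {
      phantoms.add(pkg);
    }
  }
  return Array.from(phantoms).sort();
}

// Example: "lodash" is imported but never declared, so with hoisting it would
// only work by accident.
const declaredDeps = new Set(["react", "@example/fs-utils"]);
const importedSpecifiers = ["react", "./local", "lodash/merge", "@example/fs-utils/read", "node:fs"];
const phantomDeps = findPhantomDependencies(importedSpecifiers, declaredDeps);
console.log(phantomDeps); // ["lodash"]
```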
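Filtering by commit range boils down to mapping changed files to workspaces, then adding everything that depends on them. The TypeScript sketch below shows that idea only; it is not how Yarn or Turbo implement it, and the Workspace shape, the affectedWorkspaces helper, and the directory names are assumptions for illustration.

```typescript
// Hypothetical, simplified description of a workspace in the repository.
interface Workspace {
  name: string;
  dir: string; // path relative to the repository root, no trailing slash
  dependsOn: string[]; // names of internal workspaces this one imports
}

// Given changed file paths (for example, the output of a git diff between two
// commits), returns the workspaces that changed plus all of their dependents.
function affectedWorkspaces(
  changedFiles: readonly string[],
  workspaces: readonly Workspace[],
): string[] {
  const affected = new Set<string>();

  // 1. Workspaces whose own files changed.
  for (const ws of workspaces) {
    if (changedFiles.some((file) => file.startsWith(`${ws.dir}/`))) {
      affected.add(ws.name);
    }
  }

  // 2. Keep adding dependents until a full pass adds nothing new.
  let grew = true;
  while (grew) {
    grew = false;
    for (const ws of workspaces) {
      if (!affected.has(ws.name) && ws.dependsOn.some((dep) => affected.has(dep))) {
        affected.add(ws.name);
        grew = true;
      }
    }
  }

  return Array.from(affected).sort();
}

// Example with made-up workspaces: a change in fs-utils also affects the app
// that depends on it, while slack-client and the root README are skipped.
const repoWorkspaces: Workspace[] = [
  { name: "fs-utils", dir: "packages/fs-utils", dependsOn: [] },
  { name: "slack-client", dir: "packages/slack-client", dependsOn: [] },
  { name: "lighthouse-runner", dir: "apps/lighthouse-runner", dependsOn: ["fs-utils", "slack-client"] },
];

const toBuild = affectedWorkspaces(["packages/fs-utils/src/read.ts", "README.md"], repoWorkspaces);
console.log(toBuild); // ["fs-utils", "lighthouse-runner"]
```

Running tests and builds only for the returned workspaces is what keeps pull-request workflows short without skipping code that a change could break.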
The Future
We’re happy with the progress we’ve made, but there is still a lot of low-hanging fruit for us to reach for! The average time of our pull-request CI/CD workflow is ~6 minutes, while merging to main can take up to ~15 minutes. Refactoring our Turbo task definitions and utilizing a Turbo cache can significantly improve our build times. Migrating our Docker builds from GCP Cloud Build to our in-house ArkCI GitHub runners will significantly reduce our app build time. Using Yarn’s PnP module resolution strategy has been exciting (and also frustrating). Incredibly, it has already reduced our install times from minutes to seconds, both locally and in CI. Additionally, by ironing out platform differences and unifying developer Git settings, we can remove the need to install completely!
Some of our more challenging tasks have involved aligning configurations between build tools both inside and outside the monorepo. Early on, we decided to write and output our code in ESM wherever possible, which has led to many hard-to-diagnose CJS vs. ESM transpilation issues. Prioritizing simplicity in build tooling is hard in a frontend ecosystem that relies on shifting standards, complicated tools, and intricate interactions between them.
The monorepo architecture has allowed us to make huge, incremental changes to a large codebase within days instead of months. It has also given us a single entry point for onboarding new contributors, rather than requiring people to learn multiple tech stacks across dozens of repositories. Iterating on our CI/CD pipelines, release strategy, versioning, and documentation has put us in a good position to support larger projects and more teams, and to quickly create standardized “Golden Paths” for common developer needs.
If working in a fun, highly autonomous environment while solving modern and impactful frontend platform-level challenges sounds interesting, we’re currently hiring and would love to have a chat! Please take a look here: https://apply.workable.com/mercari/j/001A5ADF0F