Shops Monorepo Five Years Later: A Tale of Bazel and Cursor

This post is for Day 3 of the Mercari Advent Calendar 2025.

Introduction

Hi, I’m Jazz from the Mercari Shops Enabling team. Our team handles a variety of responsibilities in Mercari Shops, ranging from the backend and observability all the way to CI/CD. Our mission is to ensure that the engineers who work on Mercari features have a great technical foundation and an excellent developer experience.

Five years ago, Mercari Shops adopted a monorepo structure using Bazel on top of a microservices architecture. At the time, we believed this stack would support our early product phase, enabling fast iteration towards a usable product. Today, we believe the monorepo is still the right choice, but maintaining it has required us to address significant technical debt.

Over time, our setup became overly complex. We faced conflicting dependencies and unstable fixes that made standard tasks, like upgrading the Go version, difficult. These difficulties had consequences of their own: Bazel conflicts blocked the adoption of certain libraries, including internal Mercari standard ones. Furthermore, while our frontend, backend, and protocol buffers lived in the same repository, they were effectively isolated by incompatible build systems.

In this post, I will share how we unified our build processes and resolved years of technical debt. I will also explain an unexpected benefit of this cleanup: our standardized monorepo became highly compatible with AI tools. This allowed us to onboard tools like Cursor and Claude Code quickly and see an immediate productivity boost.

If you are managing build system technical debt, considering a monorepo, or looking for practical examples of how AI integrates with large codebases, this article is for you.

A Quick Recap: Why Mercari Shops Chose a Monorepo

Back in 2021, when we were building Mercari Shops, we made a specific architectural bet. Unlike the main Mercari marketplace app, which was migrating from a monolith to microservices, Shops started as microservices from day one, which allowed us to deliver features quickly.

To manage the complexity of multiple services sharing code, we chose a monorepo powered by Bazel.

The Design Goals:

  • Single Source of Truth: Understand the entire service from one repo.
  • Shared Patterns: Consistency across Go (backend), Python (ML), and Protocol Buffers.
  • Atomic Changes: Make global changes apply to everything at once.

For the first few years, this worked well. But as the team grew and deadlines pressed, entropy set in.

You can read an in-depth overview of Mercari Shops’ initial architecture decisions in this (Japanese language) blog post: Mercari Shops Tech Stack.

The Drift: When a Healthy Monorepo Decays

By year 4, we were facing a significant problem: Toolchain Decay.

While the application code was healthy, the build configuration holding it together had become brittle. We saw classic symptoms:

  • Dependency Conflicts: We were stuck on older versions of Go because different parts of the monorepo had conflicting requirements. Updating one Bazel module often broke another.
  • The "Hack" Layer: Urgent fixes often turned into permanent hacks. We had custom shell scripts wrapped in Bazel rules and legacy flags that nobody fully remembered.
  • The "Bus Factor" in CI: There were code paths in our CI pipelines that only one or two people dared to touch. A simple task like "bump the Go version" could spiral into a multi-week drama of fighting conflicting Bazel modules.

The repository had become a "maze." New joiners faced a steep learning curve just to run tests locally, and developers were afraid to touch build files lest they break a service they didn’t own. Library dependencies stopped being updated, and the Go toolchain remained stuck on version 1.19, while the current version was already 1.24.

The Nightmare: An Unpredictable Toolchain Unfit for a Crisis

The toolchain decay became unmanageable once builds turned unpredictable. Because we relied heavily on the rules_docker module and its container_run_and_commit_layer rule, which is not a hermetic, repeatable way of building containers, the build success rate for any given microservice dropped below 50%. Upstream bug reports about the rule went unanswered. Mercari Shops developers were forced to retrigger their builds multiple times until a change finally completed its full CI/CD cycle.
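To make the problem concrete, here is a simplified sketch of that kind of target (the names, image, and commands are illustrative, not our actual BUILD files). The rule boots a container and runs arbitrary shell commands at build time, so the result depends on the network and on external package mirrors rather than on inputs Bazel can track:

    # BUILD.bazel (illustrative): the old, non-hermetic approach with rules_docker
    load("@io_bazel_rules_docker//docker/util:run.bzl", "container_run_and_commit_layer")

    container_run_and_commit_layer(
        name = "service_deps_layer",
        image = ":base_image.tar",  # tarball produced by another image target
        commands = [
            # Runs inside a container during the build: the output varies with
            # mirror availability and whatever package versions exist that day.
            "apt-get update",
            "apt-get install -y ca-certificates imagemagick",
        ],
    )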

The result of this unreliability was predictable: we had a few near misses where incident remediation was delayed because the team had to retrigger the build over and over until it finally succeeded. Features were delayed because adding new dependencies caused the build to fail, without any meaningful feedback from the tooling on how to fix it.

At this point, the Bazel build system had become a serious threat.

The Renovation: Modernizing Our Toolchain

We decided that we couldn’t "move fast" (one of Mercari’s core values) if our legs were tied together by technical debt. We launched a focused initiative to clean up the repository.

1. Inventory and Mapping

Before touching anything, we had to figure out what we actually had, so we scanned the repo to map the current state of the monorepo (a sketch of the kind of queries involved follows the findings below):

  • Which services used which language versions?
  • Which code was still in use, and which code was abandoned and not deployed anymore?
  • Where were the custom hacks hiding?

We needed to move from "it works, sometimes, if you do this" to "it works, always".

We found out that:

  • 100% of the Python code in the monorepo was abandoned, and we didn’t need to keep it.
  • There were more than 120 different GitHub tasks configured in the monorepo, covering build, deployment, synchronization of settings, tests, report generation, and database management. More than 20 of these tasks were completely abandoned and never executed.
  • There were more than 70 Go backend microservices and 6 TypeScript frontend services.
  • We couldn’t update the dependencies of the Go microservices because they conflicted with older versions of Bazel modules that we were unable to update.
  • Several of the Bazel modules we used were outdated, and some were abandoned outright.
  • The custom hacks in our repo ranged from scripts that papered over misconfigured automations by correcting their output, to library patches applied just to avoid build errors.
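Bazel’s own query engine makes this kind of scan tractable. Below is a minimal sketch of what such queries look like; the commands are illustrative and the target label is hypothetical:

    # List every Go service binary defined in the repo
    bazel query 'kind(go_binary, //...)'

    # Find what still depends on a suspect library; an empty result marks it
    # as a candidate for deletion
    bazel query 'rdeps(//..., //libs/legacy:legacy)'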

2. Getting Bazel Back to Defaults

The first challenge in the cleanup was bringing the Bazel setup back to a default mode, without scripts, hacks, and patches. We had tried to untangle the heavily hacked setup by upgrading specific modules, one at a time, but the patchwork of hacks made that impossible: changing one version broke something unrelated.

The only option left was to rewrite the build system from scratch, using up-to-date versions of Bazel and its modules, so that it would build the code we run in production today rather than conform to the history the old setup had accumulated.
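To give a flavor of what "default" means here, below is a minimal MODULE.bazel sketch of such a from-scratch, bzlmod-based setup. The module list and version numbers are illustrative assumptions, not our exact pins:

    # MODULE.bazel (illustrative): a defaults-only setup with no custom scripts.
    # Versions are examples, not our exact pins.
    bazel_dep(name = "rules_go", version = "0.50.1")
    bazel_dep(name = "gazelle", version = "0.39.1")
    bazel_dep(name = "rules_pkg", version = "1.0.1")

    # Pin the Go toolchain through the standard extension instead of shell scripts.
    go_sdk = use_extension("@rules_go//go:extensions.bzl", "go_sdk")
    go_sdk.download(version = "1.24.0")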

3. Migrating from rules_docker to rules_oci

A major part of the cleanup was ripping out rules_docker. This ruleset was effectively unmaintained and became a liability. We migrated to rules_oci, the modern standard for building container images in Bazel.

rules_oci is faster, standards-compliant, and separates the build from the container runtime. It is well maintained, and we are able to continuously update our project to the latest version of the module without running into issues. Its documentation includes a migration guide with meaningful advice on performing the migration, which helped us understand the differences between rules_docker and rules_oci even though we were rebuilding the tooling from scratch.

Our builds became deterministic and significantly faster. We could finally use standard tools to sign and verify images. As an added benefit, we were able to switch to distroless images, which reduced the risk surface of our deployments.
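As a rough sketch of the pattern, assuming a bzlmod setup (target names, paths, versions, and the digest placeholder are illustrative, not our actual files): the base image is pulled once in MODULE.bazel, and each service assembles its image from plain Bazel outputs, with no Docker daemon involved.

    # MODULE.bazel (illustrative): declare rules_oci and pull a distroless base
    bazel_dep(name = "rules_oci", version = "2.0.0")

    oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
    oci.pull(
        name = "distroless_base",
        image = "gcr.io/distroless/base",
        digest = "sha256:...",  # pinned digest elided here
    )
    use_repo(oci, "distroless_base")

    # BUILD.bazel (illustrative): hermetic image assembly for one service
    load("@rules_go//go:def.bzl", "go_binary")
    load("@rules_pkg//pkg:tar.bzl", "pkg_tar")
    load("@rules_oci//oci:defs.bzl", "oci_image")

    go_binary(
        name = "server",
        srcs = ["main.go"],
    )

    # Package the binary as a layer; every input is tracked by Bazel.
    pkg_tar(
        name = "server_layer",
        srcs = [":server"],
    )

    oci_image(
        name = "image",
        base = "@distroless_base",
        entrypoint = ["/server"],
        tars = [":server_layer"],
    )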

4. The Big PR and Its Release

Rebuilding the tooling from scratch had a downside: we couldn’t update the repo gradually; the change had to land in a single pull request. After three months of intense work, we finally merged the big PR. Some interesting numbers:

  • It had 118 commits
  • It changed 757 files
  • It added 37,570 lines and removed 25,978 lines of code

We had to review and approve it using GitHub’s command-line tools because the web interface froze due to the size of the pull request. While it was nerve-wracking to merge such a large change, the migration was a success, and smaller issues, such as adjusting container names, were easily solved now that we had tooling that gave us meaningful feedback.
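For anyone facing the same situation, the GitHub CLI (gh) can review what the web UI cannot render. A minimal sketch; the PR number is a placeholder, not our actual one:

    # Inspect and approve a pull request too large for the web interface
    gh pr diff 1234 | less
    gh pr review 1234 --approve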

The Unexpected Finding: AI-Readability

This is where the story gets interesting.

Around the time we finished the cleanup, AI coding tools like Cursor and Claude Code started becoming mainstream. We, like many teams, tried them out. The difference in their performance before and after the cleanup was night and day.

Why "Standard" Code is AI Fuel

Before the cleanup, when we asked an AI agent to "add a new endpoint," it would fail. It couldn’t understand our custom hacks, our weird directory structures, or why rules_docker was behaving strangely. The AI would hallucinate standard Bazel rules that didn’t exist in our custom setup.

After the cleanup, the repo was "boring"—and AI loves boring.

Because we were now using vanilla rules_oci and standard Go rules:

  1. Context Discovery: The AI tools could traverse the project and accurately map the dependency graph and the relationships between different parts of the project.
  2. Correct Code Generation: When Cursor generated code, it used the standard patterns for both the feature and the build system, and for the first time, those patterns actually worked in our repo. This predictability increased our engineers’ confidence in using AI tools.

Success Stories: Humans + AI

1. The Junior DevOps Engineer

A new team member joined with strong application skills but very little experience with Bazel or CI/CD pipelines. In the "old" world, assigning them a CI task would have been a recipe for frustration. They would have to spend days learning the basics of Bazel, understanding how the scripts and hacks influenced the outcomes of the build system, and engage in a long cycle of trial and error to complete their tasks.

Instead, they used Cursor. They asked: "My service x has a race condition. How can I enable the golang race detector in my Bazel build?"

Because the repo used standard implementations, the AI quickly pointed them to the correct way of enabling the race detector. They then ran the build with the detector enabled, found the issue, and relied on Cursor to help fix it.
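For reference, here is a minimal sketch of that standard rules_go approach; the target and package names are hypothetical:

    # BUILD.bazel (illustrative): enable the Go race detector for one test target
    load("@rules_go//go:def.bzl", "go_test")

    go_test(
        name = "server_test",
        srcs = ["server_test.go"],
        embed = [":server_lib"],
        race = "on",  # instrument this test with the race detector
    )

    # Or enable it for a whole invocation without editing BUILD files:
    #   bazel test --@rules_go//go/config:race //services/...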

Since the build system was now reliable and repeatable, they were able to quickly validate the solution with confidence.

2. AI as a Discovery Tool

We found that AI wasn’t just for writing code; it was just as valuable for finding it. For example, we can ask, "Find the flow from API endpoint Y down to the database writes," and get a useful answer with a high rate of success. Mapping out complex business rules became a matter of asking an AI agent to build a UML flow diagram.

With a cleaned-up architecture, these queries returned useful, high-coverage answers. We could use AI to sketch large refactors across the monorepo, then execute them step by step.

Lessons Learned

The most important lesson we learned is that a build system isn’t "set and forget." It requires ownership. If you don’t schedule regular hygiene work for your infra, you will eventually pay 10x the cost in slow upgrades and developer frustration.

Trying to "throw AI" at a messy, hacked-together build system just amplifies the mess. Clean, standard code is essential if you want AI to help maintain it.

By paying down our technical debt and embracing standard tools, we didn’t just fix our build times—we opened the door for our team to build faster and smarter with AI. The monorepo is no longer a burden; it is once again a competitive advantage.

Tomorrow’s article will be by whygee about enhancing DX through Mercari’s Unified Platform Interface. Stay tuned!
