I’m yanolab, working as an Architect and SRE in Cross Border (XB) Engineering.
On the first day of this blog series, we introduced "Rebuilding App and Foundation for Global Expansion." In this article, titled "Behind the Scenes of Infrastructure Supporting Global Expansion," I’d like to delve a bit deeper into the architecture, frameworks, and initiatives of our backend systems.
Background
Mercari has long adopted and operated a Microservice architecture, investing in its ecosystem. We have Microservice templates called echo services, an SDK for developing Microservices in Go, Terraform modules called starter kits that consolidate basic infrastructure configurations, and an SDK that abstracts Kubernetes configurations so Deployments can be managed with minimal code. Additionally, when releasing a Microservice there’s a process called the Production Readiness Check (PRC), and newly developed products or Microservices must pass this checklist. While this ecosystem and these processes have matured, their growing complexity has raised the learning cost, and the bloated PRC means that launching a single Microservice now takes at least three months. Moreover, when launching new businesses, we often need to launch dozens of Microservices despite starting with a small team. In such cases, spending three months per Microservice is unrealistic, and Mercari’s recent new businesses have increasingly adopted Monolith-like approaches. (ref: Mercari Hallo’s Tech Stack and Why We Chose It)
In rebuilding infrastructure for global expansion, we anticipate eventually reaching the same scale as the current Mercari Marketplace. Therefore, rather than a simple Monolith, we’ve designed and are operating a special Modular Monolith that maximizes the use of our existing ecosystem while enabling Microservice-like operations.
Modular Monolith with Flexible Deployment
Mercari’s ecosystem, designed for Microservices, is fundamentally based on one repository per service and doesn’t assume large, complex system configurations. For example, our CI/CD assumes one binary, one container, and one Deployment. Deviating from this model forces the implementing team to create and maintain custom workflows. To avoid that ongoing maintenance cost, the Cross Border team stays within this policy while still enabling Microservice-like operations, so that operational load can be distributed as the business grows. The system is compiled into a single binary, but modules can be enabled or disabled through configuration files. In addition, interfaces between modules are defined with Protocol Buffers and communication goes over gRPC, so modules are not constrained to communicating within the same instance. As a result, the existing CI still builds one binary and one container, while configuration files decide which modules are on or off and which addresses modules use to reach one another. Using Protocol Buffers for the interfaces also increases module independence and lets teams collaborate on module development from the interface design stage. (Fig. 1)
Fig.1 Modular Monolith with Flexible Deployments
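To make this concrete, below is a minimal sketch, with entirely illustrative names rather than our actual framework code, of how one binary can serve only the modules a configuration enables while clients dial whatever address the configuration provides, whether that’s localhost or another Deployment:

package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// moduleConfig is a hypothetical stand-in for the CUE-driven configuration
// shown in Fig. 4: modules can be toggled, and each client dependency
// carries an address.
type moduleConfig struct {
	SearchEnabled bool
	SearchAddress string // "localhost:8080" in one-binary mode, or another Deployment's address
}

func main() {
	cfg := moduleConfig{SearchEnabled: true, SearchAddress: "localhost:8080"}

	if cfg.SearchEnabled {
		// Only enabled modules register their gRPC services, so the same
		// binary yields differently shaped Deployments purely via config.
		lis, err := net.Listen("tcp", ":8080")
		if err != nil {
			log.Fatal(err)
		}
		srv := grpc.NewServer()
		// pb.RegisterSearchServiceServer(srv, &searchModule{}) // generated from the Protocol Buffers definition
		go func() {
			if err := srv.Serve(lis); err != nil {
				log.Fatal(err)
			}
		}()
	}

	// The Product module dials whatever address the config names; it never
	// needs to know whether Search runs in this process or in another Deployment.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	conn, err := grpc.DialContext(ctx, cfg.SearchAddress,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// client := pb.NewSearchServiceClient(conn) // the generated client is identical either way
}

The point of the design is that the generated Protocol Buffers client is the same in both cases; only the address in the configuration changes.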
AlloyDB is used as the database for the new infrastructure. In Mercari’s past Monolith, a single database was shared across the entire system, with no restrictions on table joins or permissions across domains. As the service grew, interdependencies between domains increased and operational costs escalated. In contrast, when migrating to Microservices, many services and teams adopted Spanner or CloudSQL. Having each service maintain its own database was an excellent choice in terms of domain and service independence, ownership, and maintenance. From a cost perspective, however, it was inefficient: every team ran its own database in an HA configuration for stable operation, which was particularly wasteful for services with low request volumes. Therefore, the Cross Border team decided to share the same cluster as much as possible to save costs, while separating service accounts per module to restrict which databases each module can access, and dividing databases on a per-module basis. This keeps costs down while preparing for future division and scaling. (Fig. 2)
Fig.2 DB Isolation
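As a rough illustration of this setup, the sketch below assumes Go’s database/sql with the pgx driver (AlloyDB speaks the PostgreSQL protocol); the helper and its arguments are hypothetical, not our actual code. Each module opens its own pool against the shared cluster using its own database and service account, so the grants on that account bound what the module can reach:

package db

import (
	"database/sql"
	"fmt"

	_ "github.com/jackc/pgx/v5/stdlib" // PostgreSQL-compatible driver; AlloyDB speaks the Postgres protocol
)

// openModuleDB is a hypothetical helper: every module shares one AlloyDB
// cluster endpoint but connects to its own database with its own service
// account, so the grants on moduleUser limit what the module can access.
func openModuleDB(clusterHost, moduleDB, moduleUser, password string) (*sql.DB, error) {
	dsn := fmt.Sprintf("host=%s dbname=%s user=%s password=%s sslmode=require",
		clusterHost, moduleDB, moduleUser, password)
	return sql.Open("pgx", dsn)
}

// Usage (illustrative): the Product and Search modules share the cluster but
// cannot see each other's tables.
//   productDB, err := openModuleDB(host, "product", "product-module-sa", pw)
//   searchDB, err := openModuleDB(host, "search", "search-module-sa", pw)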
Traditionally, Mercari has configured Microservices through environment variables, but with a Monolith we anticipated that configurations would become extremely numerous and that managing them across environments would become complex. We therefore adopted CUE lang for configuration files, so default configurations are managed from a single source and only the values that differ per environment, such as development or production, are managed as differences. These configuration files are bundled into the container at build time, and the appropriate one is selected automatically: the local configuration in local environments, and the corresponding configuration in development or production. Additionally, because the standard configuration can be overridden with CUE/YAML at runtime, different configurations can be applied to each Deployment. (Fig. 3)
Fig. 3 Difference management of config
For example, we define the standard configurations for development and production environments as the default config as shown below (Fig. 4). In this case, the ProductInventory application in the Product module uses localhost as the address for the Search module.
#GRPCClientConfigSpec: {
	address: string | *"localhost:\(#HTTPPort)"
	timeout: =~"^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$" | int | *"3s"
	retry:   int & >=0 | *3
}
components: {
	"layers/tier2/product/applications/productinventory": {
		enabled:       bool | *false
		search_module: #GRPCClientConfigSpec
	}
	"layers/tier3/search/applications/productsearch": {
		enabled: bool | *false
	}
	...
}
Fig. 4 Configuration common to development and production
Suppose we define the common configuration for the development environment as shown below (Fig. 5). In this case, all features are enabled both in the GKE development environment and in the local environment, and every module talks to the other modules on localhost.
components: {
	"layers/tier2/product/applications/productinventory": {
		enabled: true
	}
	"layers/tier3/search/applications/productsearch": {
		enabled: true
	}
	...
}
Fig. 5 Development-specific configuration (all modules enabled)
When separating GKE Deployments in the production environment, we mount a ConfigMap as YAML, separate from the files bundled in the container, and load it on top. For example, by pointing the ProductInventory application in DeploymentA’s Product module at DeploymentB (Fig. 6), and enabling only the ProductSearch application of the Search module in DeploymentB (Fig. 7), the Search module can be operated independently.
components:
  "layers/tier2/product/applications/productinventory":
    enabled: true
    search_module:
      address: deploymentB.xxxx.svc.local
  "layers/tier3/search/applications/productsearch":
    enabled: false
  ...
Fig. 6 The Search module used by the Product module can be switched to a different Deployment
components:
  "layers/tier2/product/applications/productinventory":
    enabled: false
  "layers/tier3/search/applications/productsearch":
    enabled: true
  ...
Fig. 7 Deployment with only the Search module enabled
This flexible architecture enables operation as a single binary in local development and development environments, while allowing modules to be appropriately separated in the production environment. This is particularly powerful for local development: it eliminates the classic Microservice problem of having to prepare an execution environment that includes every dependent Microservice, dramatically improving the efficiency of development environment setup and maintenance. However, this infrastructure rebuild doesn’t replace all Microservices, and dependencies on existing Mercari Microservices remain. To handle these, we use a product called mirrord to connect from the local environment to the remote Kubernetes environment during development. We also use a tool called air, which automatically rebuilds and reloads on code changes, achieving a modern development experience similar to web application development.
Adapting to Change with a Monorepo
In Mercari’s Microservices, we create a repository for each service, and the Protocol Buffer definitions, the Terraform-based infrastructure management, and the Kubernetes deployment environment each live in monorepos shared by everyone. While this approach is effective, anything outside the main repository requires moving between repositories, and this frequent context switching is extremely stressful for developers. Additionally, automation that spans repositories not only takes longer because each repository runs its own CI, but when issues occur it’s difficult to understand where and what is happening, which worsens the developer experience. In this infrastructure rebuild we’ve reconsidered this structure as well, and we’re attempting to consolidate the Backend project, Frontend project, Protobuf definitions, and Terraform in one place so that development can be completed within a monorepo as much as possible. (Only Kubernetes deployment uses the existing monorepo, due to ecosystem constraints.)
By clearly defining boundaries with the Modular Monolith while managing not only the Backend project but also the Frontend project in the monorepo, we make it easier to contribute across languages and roles while keeping applications, architecture, and frameworks aligned. Maintenance is also efficient, since scripts, workflows, CI, and so on only need to be maintained in one place. At Mercari, we had long been unable to visualize organizational and team productivity, and accurately measuring developer productivity was a challenge. Since 2024, we’ve introduced DX with the aim of visualizing and improving developer productivity. DX combines qualitative data from surveys with quantitative data, such as productivity-related metrics from GitHub, to visualize four aspects: efficiency, speed, quality, and novelty. We found that the monorepo approach produced better results on these metrics than Mercari’s overall scores.
One slightly unique aspect of the monorepo we built is that we use Terraform together with CUE lang for infrastructure management (the traditional .tf format is also available). In CI, we convert the CUE to JSON and apply it. Defining infrastructure in CUE makes difference-aware environment construction possible, similar to the Modular Monolith configuration management introduced above. Since CUE can be merged with YAML and JSON, we find it extremely effective for automation. Going forward, we want to leverage the advantage of having everything in the same repository and work on Framework-defined Infrastructure, which automatically generates infrastructure configuration files from the Modular Monolith’s configurations and frameworks. (Fig. 8)
Fig. 8 Framework Defined Infrastructure
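As a small aside on the CUE-to-JSON step, the toy program below uses the cuelang.org/go API to show the idea (in practice a CLI such as cue export does the same job, and the resource definition here is purely illustrative). Terraform natively accepts the resulting JSON when it’s written as a *.tf.json file:

package main

import (
	"fmt"
	"log"

	"cuelang.org/go/cue/cuecontext"
)

func main() {
	// A made-up bucket definition with an overridable, defaulted location,
	// mirroring the difference-aware style used for application config.
	const infra = `
resource: google_storage_bucket: assets: {
	name:     "example-assets"
	location: string | *"ASIA-NORTHEAST1" // environments may override this
}`

	v := cuecontext.New().CompileString(infra)
	if err := v.Err(); err != nil {
		log.Fatal(err)
	}
	out, err := v.MarshalJSON() // defaults are resolved during export
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out)) // write this to main.tf.json and run terraform apply
}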
Approach to Increasingly Complex Domains and Dependencies
Currently, Mercari has approximately 250 GCP Projects for Microservices related to the Marketplace business alone, approaching 400 when Merpay is included. These services are divided more finely than necessary and are interdependent, which makes maintenance difficult. It is also extremely hard, when building a new feature, to determine which Microservice should receive it, which Microservice’s features can be reused, or whether a new Microservice should be created at all. Therefore, as Cross Border rebuilds the Marketplace infrastructure from scratch, we’ve organized domains and roles by introducing the concept of Tiers and dependency maps, and we’re re-consolidating services that were split too finely into reasonably large domains, such as bringing single-function services like the Like service together into a Social module.
In this Tier concept, we’ve divided roles into five layers, BFF (Backend for Frontend)/Gateway plus Tiers 1 through 4, and defined roles and restrictions for each layer.
BFF/Gateway Layer
The BFF pattern is well known: this layer defines APIs optimized for Mobile and Web screens, and all requests pass through the BFF before being handed to lower layers. Customer-specific language and currency conversion is also handled by this layer. It is jointly owned and maintained by Mobile, Web, and Backend engineers.
Tier 1
Primarily responsible for request orchestration and business flows. The responsibility of Tier 1 is to build business processes using the modules in Tier 2 and below. Think of it as composing the various Marketplace features into processes; it’s the layer responsible for horizontal processing.
Tier 2
Primarily the domain layer that realizes the Marketplace’s core functions, with modules like Product and Order. Think of it as the layer responsible for vertical processing specific to each domain.
Tier 3
This layer provides more generic functions that don’t depend on the Marketplace, such as Search and Notification.
Tier 4
This layer is somewhat special: it holds modules that must meet specific requirements or provide functions that don’t fit naturally into Tiers 1 through 3. Modules that exclusively handle personal information, which carry different security and operational requirements from other modules, are placed here.
We’ve imposed the constraint that requests always flow from top to bottom, and communication between modules in the same Tier is prohibited. However, when accessing a lower Tier from an upper Tier, intermediate Tiers may be skipped; for example, access from the BFF directly to Notification is permitted. (Fig. 9) Databases are also separated by module, and transactions cannot span modules. These rules greatly increase module independence while preventing a proliferation of small modules. If communication between modules in the same Tier becomes necessary, it indicates that those modules’ domains are very similar, and we treat it as a good signal to review the domain boundaries.
Fig. 9 Tier Concept
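These layering rules are simple enough to check mechanically. The sketch below is hypothetical, not our actual tooling, and the module-to-Tier mapping is illustrative, but it shows how the two constraints, downward-only calls and no same-Tier calls, could be linted:

package tiers

import "fmt"

// tierOf maps each module to its layer; 0 is BFF/Gateway, 1-4 are Tiers 1-4.
// The module names here are illustrative.
var tierOf = map[string]int{
	"bff":          0,
	"purchaseflow": 1, // Tier 1: business flows / orchestration
	"product":      2, // Tier 2: Marketplace core domains
	"order":        2,
	"search":       3, // Tier 3: Marketplace-agnostic functions
	"notification": 3,
	"pii":          4, // Tier 4: special requirements such as personal information
}

// checkCall enforces the two rules in Fig. 9: requests flow only from an
// upper Tier to a strictly lower one (skipping Tiers, such as the BFF calling
// Notification directly, is allowed), and same-Tier calls are prohibited.
func checkCall(from, to string) error {
	ft, ok := tierOf[from]
	if !ok {
		return fmt.Errorf("unknown module %q", from)
	}
	tt, ok := tierOf[to]
	if !ok {
		return fmt.Errorf("unknown module %q", to)
	}
	if ft == tt {
		return fmt.Errorf("%s -> %s: same-Tier communication is prohibited; a signal to review the domain boundaries", from, to)
	}
	if ft > tt {
		return fmt.Errorf("%s (Tier %d) -> %s (Tier %d): requests must flow top to bottom", from, ft, to, tt)
	}
	return nil
}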
The infrastructure rebuild has only just begun, but by utilizing well-defined and stable service groups such as Payment and IdP while reorganizing and implementing Marketplace domains using this design methodology, we’ve been able to keep it to 18 modules as of October 2025.
Current Challenges
Currently, to enable per-module deployment, we manage each module’s version in a file and detect a module’s release by the version being incremented. However, this method doesn’t mesh well with GitHub Flow, which releases from the main branch, and there’s a risk of unintended changes slipping into a release. We’re still working through trial and error to solve this problem.
Future Developments
In an era when AI-driven development is becoming mainstream, launching new businesses quickly is necessary to secure a competitive advantage. The Cross Border team’s Monorepo and Modular Monolith approach introduced here has a fairly high initial construction cost, so we’re working with the Platform team to make it easier and faster to build, so that it can be applied to Mercari’s future new businesses. If the opportunity arises, I’d like to write another article about those results.
Finally
On November 13, 2025, the Mercari Group tech conference "mercari GEARS 2025" will be held.
Please join us! Registration is here 👉 https://gears.mercari.com/
Tomorrow’s article is by @Gary. Please continue to enjoy "Series: Behind the Scenes of Developing ‘Mercari Global App,’ Mercari’s First Universal App."