I am Ahsun, a Software Engineer @Cross Border (XB) Engineering. In this article, titled "From Local to Global: Building Seamless B2C Product Integration at Mercari," I’d like to delve a bit deeper into how we architected a robust, scalable product synchronization system that handles both real-time updates and bulk data migrations between the Mercari Shops System and Global Foundation. We’ll cover the key challenges we faced, the critical design decisions we made, and the learnings that shaped our iterative approach to building production-ready sync infrastructure.
The Challenge: Connecting Two Product Worlds
At Mercari, we operate in a unique cross-border commerce landscape. Our Japanese B2C marketplace (Mercari Shops) serves many local merchants and customers, while our Global App connects international buyers with Japanese sellers. The challenge? Seamlessly synchronizing millions of products between these two distinct ecosystems in near real time to enrich the experience for our customers.
The Business Context
- C2C: A single product for sale.
- B2C: A product with multiple variants (e.g. size, color), each with its own stock quantity, so customers can order multiple units of each variant (see the type sketch after this list).
- Mercari Shops System: Japan-focused marketplace with local merchants.
- Global Foundation: Cross-border platform serving global customers.
- The Gap: Real-time product sync across different data models, currencies, and business rules.
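Since the C2C vs. B2C shape difference drives many of the design decisions below, here is a minimal, purely illustrative sketch of the two data models in Go. The type and field names are hypothetical, not the actual Global Foundation schema.

// Illustrative types only; the real schemas carry far more fields.
package model

// C2CItem is a single listing sold as one unit.
type C2CItem struct {
	ID    string
	Name  string
	Price int64 // JPY
}

// B2CProduct groups multiple purchasable variants under one product.
type B2CProduct struct {
	ID       string
	ShopID   string
	Name     string
	Variants []Variant
}

// Variant is one sellable combination of aspects, with its own stock.
type Variant struct {
	SKU      string
	Aspects  map[string]string // e.g. {"size": "M", "color": "navy"}
	Price    int64             // JPY
	Quantity int               // orderable stock for this variant
}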
Why This Integration Matters
Key motivations for this integration are as follows:
- Enable Japanese merchants to reach global markets effortlessly
- Provide consistent product experience across platforms
- Maintain data integrity across distributed systems
- Support millions of products with sub-second latency requirements
Data Sync – Challenges and Architecture
Here are some challenges and learnings we encountered while building this system, and how we refined our architecture iteratively:
Challenges
Event Deduplication & Ordering: Managing duplicate events and out-of-order message delivery in high-volume PubSub streams required implementing a robust Sync Tracker with message ID-based deduplication and timestamp validation to ensure data consistency.
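To make the deduplication idea concrete, here is a minimal sketch in Go, assuming a hypothetical key-value store keyed by product ID; the real Sync Tracker also records monitoring and audit state.

// Sketch only: drop exact duplicates (same message ID) and stale,
// out-of-order events (older update timestamp than the stored one).
package synctracker

import (
	"context"
	"time"
)

// Record is the last applied event for a product.
type Record struct {
	LastMessageID string
	LastUpdatedAt time.Time
}

// Store is a stand-in for the Sync Tracker's persistence layer.
type Store interface {
	Get(ctx context.Context, productID string) (Record, bool, error)
	Put(ctx context.Context, productID string, r Record) error
}

// ShouldProcess reports whether an incoming event should be applied.
func ShouldProcess(ctx context.Context, s Store, productID, msgID string, updatedAt time.Time) (bool, error) {
	rec, ok, err := s.Get(ctx, productID)
	if err != nil {
		return false, err
	}
	if ok {
		if rec.LastMessageID == msgID {
			return false, nil // duplicate delivery of the same message
		}
		if !updatedAt.After(rec.LastUpdatedAt) {
			return false, nil // out of order: a newer event was already applied
		}
	}
	if err := s.Put(ctx, productID, Record{LastMessageID: msgID, LastUpdatedAt: updatedAt}); err != nil {
		return false, err
	}
	return true, nil
}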
Dual Sync Strategy Complexity: Coordinating both real-time event-driven sync and batch historical sync through the same ProductSync service while maintaining data integrity and avoiding conflicts between live updates and bulk operations.
Cross-System API Dependencies: Handling API calls to Mercari Shops systems for fetching latest product state introduced latency and failure scenarios that required careful retry logic, rate limiting, and graceful degradation strategies.
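The retry side of this can be as simple as capped exponential backoff around each outbound call. The sketch below is illustrative only; the attempt counts and delays are not our production values.

// Sketch: retry an outbound call with capped exponential backoff,
// honoring context cancellation so a stuck dependency degrades
// gracefully instead of blocking the sync worker.
package shopsclient

import (
	"context"
	"time"
)

func callWithRetry(ctx context.Context, attempts int, call func(context.Context) error) error {
	backoff := 200 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Second {
			backoff *= 2
		}
	}
	return err // surface the last error so the caller can record and retry later
}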
Asynchronous Search Indexing: Ensuring search index consistency without blocking the main sync flow by implementing event-driven indexing where ProductInventory publishes events after database storage, allowing SearchIndexer to update indices asynchronously.
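In code, that ordering is simply "persist first, publish second." A simplified Go sketch, with hypothetical interfaces standing in for the real ProductInventory internals:

// Sketch: the indexer only ever learns about products that already
// exist in the database, and indexing never blocks the sync flow.
package productinventory

import "context"

type Product struct{ ID string }

// ProductStore persists products (stand-in for the real storage layer).
type ProductStore interface {
	Save(ctx context.Context, p Product) error
}

// EventPublisher emits post-storage events for the SearchIndexer.
type EventPublisher interface {
	PublishProductStored(ctx context.Context, productID string) error
}

func SyncProduct(ctx context.Context, store ProductStore, pub EventPublisher, p Product) error {
	if err := store.Save(ctx, p); err != nil {
		return err // nothing was stored, so nothing is published
	}
	// The SearchIndexer consumes this event asynchronously and updates
	// its indices on its own schedule.
	return pub.PublishProductStored(ctx, p.ID)
}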
Architecture
Our B2C product sync follows a dual-strategy approach, combining real-time event processing for live updates with batch processing for old listings. Here’s the high-level design of the current architecture.
Key Components
- Real-Time Event Processing
- Pub/Sub events from Shop product updates
- Immediate sync for product changes
- Sub-second latency for critical updates
- Batch Processing Pipeline
- Handles bulk product imports from BigQuery exports
- Processes millions of products efficiently
- Recovers from failed sync operations
- Multi-Tier Service Architecture (sketched after this list)
- Tier 1 (Admin): Business logic and orchestration
- Tier 2 (Product): Core product management
- Tier 3 (Search): Handling search infrastructure
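Expressed as Go interfaces, the tier boundaries might look like the sketch below. The names are hypothetical; the point is the one-way dependency direction: higher tiers depend on lower ones, never the reverse.

package tiers

import "context"

// Tier 1 (Admin): business logic and orchestration; calls Tier 2.
type Admin interface {
	SyncShopProducts(ctx context.Context, shopID string) error
}

// Tier 2 (Product): core product management; rather than calling
// Tier 3 directly, it publishes events that the search layer consumes
// (see the asynchronous indexing discussion above).
type Product interface {
	Upsert(ctx context.Context, productID string) error
}

// Tier 3 (Search): consumes product events and maintains indices.
type Search interface {
	Index(ctx context.Context, productID string) error
}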
Development & Release Strategy
Our modular monolith architecture features a database designed to support diverse product types from multiple data sources. With active development across multiple internal modules by numerous contributors, we implemented isolation mechanisms to prevent cross-module interference and maintain shared component stability. We therefore broke our work and scope down into three parts:
Handling Events from multiple sources: For shop products, we decided to create a separate module that processes all the events and transforms them into Global Foundation-specific data models. This module consumes only internal product inventory APIs for resource management.
Product Inventory: We created separate APIs for shop products, which need special handling because a product can have multiple variant aspects (e.g. size, color); these APIs are built to reuse the existing internal APIs wherever possible.
Search & Discovery: We unified the interface to support both C2C and B2C products, implementing the necessary architectural adjustments for compatibility.
Release Mechanism
We divided our data into two categories with differing sync approaches: "Live Data Sync" and "Historical Sync". Below, I briefly describe the approach we took for each.
Live Data Sync
We handle multiple events (e.g. create/update/delete product, update stock) for active listings with controlled RPS (via our internal PubSub gRPC pusher mechanism), and fetch critical data via APIs for each event to avoid any data staleness.
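The per-event flow treats the Pub/Sub message as a notification only; the authoritative state always comes from an API fetch. A minimal sketch, with hypothetical interface names:

package livesync

import "context"

type Product struct{ ID string }

// ShopsAPI fetches the current product state from Mercari Shops.
type ShopsAPI interface {
	GetProduct(ctx context.Context, productID string) (Product, error)
}

// Upserter writes the product into Global Foundation storage.
type Upserter interface {
	Upsert(ctx context.Context, p Product) error
}

// HandleProductEvent refetches the latest state on every event, so we
// never apply a stale or partial event payload.
func HandleProductEvent(ctx context.Context, api ShopsAPI, up Upserter, productID string) error {
	p, err := api.GetProduct(ctx, productID)
	if err != nil {
		return err // nack: the pusher redelivers, and deduplication absorbs repeats
	}
	return up.Upsert(ctx, p)
}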
What is PubSub gRPC Pusher?
PubSub gRPC Pusher provides a subscription type for Google Cloud Pub/Sub that delivers messages as gRPC requests. It is an in-house Mercari product (not a GCP offering), designed for high throughput, long-running jobs, flexible delivery rates, and more.
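Because the pusher is an in-house product, its actual API is not public; the sketch below only illustrates the model, and the types are hypothetical. Instead of a worker pulling from a subscription, messages arrive as inbound gRPC calls whose delivery rate is throttled on the pusher side.

package pusher

import "context"

// PushRequest mirrors what a pushed Pub/Sub message might carry.
type PushRequest struct {
	MessageID  string
	Data       []byte
	Attributes map[string]string
}

// Handler is invoked once per pushed message. Returning nil acks the
// message; returning an error nacks it, so the pusher redelivers later.
type Handler interface {
	Handle(ctx context.Context, req *PushRequest) error
}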
To safely import all the shop products into our production environment, we took the following phased approach, controlled by configuration (a sketch of these controls follows the steps below).
Steps:
- Start small: target a single shop with a small number of products to verify the integration (e.g. consistency, error handling).
- Enable search indexing only; by default, search results excluded shop products.
- Verify the integration.
- Include shop products in search results via backend feature flags for a limited set of internal users, to avoid any negative impact on our customers’ experience.
- Verify the end-to-end integration.
- Whitelist more shops via configuration to speed up the live data sync.
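As an illustration of what those controls look like, here is a hypothetical shape for the rollout configuration; the field names are made up and not our actual configuration keys.

package rollout

// Config gates each phase of the live data sync rollout.
type Config struct {
	// Shops whitelisted for live sync; we started with one pilot shop
	// and widened this list as verification passed.
	TargetShopIDs []string

	// Index shop products, but keep them out of search results until
	// end-to-end verification is done.
	SearchIndexingEnabled bool
	SearchResultsEnabled  bool

	// Backend feature flag: expose shop products in search results to
	// these internal users only, before any customer-facing rollout.
	InternalUserAllowlist []string
}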
Historical Sync
To sync old listings (e.g. sold out, inactive) and correct any data anomalies that appear after live data sync, we run batches targeting shops incrementally, from those with the fewest products to those with the most, to manage the load in production.
We use the following configuration to control batch processing. It lets us tune multiple aspects of the run based on our system capacity at different times of day; a sketch of a runner driven by this config follows the steps below.
// Sample config.
"admin-b2citemsync": {
  job_config: {
    job_id: "JOB_XXXXX"
    start_offset: "b2c-items/20250925-partition/partition-000-000000000000.json"
    end_offset: "" // if omitted, all files in the partition are targeted
    gcs_folder_path: "b2c-items/20250925-partition/"
    resource_type: "MK_JP_B2C_PRODUCTS"
    page_size: 300
    partial_data_size: (100 * 1024 * 1024) // 100MB
    concurrency_count: 500
    rate_limit: 1000
  }
}
Steps:
- Run batches during off-peak hours to avoid unnecessary load on the DB.
- Implement phased rollout starting with small-catalog shops, then scale incrementally based on performance validation.
- Use appropriate configuration values (e.g. RPS, file size) based on available capacity, including that of dependent services.
- Retry partially failed products.
- Repeat.
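Putting the config and the steps together, a batch runner can be condensed to: list the partition files between start_offset and end_offset, then sync them with bounded concurrency and a global rate limit. The Go sketch below is illustrative; listFiles and syncFile are hypothetical helpers, and syncFile would page through products page_size at a time, recording per-product failures for the follow-up retry pass.

package batchsync

import (
	"context"

	"golang.org/x/sync/errgroup"
	"golang.org/x/time/rate"
)

// JobConfig mirrors the knobs from the sample config above.
type JobConfig struct {
	StartOffset, EndOffset string
	GCSFolderPath          string
	PageSize               int
	ConcurrencyCount       int
	RateLimit              rate.Limit // requests per second toward downstream services
}

func Run(
	ctx context.Context,
	cfg JobConfig,
	listFiles func(folder, start, end string) ([]string, error),
	syncFile func(ctx context.Context, file string, pageSize int) error,
) error {
	files, err := listFiles(cfg.GCSFolderPath, cfg.StartOffset, cfg.EndOffset)
	if err != nil {
		return err
	}
	// Simplification: the limiter gates how fast files are picked up;
	// in practice the rate limit applies to downstream API calls.
	limiter := rate.NewLimiter(cfg.RateLimit, 1)
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(cfg.ConcurrencyCount) // bounded worker pool
	for _, f := range files {
		f := f
		g.Go(func() error {
			if err := limiter.Wait(ctx); err != nil {
				return err
			}
			// syncFile records partially failed products instead of
			// returning on the first bad product; a hard error cancels
			// the shared context and stops the run.
			return syncFile(ctx, f, cfg.PageSize)
		})
	}
	return g.Wait()
}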
Key Learnings
Summary
Through this comprehensive B2C data synchronization architecture, we successfully solved the critical challenge of reliably syncing millions of products across thousands of shops without compromising system performance or data integrity. By implementing dual synchronization pathways (real-time and batch) with centralized tracking, we achieved zero-downtime rollouts and maintained high-precision data synchronization across all integrated systems. Without this robust infrastructure, we would have faced frequent sync failures, data inconsistencies, and an inability to scale beyond small pilot shops, ultimately blocking our cross-border expansion goals and risking significant revenue loss from search index outages.
Detailed Implementation Benefits
Event-Driven Architecture Benefits: Separating concerns through event-driven design (sync → store → publish → index) provided better scalability and fault tolerance, and allowed different system components to scale independently.
Centralized Sync Control: The Sync Tracker became the heart of the system, providing comprehensive monitoring, deduplication, error handling, and audit trails that were essential for debugging and ensuring data reliability in production.
API-First Data Enrichment: Rather than relying solely on event payloads, fetching complete product data via API calls ensured data completeness and consistency, though it required careful handling of external system dependencies.
Clear System Boundaries: Explicitly defining Global Foundation vs Mercari Shops system boundaries with proper authentication, rate limiting, and error handling made the integration more maintainable and easier to troubleshoot in production environments.
Future Prospects
With this release, we’ve fully implemented the core synchronization infrastructure and foundational data pipeline architecture. Moving forward, our technical roadmap focuses on mission-critical features for cross-border transaction processing, such as product pre-order functionality and authentication features, while rapidly increasing the number of countries we expand to. We need not only horizontal expansion but also localization and growth in specific countries, as we enter a phase of putting this infrastructure to further use.