I am Ahsun, a Software Engineer @Cross Border (XB) Engineering. In this article, titled "From Local to Global: Building Seamless B2C Product Integration at Mercari," I’d like to delve a bit deeper into how we architected a robust, scalable product synchronization system that handles both real-time updates and bulk data migrations between the Mercari Shops System and Global Foundation. We’ll cover the key challenges we faced, the critical design decisions we made, and the learnings that shaped our iterative approach to building production-ready sync infrastructure.
The Challenge: Connecting Two Product Worlds
At Mercari, we operate in a unique cross-border commerce landscape. Our Japanese B2C marketplace (Mercari Shops) serves many local merchants and customers, while our Global App connects international buyers with Japanese sellers. The challenge? Seamlessly synchronizing millions of products between these two distinct ecosystems in near real time to enrich the experience for our customers.
The Business Context
- C2C: A single product for sale.
- B2C: A product with multiple variants (e.g. size, color), each with its own stock quantity, so customers can order multiple units of each variant (see the type sketch after this list).
- Mercari Shops System: Japan-focused marketplace with local merchants.
- Global Foundation: Cross-border platform serving global customers.
- The Gap: Real-time product sync across different data models, currencies, and business rules.
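Since the C2C vs. B2C shape difference drives many of the design decisions below, here is a minimal, purely illustrative sketch of the two data models in Go. The type and field names are hypothetical, not the actual Global Foundation schema.

// Illustrative types only; the real schemas carry far more fields.
package model

// C2CItem is a single listing sold as one unit.
type C2CItem struct {
	ID    string
	Name  string
	Price int64 // JPY
}

// B2CProduct groups multiple purchasable variants under one product.
type B2CProduct struct {
	ID       string
	ShopID   string
	Name     string
	Variants []Variant
}

// Variant is one sellable combination of aspects, with its own stock.
type Variant struct {
	SKU      string
	Aspects  map[string]string // e.g. {"size": "M", "color": "navy"}
	Price    int64             // JPY
	Quantity int               // orderable stock for this variant
}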
Why This Integration Matters
Key motivations for this integration are as follows:
- Enable Japanese merchants to reach global markets effortlessly
- Provide consistent product experience across platforms
- Maintain data integrity across distributed systems
- Support millions of products with sub-second latency requirements
Data Sync – Challenges and Architecture
Here are some challenges and learnings we encountered while building this system, and how we refined our architecture iteratively:
Challenges
Event Deduplication & Ordering: Managing duplicate events and out-of-order message delivery in high-volume PubSub streams required implementing a robust Sync Tracker with message ID-based deduplication and timestamp validation to ensure data consistency.
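To make the deduplication idea concrete, here is a minimal sketch in Go, assuming a hypothetical key-value store keyed by product ID; the real Sync Tracker also records monitoring and audit state.

// Sketch only: drop exact duplicates (same message ID) and stale,
// out-of-order events (older update timestamp than the stored one).
package synctracker

import (
	"context"
	"time"
)

// Record is the last applied event for a product.
type Record struct {
	LastMessageID string
	LastUpdatedAt time.Time
}

// Store is a stand-in for the Sync Tracker's persistence layer.
type Store interface {
	Get(ctx context.Context, productID string) (Record, bool, error)
	Put(ctx context.Context, productID string, r Record) error
}

// ShouldProcess reports whether an incoming event should be applied.
func ShouldProcess(ctx context.Context, s Store, productID, msgID string, updatedAt time.Time) (bool, error) {
	rec, ok, err := s.Get(ctx, productID)
	if err != nil {
		return false, err
	}
	if ok {
		if rec.LastMessageID == msgID {
			return false, nil // duplicate delivery of the same message
		}
		if !updatedAt.After(rec.LastUpdatedAt) {
			return false, nil // out of order: a newer event was already applied
		}
	}
	if err := s.Put(ctx, productID, Record{LastMessageID: msgID, LastUpdatedAt: updatedAt}); err != nil {
		return false, err
	}
	return true, nil
}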
Dual Sync Strategy Complexity: Coordinating both real-time event-driven sync and batch historical sync through the same ProductSync service while maintaining data integrity and avoiding conflicts between live updates and bulk operations.
Cross-System API Dependencies: Handling API calls to Mercari Shops systems for fetching latest product state introduced latency and failure scenarios that required careful retry logic, rate limiting, and graceful degradation strategies.
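The retry side of this can be as simple as capped exponential backoff around each outbound call. The sketch below is illustrative only; the attempt counts and delays are not our production values.

// Sketch: retry an outbound call with capped exponential backoff,
// honoring context cancellation so a stuck dependency degrades
// gracefully instead of blocking the sync worker.
package shopsclient

import (
	"context"
	"time"
)

func callWithRetry(ctx context.Context, attempts int, call func(context.Context) error) error {
	backoff := 200 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Second {
			backoff *= 2
		}
	}
	return err // surface the last error so the caller can record and retry later
}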
Asynchronous Search Indexing: Ensuring search index consistency without blocking the main sync flow by implementing event-driven indexing where ProductInventory publishes events after database storage, allowing SearchIndexer to update indices asynchronously.
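In code, that ordering is simply "persist first, publish second." A simplified Go sketch, with hypothetical interfaces standing in for the real ProductInventory internals:

// Sketch: the indexer only ever learns about products that already
// exist in the database, and indexing never blocks the sync flow.
package productinventory

import "context"

type Product struct{ ID string }

// ProductStore persists products (stand-in for the real storage layer).
type ProductStore interface {
	Save(ctx context.Context, p Product) error
}

// EventPublisher emits post-storage events for the SearchIndexer.
type EventPublisher interface {
	PublishProductStored(ctx context.Context, productID string) error
}

func SyncProduct(ctx context.Context, store ProductStore, pub EventPublisher, p Product) error {
	if err := store.Save(ctx, p); err != nil {
		return err // nothing was stored, so nothing is published
	}
	// The SearchIndexer consumes this event asynchronously and updates
	// its indices on its own schedule.
	return pub.PublishProductStored(ctx, p.ID)
}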
Architecture
Our B2C product sync follows a dual-strategy approach, combining real-time event processing for live updates with batch processing for old listings. Here’s the high-level design of the current architecture.
Key Components
- Real-Time Event Processing
- Pub/Sub events from Shop product updates
- Immediate sync for product changes
- Sub-second latency for critical updates
- Batch Processing Pipeline
- Handles bulk product imports from BigQuery exports
- Processes millions of products efficiently
- Recovers from failed sync operations
- Multi-Tier Service Architecture (sketched after this list)
- Tier 1 (Admin): Business logic and orchestration
- Tier 2 (Product): Core product management
- Tier 3 (Search): Handling search infrastructure
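Expressed as Go interfaces, the tier boundaries might look like the sketch below. The names are hypothetical; the point is the one-way dependency direction: higher tiers depend on lower ones, never the reverse.

package tiers

import "context"

// Tier 1 (Admin): business logic and orchestration; calls Tier 2.
type Admin interface {
	SyncShopProducts(ctx context.Context, shopID string) error
}

// Tier 2 (Product): core product management; rather than calling
// Tier 3 directly, it publishes events that the search layer consumes
// (see the asynchronous indexing discussion above).
type Product interface {
	Upsert(ctx context.Context, productID string) error
}

// Tier 3 (Search): consumes product events and maintains indices.
type Search interface {
	Index(ctx context.Context, productID string) error
}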
Development & Release Strategy
Our modular monolith architecture features a database designed to support diverse product types from multiple data sources. With active development across multiple internal modules by numerous contributors, we implemented isolation mechanisms to prevent cross-module interference and maintain shared component stability. We therefore broke our work and scope down into three parts:
Handling Events from multiple sources: For shop products, we decided to create a separate module that processes all the events and transforms them into Global Foundation-specific data models. This module consumes only internal product inventory APIs for resource management.
Product Inventory: We created separate APIs for shop products, which need special handling because a product can have multiple variant aspects (e.g. size, color); these APIs are built to reuse the existing internal APIs wherever possible.
Search & Discovery: We unified the interface to support both C2C and B2C products, implementing the necessary architectural adjustments for compatibility.
Release Mechanism
We divided our data into two categories with differing sync approaches: "Live Data Sync" and "Historical Sync". Below, I briefly describe the approach we took for each.
Live Data Sync
We handle multiple events (e.g. create/update/delete product, update stock) for active listings with controlled RPS (via our internal PubSub gRPC pusher mechanism), and fetch critical data via APIs for each event to avoid any data staleness.
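The per-event flow treats the Pub/Sub message as a notification only; the authoritative state always comes from an API fetch. A minimal sketch, with hypothetical interface names:

package livesync

import "context"

type Product struct{ ID string }

// ShopsAPI fetches the current product state from Mercari Shops.
type ShopsAPI interface {
	GetProduct(ctx context.Context, productID string) (Product, error)
}

// Upserter writes the product into Global Foundation storage.
type Upserter interface {
	Upsert(ctx context.Context, p Product) error
}

// HandleProductEvent refetches the latest state on every event, so we
// never apply a stale or partial event payload.
func HandleProductEvent(ctx context.Context, api ShopsAPI, up Upserter, productID string) error {
	p, err := api.GetProduct(ctx, productID)
	if err != nil {
		return err // nack: the pusher redelivers, and deduplication absorbs repeats
	}
	return up.Upsert(ctx, p)
}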
What is PubSub gRPC Pusher?
PubSub gRPC Pusher provides a subscription type for Google Cloud Pub/Sub that delivers messages as gRPC requests. It is an in-house Mercari product (not a GCP offering), designed for high throughput, long-running jobs, flexible delivery rates, and more.
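Because the pusher is an in-house product, its actual API is not public; the sketch below only illustrates the model, and the types are hypothetical. Instead of a worker pulling from a subscription, messages arrive as inbound gRPC calls whose delivery rate is throttled on the pusher side.

package pusher

import "context"

// PushRequest mirrors what a pushed Pub/Sub message might carry.
type PushRequest struct {
	MessageID  string
	Data       []byte
	Attributes map[string]string
}

// Handler is invoked once per pushed message. Returning nil acks the
// message; returning an error nacks it, so the pusher redelivers later.
type Handler interface {
	Handle(ctx context.Context, req *PushRequest) error
}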
To safely import all the shop products into our production environment, we took the following phased approach, controlled by configuration (a sketch of these controls follows the steps below).
Steps:
- Start small: target a single shop with a small number of products to verify the integration (e.g. consistency, error handling).
- Enable search indexing only; by default, search results excluded shop products.
- Verify the integration.
- Include shop products in search results via backend feature flags for a limited set of internal users, to avoid any negative impact on our customers’ experience.
- Verify the end-to-end integration.
- Whitelist more shops via configuration to speed up the live data sync.
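As an illustration of what those controls look like, here is a hypothetical shape for the rollout configuration; the field names are made up and not our actual configuration keys.

package rollout

// Config gates each phase of the live data sync rollout.
type Config struct {
	// Shops whitelisted for live sync; we started with one pilot shop
	// and widened this list as verification passed.
	TargetShopIDs []string

	// Index shop products, but keep them out of search results until
	// end-to-end verification is done.
	SearchIndexingEnabled bool
	SearchResultsEnabled  bool

	// Backend feature flag: expose shop products in search results to
	// these internal users only, before any customer-facing rollout.
	InternalUserAllowlist []string
}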
Historical Sync
To sync old listings (e.g. sold out, inactive) and correct any data anomalies that appear after live data sync, we run batches targeting shops incrementally, from those with the fewest products to those with the most, to manage the load in production.
We use the following configuration to control batch processing. It lets us tune multiple aspects of the run based on our system capacity at different times of day; a sketch of a runner driven by this config follows the steps below.
// Sample config.
"admin-b2citemsync": {
  job_config: {
    job_id: "JOB_XXXXX"
    start_offset: "b2c-items/20250925-partition/partition-000-000000000000.json"
    end_offset: "" // if omitted, all files in the partition are targeted
    gcs_folder_path: "b2c-items/20250925-partition/"
    resource_type: "MK_JP_B2C_PRODUCTS"
    page_size: 300
    partial_data_size: (100 * 1024 * 1024) // 100MB
    concurrency_count: 500
    rate_limit: 1000
  }
}
Steps:
- Run batches during off-peak hours to avoid unnecessary load on the DB.
- Implement phased rollout starting with small-catalog shops, then scale incrementally based on performance validation.
- Use appropriate configuration values (e.g. RPS, file size) based on available capacity, including that of dependent services.
- Retry partially failed products.
- Repeat.
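Putting the config and the steps together, a batch runner can be condensed to: list the partition files between start_offset and end_offset, then sync them with bounded concurrency and a global rate limit. The Go sketch below is illustrative; listFiles and syncFile are hypothetical helpers, and syncFile would page through products page_size at a time, recording per-product failures for the follow-up retry pass.

package batchsync

import (
	"context"

	"golang.org/x/sync/errgroup"
	"golang.org/x/time/rate"
)

// JobConfig mirrors the knobs from the sample config above.
type JobConfig struct {
	StartOffset, EndOffset string
	GCSFolderPath          string
	PageSize               int
	ConcurrencyCount       int
	RateLimit              rate.Limit // requests per second toward downstream services
}

func Run(
	ctx context.Context,
	cfg JobConfig,
	listFiles func(folder, start, end string) ([]string, error),
	syncFile func(ctx context.Context, file string, pageSize int) error,
) error {
	files, err := listFiles(cfg.GCSFolderPath, cfg.StartOffset, cfg.EndOffset)
	if err != nil {
		return err
	}
	// Simplification: the limiter gates how fast files are picked up;
	// in practice the rate limit applies to downstream API calls.
	limiter := rate.NewLimiter(cfg.RateLimit, 1)
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(cfg.ConcurrencyCount) // bounded worker pool
	for _, f := range files {
		f := f
		g.Go(func() error {
			if err := limiter.Wait(ctx); err != nil {
				return err
			}
			// syncFile records partially failed products instead of
			// returning on the first bad product; a hard error cancels
			// the shared context and stops the run.
			return syncFile(ctx, f, cfg.PageSize)
		})
	}
	return g.Wait()
}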
Key Learnings
Summary
Through this comprehensive B2C data synchronization architecture, we successfully solved the critical challenge of reliably syncing millions of products across thousands of shops without compromising system performance or data integrity. By implementing dual synchronization pathways (real-time and batch) with centralized tracking, we achieved zero-downtime rollouts and maintained high-precision data synchronization across all integrated systems. Without this robust infrastructure, we would have faced frequent sync failures, data inconsistencies, and an inability to scale beyond small pilot shops, ultimately blocking our cross-border expansion goals and risking significant revenue loss from search index outages.
Detailed Implementation Benefits
Event-Driven Architecture Benefits: Separating concerns through event-driven design (sync → store → publish → index) provided better scalability and fault tolerance, and allowed different system components to scale independently.
Centralized Sync Control: The Sync Tracker became the heart of the system, providing comprehensive monitoring, deduplication, error handling, and audit trails that were essential for debugging and ensuring data reliability in production.
API-First Data Enrichment: Rather than relying solely on event payloads, fetching complete product data via API calls ensured data completeness and consistency, though it required careful handling of external system dependencies.
Clear System Boundaries: Explicitly defining Global Foundation vs Mercari Shops system boundaries with proper authentication, rate limiting, and error handling made the integration more maintainable and easier to troubleshoot in production environments.
Future Prospects
With this release, we’ve fully implemented the core synchronization infrastructure and foundational data pipeline architecture. Moving forward, our technical roadmap focuses on mission-critical features for cross-border transaction processing, such as product pre-order functionality and authentication features, while rapidly increasing the number of countries we expand to. We need not only horizontal expansion but also localization and growth in specific countries, as we enter a phase of putting this infrastructure to further use.