As Mercari continues its global expansion, the Global App must support an increasingly diverse set of workloads spanning high-throughput, low-latency transactions and strongly consistent multi-region data replication.
Selecting the right database engine is therefore critical to ensuring our platform remains scalable, reliable, and cost-efficient.
To this end, our team conducted a comprehensive evaluation of multiple databases, followed by performance benchmarking of the shortlisted candidates: Google Cloud Spanner, AlloyDB for PostgreSQL, and CockroachDB.
This blog outlines the evaluation framework, performance results, cost comparison, and the resulting insights that inform our database direction.
Evaluation Criteria
To ensure a holistic assessment, we used 11 key evaluation dimensions, each weighted by its importance to the Global App's architecture goals.
- Scalability and Performance
This criterion focuses on the system’s capacity to handle growth and maintain efficiency under varying workloads.
It emphasizes the ability to scale horizontally by adding nodes to support increased load while ensuring sustained read and write throughput during peak demand. Equally important is maintaining low latency across globally distributed environments to deliver seamless user experiences regardless of location.
The evaluation also considers dynamic scaling capabilities, similar to Spanner Kit, which allow the system to automatically adjust resources based on region and time of day for optimal performance. Strong consistency is prioritized through support for strong reads on critical tables without dependence on stale replicas, ensuring accuracy and reliability of data in real-time operations.
- Consistency Model
The Consistency Model focuses on the balance between strong and eventual consistency to meet application-specific requirements.
It assesses ACID compliance for distributed transactions, ensuring data integrity and reliability across nodes and regions. Cross-boundary consistency support is essential to maintain synchronization of critical datasets, such as inventory and pricing information, across systems and geographies.
This ensures users experience consistent and reliable data views, even in distributed or high-traffic environments.
- Multi-Region Support
This dimension evaluates the system’s ability to operate efficiently and reliably across multiple geographic regions. It includes native multi-region replication and data distribution to enhance both performance and availability.
Data locality controls are necessary to optimize latency and ensure compliance with regional data governance requirements. The service should also enable rapid addition of new regions to support global expansion, ensuring scalability and flexibility as the business grows into new markets.
- Reliability and Availability
This criterion examines the system’s robustness and its capacity to maintain uptime under various failure conditions. It prioritizes strong SLA guarantees for uptime and recovery, along with fault-tolerant architectures capable of handling node or network disruptions gracefully.
Effective disaster recovery mechanisms, including automatic failover and multi-region redundancy, are vital to ensure continuity of operations and minimal data loss in the event of major outages or infrastructure failures.
- Compliance and Security
This criterion ensures that the system adheres to global data protection and privacy standards while safeguarding sensitive information.
The service should comply with international regulatory frameworks such as GDPR, HIPAA, and CCPA. Additionally, granular access controls and role-based permissions are necessary to manage data visibility and maintain strict security boundaries across teams and environments.
- Operational Complexity
This criterion evaluates how easily the service can be deployed, scaled, monitored, and maintained. Simplicity in management operations is key to reducing overhead and improving reliability.
Native automation capabilities for tasks such as backups, patching, and scaling are highly valuable, ensuring operational efficiency. The service should also support flexible maintenance windows and minimize downtime during version upgrades or infrastructure changes, promoting smoother long-term operation.
- Cost
This criterion assesses the overall financial efficiency of the service, encompassing compute, storage, and data transfer costs.
It considers not only pricing flexibility and predictability but also the Total Cost of Ownership (TCO), which includes operational and maintenance overhead.
The goal is to identify a solution that delivers strong performance and scalability while maintaining cost-effectiveness and transparency in pricing structures, enabling better budgeting and resource planning.
- Vendor Lock-In
This criterion focuses on the system’s level of dependence on proprietary technologies and its portability across cloud environments.
Preference is given to platforms that adopt open standards and APIs, reducing barriers to migration and integration. The service should enable easy database switching and align with a modular monolith architecture to ensure long-term flexibility.
- Integration and Ecosystem
This dimension measures how well the system integrates within the existing Google Cloud Platform ecosystem and with third-party tools. Compatibility with the current technology stack, extensions, and monitoring tools ensures smooth adoption and interoperability.
- Vendor Support and SLA
This criterion evaluates the quality and reliability of the provider’s support structure. This includes responsiveness, depth of technical expertise, and clarity of communication.
Comprehensive documentation, robust service-level agreements, and active community engagement are crucial to ensuring quick issue resolution and continuous operational confidence.
- Developer Knowledge and Expertise
This criterion considers the existing skill sets within the development team and the ease of adopting new technologies.
Familiarity with SQL and PostgreSQL dialects ensures a shorter learning curve and more efficient implementation. The availability of mature development tooling, monitoring libraries, and educational resources further empowers teams to build, optimize, and troubleshoot effectively.
Weighted Evaluation Matrix
| Criteria | Weight |
|---|---|
| Scalability & Performance | 20% |
| Cost | 15% |
| Reliability & Availability | 15% |
| Multi-Region Support | 10% |
| Compliance & Security | 10% |
| Consistency Model | 7.5% |
| Operational Complexity | 5% |
| Vendor Lock-In | 5% |
| Integration & Ecosystem | 5% |
| Vendor Support & SLA | 5% |
| Developer Knowledge & Expertise | 2.5% |
Based on the above evaluation, we selected AlloyDB, Spanner, and CockroachDB as candidate alternatives and ran performance benchmarks on them.
Performance Comparison of AlloyDB, Spanner, and CockroachDB
We benchmarked AlloyDB, Spanner, and CockroachDB using the Yahoo! Cloud Serving Benchmark (YCSB).
The test focused on throughput and latency across multiple workload profiles representative of our application’s expected data access patterns.
Thread counts were adjusted for each database until CPU utilization reached approximately 65%, ensuring an equitable comparison.
Tooling and Configuration
- Tool: YCSB (Go implementation), https://github.com/pingcap/go-ycsb/tree/master
- Region: Tokyo
- Initial dataset: 200M rows
- Operations per execution: 10M
- Warmup time: 1 hour
- Execution duration: 30 minutes post warm-up
Workload Patterns
| Workload | Read/Write Ratio | Description |
|---|---|---|
| A | 80/20 | Mixed transactional workload |
| B | 95/5 | Read-heavy |
| C | 99/1 | Read-dominant |
| D | 50/50 | Write-heavy |
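For reproducibility, below is a minimal sketch of how these workload mixes can be expressed as go-ycsb property files, using the standard YCSB core-workload properties (recordcount, operationcount, readproportion, updateproportion). The request distribution shown is an assumption for illustration, and the per-database thread counts are omitted because they were tuned individually toward the ~65% CPU target.

```go
package main

import (
	"fmt"
	"os"
)

// workloadMix captures the read/write ratios from the table above.
type workloadMix struct {
	name            string
	readProportion  float64
	writeProportion float64
}

var mixes = []workloadMix{
	{"workload_a", 0.80, 0.20},
	{"workload_b", 0.95, 0.05},
	{"workload_c", 0.99, 0.01},
	{"workload_d", 0.50, 0.50},
}

func main() {
	for _, m := range mixes {
		// Standard YCSB core-workload properties; dataset and operation
		// counts match the benchmark configuration described above.
		props := fmt.Sprintf(
			"workload=core\n"+
				"recordcount=200000000\n"+ // 200M rows loaded before the run
				"operationcount=10000000\n"+ // 10M operations per execution
				"readproportion=%.2f\n"+
				"updateproportion=%.2f\n"+
				"requestdistribution=zipfian\n", // assumption, not part of the published configuration
			m.readProportion, m.writeProportion)
		if err := os.WriteFile(m.name+".properties", []byte(props), 0o644); err != nil {
			panic(err)
		}
	}
	fmt.Println("wrote", len(mixes), "workload property files")
}
```

Each generated file can then be passed to go-ycsb's load and run phases via its -P flag.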
Benchmark Results
| Workload | Database | Operation | P50 Latency (ms) | P99 Latency (ms) | Throughput (OPS) |
|---|---|---|---|---|---|
| A (80/20) | AlloyDB | Read | 1.35 | 5.2 | 82,783.9 |
| A (80/20) | AlloyDB | Write | 2.7 | 6.7 | 20,860.0 |
| A (80/20) | Spanner | Read | 3.15 | 6.18 | 13,092.58 |
| A (80/20) | Spanner | Write | 6.79 | 13.29 | 3,287.02 |
| A (80/20) | CockroachDB | Read | 1.1 | 13.2 | 14,856.8 |
| A (80/20) | CockroachDB | Write | 4.9 | 21.2 | 3,722.7 |
| B (95/5) | AlloyDB | Read | 1.28 | 6.7 | 117,916.1 |
| B (95/5) | AlloyDB | Write | 2.5 | 19.7 | 6,097.4 |
| B (95/5) | Spanner | Read | 4.44 | 6.18 | 17,576.38 |
| B (95/5) | Spanner | Write | 8.8 | 14.0 | 927.68 |
| B (95/5) | CockroachDB | Read | 1.3 | 14.8 | 11,606.6 |
| B (95/5) | CockroachDB | Write | 3.9 | 18.5 | 612.0 |
| C (99/1) | AlloyDB | Read | 1.38 | 7.2 | 135,215.0 |
| C (99/1) | AlloyDB | Write | 2.07 | 5.95 | 1,440.0 |
| C (99/1) | Spanner | Read | 4.1 | 6.01 | 20,399.03 |
| C (99/1) | Spanner | Write | 8.6 | 13.5 | 205.5 |
| C (99/1) | CockroachDB | Read | 1.3 | 14.77 | 12,090.3 |
| C (99/1) | CockroachDB | Write | 3.2 | 18.3 | 636.2 |
| D (50/50) | AlloyDB | Read | 1.47 | 7.3 | 49,703.2 |
| D (50/50) | AlloyDB | Write | 4.35 | 14.1 | 46,104.6 |
| D (50/50) | Spanner | Read | 3.05 | 5.38 | 6,465.4 |
| D (50/50) | Spanner | Write | 7.96 | 13.5 | 6,474.32 |
| D (50/50) | CockroachDB | Read | 1.3 | 13.77 | 6,854.6 |
| D (50/50) | CockroachDB | Write | 7.2 | 23.3 | 6,844.6 |
Cost Comparison
| Feature / Tier | Spanner Standard | Spanner Enterprise | Spanner Enterprise Plus | AlloyDB Standard | AlloyDB HA | CockroachDB |
|---|---|---|---|---|---|---|
| Instance Cost | $854 | $1,167 | $1,622 | $290 | $580 | $610 |
| Storage Cost | $0.39/GB | $0.39/GB | $0.39/GB | $0.38/GB | $0.38/GB | $0.30/GB |
| Backup Cost | $0.10/GB | $0.10/GB | $0.10/GB | $0.12/GB | $0.12/GB | $0.10/GB |
Reference: Google Cloud Spanner pricing, AlloyDB for PostgreSQL pricing, and CockroachDB Cloud pricing
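As a rough illustration of how these line items add up, the sketch below estimates a monthly bill under hypothetical assumptions: 1,000 GB of data, 500 GB of backups, a single instance at the listed price, and treating the instance figures above as monthly. It ignores networking, replication, and committed-use discounts, so it is not a substitute for the official pricing calculators.

```go
package main

import "fmt"

// tierPricing mirrors the cost comparison table above.
// Instance figures are treated as monthly USD; this is an assumption for illustration.
type tierPricing struct {
	name         string
	instanceUSD  float64 // per instance, per month (assumed)
	storagePerGB float64 // per GB-month
	backupPerGB  float64 // per GB-month
}

var tiers = []tierPricing{
	{"Spanner Standard", 854, 0.39, 0.10},
	{"Spanner Enterprise", 1167, 0.39, 0.10},
	{"Spanner Enterprise Plus", 1622, 0.39, 0.10},
	{"AlloyDB Standard", 290, 0.38, 0.12},
	{"AlloyDB HA", 580, 0.38, 0.12},
	{"CockroachDB", 610, 0.30, 0.10},
}

func main() {
	// Hypothetical footprint: 1,000 GB of data and 500 GB of backups.
	const dataGB, backupGB = 1000.0, 500.0
	for _, t := range tiers {
		monthly := t.instanceUSD + dataGB*t.storagePerGB + backupGB*t.backupPerGB
		fmt.Printf("%-25s ~$%.0f / month\n", t.name, monthly)
	}
}
```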
Analysis and Conclusion
Our evaluation compared AlloyDB, Spanner, and CockroachDB across key performance dimensions, focusing on latency, throughput, and operational trade-offs.
AlloyDB consistently delivered low P50 and P99 latencies across all workloads, indicating superior responsiveness and overall performance. Spanner maintained strong consistency and stable latency, though its write latency was comparatively higher. CockroachDB offered fast reads with low P50 latency but showed higher P99 variance, signaling occasional spikes under heavy load.

In terms of throughput, AlloyDB achieved the highest performance for both read and write operations across all test scenarios. Spanner demonstrated excellent reliability but lower throughput under write-intensive workloads. CockroachDB performed competitively for read-heavy workloads but struggled to sustain high write throughput over extended durations.
AlloyDB provides the best overall balance of throughput, cost efficiency, and operational simplicity, making it particularly suitable for read-intensive and mixed workloads. Spanner remains the benchmark for global consistency and reliability, though it involves higher latency and cost trade-offs. CockroachDB, as an open-source alternative, offers flexibility and adaptability but introduces greater management complexity, performance variability, and relatively higher operational costs.
There is no single “perfect” database solution; each option presents trade-offs in performance, consistency, scalability, and cost. After a comprehensive evaluation, AlloyDB has been chosen as our primary database due to its strong balance of high performance, PostgreSQL compatibility, and operational simplicity. Spanner will continue to serve mission-critical services requiring global strong consistency and horizontal scalability. CockroachDB remains under consideration for future exploration, particularly for self-managed or hybrid deployments, given its promising trajectory in distributed SQL systems.
Decision Matrix (Reference)
| Criteria | Weight | AlloyDB | Spanner | CockroachDB |
|---|---|---|---|---|
| Scalability & Performance | 20% | ✅ High | ✅ Medium | ✅ Medium |
| Cost | 15% | 💰 Excellent | 💸 Expensive | 💰 Moderate |
| Reliability & Availability | 15% | 🟢 High (HA) | 🟢 Excellent | 🟢 High |
| Multi-Region Support | 10% | 🟡 Partial | 🟢 Native | 🟢 High |
| Compliance & Security | 10% | 🟢 High | 🟢 High | 🟢 High |
| Consistency Model | 7.5% | 🟢 Strong | 🟢 Strong | ⚙️ Tunable |
| Operational Complexity | 5% | 🟢 Simple | 🟢 Managed | 🟢 Managed |
| Vendor Lock-In | 5% | 🟡 Medium | 🔴 High | 🟢 Low |
| Integration & Ecosystem | 5% | 🟢 GCP Native | 🟢 GCP Native | 🟢 Broad OSS |
| Vendor Support & SLA | 5% | 🟢 Strong | 🟢 Strong | 🟡 Variable |
| Developer Knowledge & Expertise | 2.5% | 🟢 PostgreSQL | 🟡 Custom APIs | 🟢 SQL Compatible |
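To show how the weights and ratings roll up into a single number, here is a minimal sketch of the weighted-score calculation. The numeric mapping of the qualitative ratings (e.g. High/Excellent = 5, Medium/Partial = 3) and the sample ratings in main are hypothetical conventions for illustration, not the scores used in our actual decision.

```go
package main

import "fmt"

// Weights from the evaluation matrix above (they sum to 1.0).
var weights = map[string]float64{
	"Scalability & Performance":       0.20,
	"Cost":                            0.15,
	"Reliability & Availability":      0.15,
	"Multi-Region Support":            0.10,
	"Compliance & Security":           0.10,
	"Consistency Model":               0.075,
	"Operational Complexity":          0.05,
	"Vendor Lock-In":                  0.05,
	"Integration & Ecosystem":         0.05,
	"Vendor Support & SLA":            0.05,
	"Developer Knowledge & Expertise": 0.025,
}

// weightedScore rolls per-criterion ratings (on a 1-5 scale) into one number.
func weightedScore(ratings map[string]float64) float64 {
	total := 0.0
	for criterion, weight := range weights {
		total += weight * ratings[criterion]
	}
	return total
}

func main() {
	// Hypothetical numeric ratings for one candidate, shown only to
	// illustrate the calculation; not our actual scoring.
	candidate := map[string]float64{
		"Scalability & Performance": 5, "Cost": 4, "Reliability & Availability": 4,
		"Multi-Region Support": 3, "Compliance & Security": 4, "Consistency Model": 4,
		"Operational Complexity": 4, "Vendor Lock-In": 3, "Integration & Ecosystem": 5,
		"Vendor Support & SLA": 4, "Developer Knowledge & Expertise": 5,
	}
	fmt.Printf("weighted score: %.2f / 5\n", weightedScore(candidate))
}
```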
Acknowledgments
Special thanks to the Database Reliability Group and Google technical support for their contributions, validation, and support throughout this benchmarking exercise.

