As Mercari continues its global expansion, the Global App must support an increasingly diverse set of workloads spanning high-throughput, low-latency transactions and strongly consistent multi-region data replication.
Selecting the right database engine is therefore critical to ensuring our platform remains scalable, reliable, and cost-efficient.
To this end, our team conducted a comprehensive evaluation of multiple databases, followed by performance benchmarking of the shortlisted candidates: Google Cloud Spanner, AlloyDB for PostgreSQL, and CockroachDB.
This blog outlines the evaluation framework, performance results, cost comparison, and the resulting insights that inform our database direction.
Evaluation Criteria
To ensure a holistic assessment, we used 11 key evaluation dimensions, each weighted by its importance to the Global App's architecture goals.
- Scalability and Performance
This criterion focuses on the system’s capacity to handle growth and maintain efficiency under varying workloads.
It emphasizes the ability to scale horizontally by adding nodes to support increased load while ensuring sustained read and write throughput during peak demand. Equally important is maintaining low latency across globally distributed environments to deliver seamless user experiences regardless of location.
The evaluation also considers dynamic scaling capabilities, similar to Spanner Kit, which allow the system to automatically adjust resources based on region and time of day for optimal performance. Strong consistency is prioritized through support for strong reads on critical tables without dependence on stale replicas, ensuring accuracy and reliability of data in real-time operations.
- Consistency Model
The Consistency Model focuses on the balance between strong and eventual consistency to meet application-specific requirements.
It assesses ACID compliance for distributed transactions, ensuring data integrity and reliability across nodes and regions. Cross-boundary consistency support is essential to maintain synchronization of critical datasets, such as inventory and pricing information, across systems and geographies.
This ensures users experience consistent and reliable data views, even in distributed or high-traffic environments.
- Multi-Region Support
This dimension evaluates the system’s ability to operate efficiently and reliably across multiple geographic regions. It includes native multi-region replication and data distribution to enhance both performance and availability.
Data locality controls are necessary to optimize latency and ensure compliance with regional data governance requirements. The service should also enable rapid addition of new regions to support global expansion, ensuring scalability and flexibility as the business grows into new markets.
- Reliability and Availability
This criterion examines the system’s robustness and its capacity to maintain uptime under various failure conditions. It prioritizes strong SLA guarantees for uptime and recovery, along with fault-tolerant architectures capable of handling node or network disruptions gracefully.
Effective disaster recovery mechanisms, including automatic failover and multi-region redundancy, are vital to ensure continuity of operations and minimal data loss in the event of major outages or infrastructure failures.
- Compliance and Security
This criterion ensures that the system adheres to global data protection and privacy standards while safeguarding sensitive information.
The service should comply with international regulatory frameworks such as GDPR, HIPAA, and CCPA. Additionally, granular access controls and role-based permissions are necessary to manage data visibility and maintain strict security boundaries across teams and environments.
- Operational Complexity
This criterion evaluates how easily the service can be deployed, scaled, monitored, and maintained. Simplicity in management operations is key to reducing overhead and improving reliability.
Native automation capabilities for tasks such as backups, patching, and scaling are highly valuable, ensuring operational efficiency. The service should also support flexible maintenance windows and minimize downtime during version upgrades or infrastructure changes, promoting smoother long-term operation.
- Cost
This criterion assesses the overall financial efficiency of the service, encompassing compute, storage, and data transfer costs.
It considers not only pricing flexibility and predictability but also the Total Cost of Ownership (TCO), which includes operational and maintenance overhead.
The goal is to identify a solution that delivers strong performance and scalability while maintaining cost-effectiveness and transparency in pricing structures, enabling better budgeting and resource planning.
- Vendor Lock-In
This criterion focuses on the system’s level of dependence on proprietary technologies and its portability across cloud environments.
Preference is given to platforms that adopt open standards and APIs, reducing barriers to migration and integration. The service should enable easy database switching and align with a modular monolith architecture to ensure long-term flexibility.
- Integration and Ecosystem
This dimension measures how well the system integrates within the existing Google Cloud Platform ecosystem and with third-party tools. Compatibility with the current technology stack, extensions, and monitoring tools ensures smooth adoption and interoperability.
- Vendor Support and SLA
This criterion evaluates the quality and reliability of the provider’s support structure. This includes responsiveness, depth of technical expertise, and clarity of communication.
Comprehensive documentation, robust service-level agreements, and active community engagement are crucial to ensuring quick issue resolution and continuous operational confidence.
- Developer Knowledge and Expertise
This criterion considers the existing skill sets within the development team and the ease of adopting new technologies.
Familiarity with SQL and PostgreSQL dialects ensures a shorter learning curve and more efficient implementation. The availability of mature development tooling, monitoring libraries, and educational resources further empowers teams to build, optimize, and troubleshoot effectively.
Weighted Evaluation Matrix
| Criteria | Weight |
|---|---|
| Scalability & Performance | 20% |
| Cost | 15% |
| Reliability & Availability | 15% |
| Multi-Region Support | 10% |
| Compliance & Security | 10% |
| Consistency Model | 7.5% |
| Operational Complexity | 5% |
| Vendor Lock-In | 5% |
| Integration & Ecosystem | 5% |
| Vendor Support & SLA | 5% |
| Developer Knowledge & Expertise | 2.5% |
Based on the above evaluation, we selected AlloyDB, Spanner, and CockroachDB as candidate alternatives and ran performance benchmarks on them.
Performance Comparison of AlloyDB, Spanner, and CockroachDB
We benchmarked AlloyDB, Spanner, and CockroachDB using the Yahoo! Cloud Serving Benchmark (YCSB).
The test focused on throughput and latency across multiple workload profiles representative of our application’s expected data access patterns.
Thread counts were adjusted for each database until CPU utilization reached approximately 65%, ensuring an equitable comparison.
Tooling and Configuration
- Tool: YCSB (Go implementation), https://github.com/pingcap/go-ycsb/tree/master
- Region: Tokyo
- Initial dataset: 200M rows
- Operations per execution: 10M
- Warmup time: 1 hour
- Execution duration: 30 minutes post warm-up
Workload Patterns
| Workload | Read/Write Ratio | Description |
|---|---|---|
| A | 80/20 | Mixed transactional workload |
| B | 95/5 | Read-heavy |
| C | 99/1 | Read-dominant |
| D | 50/50 | Write-heavy |
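For reproducibility, below is a minimal sketch of how these workload mixes can be expressed as go-ycsb property files, using the standard YCSB core-workload properties (recordcount, operationcount, readproportion, updateproportion). The request distribution shown is an assumption for illustration, and the per-database thread counts are omitted because they were tuned individually toward the ~65% CPU target.

```go
package main

import (
	"fmt"
	"os"
)

// workloadMix captures the read/write ratios from the table above.
type workloadMix struct {
	name            string
	readProportion  float64
	writeProportion float64
}

var mixes = []workloadMix{
	{"workload_a", 0.80, 0.20},
	{"workload_b", 0.95, 0.05},
	{"workload_c", 0.99, 0.01},
	{"workload_d", 0.50, 0.50},
}

func main() {
	for _, m := range mixes {
		// Standard YCSB core-workload properties; dataset and operation
		// counts match the benchmark configuration described above.
		props := fmt.Sprintf(
			"workload=core\n"+
				"recordcount=200000000\n"+ // 200M rows loaded before the run
				"operationcount=10000000\n"+ // 10M operations per execution
				"readproportion=%.2f\n"+
				"updateproportion=%.2f\n"+
				"requestdistribution=zipfian\n", // assumption, not part of the published configuration
			m.readProportion, m.writeProportion)
		if err := os.WriteFile(m.name+".properties", []byte(props), 0o644); err != nil {
			panic(err)
		}
	}
	fmt.Println("wrote", len(mixes), "workload property files")
}
```

Each generated file can then be passed to go-ycsb's load and run phases via its -P flag.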
Benchmark Results
| Workload | Database | Operation | P50 Latency (ms) | P99 Latency (ms) | Throughput (OPS) |
|---|---|---|---|---|---|
| A (80/20) | AlloyDB | Read | 1.35 | 5.2 | 82,783.9 |
| A (80/20) | AlloyDB | Write | 2.7 | 6.7 | 20,860.0 |
| A (80/20) | Spanner | Read | 3.15 | 6.18 | 13,092.58 |
| A (80/20) | Spanner | Write | 6.79 | 13.29 | 3,287.02 |
| A (80/20) | CockroachDB | Read | 1.1 | 13.2 | 14,856.8 |
| A (80/20) | CockroachDB | Write | 4.9 | 21.2 | 3,722.7 |
| B (95/5) | AlloyDB | Read | 1.28 | 6.7 | 117,916.1 |
| B (95/5) | AlloyDB | Write | 2.5 | 19.7 | 6,097.4 |
| B (95/5) | Spanner | Read | 4.44 | 6.18 | 17,576.38 |
| B (95/5) | Spanner | Write | 8.8 | 14.0 | 927.68 |
| B (95/5) | CockroachDB | Read | 1.3 | 14.8 | 11,606.6 |
| B (95/5) | CockroachDB | Write | 3.9 | 18.5 | 612.0 |
| C (99/1) | AlloyDB | Read | 1.38 | 7.2 | 135,215.0 |
| C (99/1) | AlloyDB | Write | 2.07 | 5.95 | 1,440.0 |
| C (99/1) | Spanner | Read | 4.1 | 6.01 | 20,399.03 |
| C (99/1) | Spanner | Write | 8.6 | 13.5 | 205.5 |
| C (99/1) | CockroachDB | Read | 1.3 | 14.77 | 12,090.3 |
| C (99/1) | CockroachDB | Write | 3.2 | 18.3 | 636.2 |
| D (50/50) | AlloyDB | Read | 1.47 | 7.3 | 49,703.2 |
| D (50/50) | AlloyDB | Write | 4.35 | 14.1 | 46,104.6 |
| D (50/50) | Spanner | Read | 3.05 | 5.38 | 6,465.4 |
| D (50/50) | Spanner | Write | 7.96 | 13.5 | 6,474.32 |
| D (50/50) | CockroachDB | Read | 1.3 | 13.77 | 6,854.6 |
| D (50/50) | CockroachDB | Write | 7.2 | 23.3 | 6,844.6 |
Cost Comparison
| Feature / Tier | Spanner Standard | Spanner Enterprise | Spanner Enterprise Plus | AlloyDB Standard | AlloyDB HA | CockroachDB |
|---|---|---|---|---|---|---|
| Instance Cost | $854 | $1,167 | $1,622 | $290 | $580 | $610 |
| Storage Cost | $0.39/GB | $0.39/GB | $0.39/GB | $0.38/GB | $0.38/GB | $0.30/GB |
| Backup Cost | $0.10/GB | $0.10/GB | $0.10/GB | $0.12/GB | $0.12/GB | $0.10/GB |
Reference: Google Cloud Spanner pricing, AlloyDB for PostgreSQL pricing, and CockroachDB Cloud pricing
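As a rough illustration of how these line items add up, the sketch below estimates a monthly bill under hypothetical assumptions: 1,000 GB of data, 500 GB of backups, a single instance at the listed price, and treating the instance figures above as monthly. It ignores networking, replication, and committed-use discounts, so it is not a substitute for the official pricing calculators.

```go
package main

import "fmt"

// tierPricing mirrors the cost comparison table above.
// Instance figures are treated as monthly USD; this is an assumption for illustration.
type tierPricing struct {
	name         string
	instanceUSD  float64 // per instance, per month (assumed)
	storagePerGB float64 // per GB-month
	backupPerGB  float64 // per GB-month
}

var tiers = []tierPricing{
	{"Spanner Standard", 854, 0.39, 0.10},
	{"Spanner Enterprise", 1167, 0.39, 0.10},
	{"Spanner Enterprise Plus", 1622, 0.39, 0.10},
	{"AlloyDB Standard", 290, 0.38, 0.12},
	{"AlloyDB HA", 580, 0.38, 0.12},
	{"CockroachDB", 610, 0.30, 0.10},
}

func main() {
	// Hypothetical footprint: 1,000 GB of data and 500 GB of backups.
	const dataGB, backupGB = 1000.0, 500.0
	for _, t := range tiers {
		monthly := t.instanceUSD + dataGB*t.storagePerGB + backupGB*t.backupPerGB
		fmt.Printf("%-25s ~$%.0f / month\n", t.name, monthly)
	}
}
```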
Analysis and Conclusion
Our evaluation compared AlloyDB, Spanner, and CockroachDB across key performance dimensions, focusing on latency, throughput, and operational trade-offs.
AlloyDB consistently delivered low P50 and P99 latencies across all workloads, indicating superior responsiveness and overall performance. Spanner maintained strong consistency and stable latency, though its write latency was comparatively higher. CockroachDB offered fast reads with low P50 latency but showed higher P99 variance, signaling occasional spikes under heavy load.

In terms of throughput, AlloyDB achieved the highest performance for both read and write operations across all test scenarios. Spanner demonstrated excellent reliability but lower throughput under write-intensive workloads. CockroachDB performed competitively for read-heavy workloads but struggled to sustain high write throughput over extended durations.
AlloyDB provides the best overall balance of throughput, cost efficiency, and operational simplicity, making it particularly suitable for read-intensive and mixed workloads. Spanner remains the benchmark for global consistency and reliability, though it involves higher latency and cost trade-offs. CockroachDB, as an open-source alternative, offers flexibility and adaptability but introduces greater management complexity, performance variability, and relatively higher operational costs.
There is no single “perfect” database solution; each option presents trade-offs in performance, consistency, scalability, and cost. After a comprehensive evaluation, AlloyDB has been chosen as our primary database due to its strong balance of high performance, PostgreSQL compatibility, and operational simplicity. Spanner will continue to serve mission-critical services requiring global strong consistency and horizontal scalability. CockroachDB remains under consideration for future exploration, particularly for self-managed or hybrid deployments, given its promising trajectory in distributed SQL systems.
Decision Matrix (Reference)
| Criteria | Weight | AlloyDB | Spanner | CockroachDB |
|---|---|---|---|---|
| Scalability & Performance | 20% | ✅ High | ✅ Medium | ✅ Medium |
| Cost | 15% | 💰 Excellent | 💸 Expensive | 💰 Moderate |
| Reliability & Availability | 15% | 🟢 High (HA) | 🟢 Excellent | 🟢 High |
| Multi-Region Support | 10% | 🟡 Partial | 🟢 Native | 🟢 High |
| Compliance & Security | 10% | 🟢 High | 🟢 High | 🟢 High |
| Consistency Model | 7.5% | 🟢 Strong | 🟢 Strong | ⚙️ Tunable |
| Operational Complexity | 5% | 🟢 Simple | 🟢 Managed | 🟢 Managed |
| Vendor Lock-In | 5% | 🟡 Medium | 🔴 High | 🟢 Low |
| Integration & Ecosystem | 5% | 🟢 GCP Native | 🟢 GCP Native | 🟢 Broad OSS |
| Vendor Support & SLA | 5% | 🟢 Strong | 🟢 Strong | 🟡 Variable |
| Developer Knowledge & Expertise | 2.5% | 🟢 PostgreSQL | 🟡 Custom APIs | 🟢 SQL Compatible |
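To show how the weights and ratings roll up into a single number, here is a minimal sketch of the weighted-score calculation. The numeric mapping of the qualitative ratings (e.g. High/Excellent = 5, Medium/Partial = 3) and the sample ratings in main are hypothetical conventions for illustration, not the scores used in our actual decision.

```go
package main

import "fmt"

// Weights from the evaluation matrix above (they sum to 1.0).
var weights = map[string]float64{
	"Scalability & Performance":       0.20,
	"Cost":                            0.15,
	"Reliability & Availability":      0.15,
	"Multi-Region Support":            0.10,
	"Compliance & Security":           0.10,
	"Consistency Model":               0.075,
	"Operational Complexity":          0.05,
	"Vendor Lock-In":                  0.05,
	"Integration & Ecosystem":         0.05,
	"Vendor Support & SLA":            0.05,
	"Developer Knowledge & Expertise": 0.025,
}

// weightedScore rolls per-criterion ratings (on a 1-5 scale) into one number.
func weightedScore(ratings map[string]float64) float64 {
	total := 0.0
	for criterion, weight := range weights {
		total += weight * ratings[criterion]
	}
	return total
}

func main() {
	// Hypothetical numeric ratings for one candidate, shown only to
	// illustrate the calculation; not our actual scoring.
	candidate := map[string]float64{
		"Scalability & Performance": 5, "Cost": 4, "Reliability & Availability": 4,
		"Multi-Region Support": 3, "Compliance & Security": 4, "Consistency Model": 4,
		"Operational Complexity": 4, "Vendor Lock-In": 3, "Integration & Ecosystem": 5,
		"Vendor Support & SLA": 4, "Developer Knowledge & Expertise": 5,
	}
	fmt.Printf("weighted score: %.2f / 5\n", weightedScore(candidate))
}
```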
Acknowledgments
Special thanks to the Database Reliability Group and Google technical support for their contributions, validation, and support throughout this benchmarking exercise.

