Introduction
Hello, I am Yuji Kazama, the Engineering Manager of the FinOps team at Mercari. Since the inception of Mercari Group, we have heavily relied on the public cloud to deliver diverse services to our customers. This article sheds light on the FinOps initiatives being carried out at Mercari Group to enhance the value derived from cloud services.
What is FinOps
Rising cloud costs have put the spotlight on FinOps, a concept defined by the FinOps Foundation as “an operational framework and cultural practice which maximizes the business value of the cloud, enables timely data-driven decision making, and creates financial accountability through collaboration between engineering, finance, and business teams”.
Let’s look at the nature of cloud costs. Before the cloud, companies had to forecast demand, procure servers, and build data centers in-house, which made it hard to respond flexibly when demand forecasts or business needs changed.
The advent of cloud technology, while crucial for launching new businesses, introduced a cost consumption model distinct from traditional data centers, characterized by being “decentralized”, “variable”, and “scalable”.
“Decentralized” means that engineers, rather than the finance and procurement divisions, effectively control cloud usage. “Variable” means that cloud costs can fluctuate significantly, unlike the fixed costs of a data center. “Scalable” means that cloud resources can be provisioned quickly, which can lead to resource over-allocation.
FinOps at Mercari
Mercari Group has embraced a microservice architecture and uses Google Cloud Platform (GCP) as its primary cloud provider. We operate over 200 microservices and run more than 4000 Kubernetes Pods. We also store data on a petabyte scale, which we use to refine our products for users and to drive business growth.
In July 2022, Mercari started its FinOps activities because cloud costs were consistently rising faster than the business was growing, which underscored the need for cloud cost optimization.
There were three main challenges to tackle. The first was unpredictable cost increases: monthly GCP invoices often contained unforeseen charges, necessitating urgent investigations. The second was an opaque cost structure: it was difficult to understand how costs were allocated across the various projects and services within the company. The third was organizational barriers, which hindered cooperation on cost optimization across Mercari Group.
Cost Visibility
Understanding costs is paramount. We started by developing cost dashboards that show how costs are distributed across services, companies, and business units. This has made it possible to understand cloud costs in near real-time, whereas previously they were only aggregated on a monthly basis. If a sudden increase in costs is detected, we check with the relevant engineers whether the increase was intentional.
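As an illustration of how a sudden increase might be flagged from daily cost data, here is a minimal sketch. It is not the logic behind Mercari's dashboards, which this article does not describe. The function name `find_cost_spikes`, the 7-day trailing window, the 1.3x threshold, and the sample figures are all assumptions chosen for the example.

```python
from statistics import mean


def find_cost_spikes(daily_costs, window=7, threshold=1.3):
    """Flag days whose cost exceeds `threshold` times the mean of the
    preceding `window` days.

    Returns a list of (day_index, cost, baseline) tuples.
    The window and threshold defaults are illustrative assumptions.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        # Baseline is the average cost over the trailing window.
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            spikes.append((i, daily_costs[i], baseline))
    return spikes


# Hypothetical daily costs (arbitrary units) for a single service.
costs = [100, 102, 98, 101, 99, 100, 100, 150, 103]

for day, cost, baseline in find_cost_spikes(costs):
    print(f"day {day}: cost {cost} vs trailing mean {baseline:.1f}")
```

A flagged day is only a prompt for a conversation: as described above, the next step is to ask the owning engineers whether the increase was intentional.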
Goal Setting
Next, we focused on setting goals. We established KPIs from a business perspective and for each major GCP cost driver, set FinOps-related OKRs every quarter, and are working on cost optimization measures.
From a business perspective, we track the difference between budget and actuals. Additionally, we track the infrastructure cost per transaction, that is, the infrastructure cost incurred for each transaction customers make in our marketplace. The primary GCP cost drivers are compute resources, along with data warehouse and storage resources. While tracking KPIs such as resource utilization rates, the adoption rate of CUDs (committed use discounts) and Spot VMs, and the application rate of data retention policies, we are implementing numerous cost optimization measures.
Category | KPI | Examples of optimization |
---|---|---|
Business | Budget vs Actual, Cost per Transaction | (N/A) |
Compute Resources | Resource utilization ratio, CUD adoption ratio, Spot VM adoption ratio | Terminate unused resources, Rightsizing, Improve auto scaling, Develop resource recommendation tools |
DW/Storage Resources | Data reduction ratio, Lifecycle policy ratio | Delete unused resources, Apply retention/archive policy, Develop resource recommendation tools |
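
To make the KPIs in the table more concrete, the following sketch shows one plausible way to compute a few of them from monthly figures. The article does not give exact formulas, so the definitions here are assumptions (for example, utilization as used CPU divided by requested CPU, and CUD adoption as the share of eligible compute cost covered by commitments). The `MonthlyUsage` type, its field names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    """Hypothetical monthly figures for one business unit."""
    budget: float                 # planned cloud spend
    actual_cost: float            # invoiced cloud spend
    transactions: int             # completed marketplace transactions
    cpu_requested: float          # CPU cores requested by workloads
    cpu_used: float               # CPU cores actually consumed
    cud_covered_cost: float       # compute cost covered by CUDs
    eligible_compute_cost: float  # compute cost that could be covered


def compute_kpis(u: MonthlyUsage) -> dict:
    """Compute illustrative FinOps KPIs (assumed definitions)."""
    return {
        # Positive means over budget.
        "budget_variance": u.actual_cost - u.budget,
        # Unit economics: infrastructure cost per transaction.
        "cost_per_transaction": u.actual_cost / u.transactions,
        # Share of requested CPU that is actually used.
        "cpu_utilization_ratio": u.cpu_used / u.cpu_requested,
        # Share of eligible compute spend covered by commitments.
        "cud_adoption_ratio": u.cud_covered_cost / u.eligible_compute_cost,
    }


usage = MonthlyUsage(
    budget=1000.0,
    actual_cost=1100.0,
    transactions=2000,
    cpu_requested=400.0,
    cpu_used=180.0,
    cud_covered_cost=300.0,
    eligible_compute_cost=600.0,
)

for name, value in compute_kpis(usage).items():
    print(f"{name}: {value:.2f}")
```

Tracking cost per transaction alongside the raw budget variance helps separate cost growth that comes from more business activity from cost growth that comes from inefficiency.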
For those interested in the details of the optimization measures, see also the following articles.
- Tortoise: Outpacing the Optimization Challenges in Kubernetes at Mercari
- How We Saved 75% of our Server Costs
- x86 is dead, long live x86!
- Implementing Elasticsearch CPU usage based auto scaling
Regular Reporting
We also established a regular reporting system. We have made it a practice to report on cloud costs regularly to engineers, finance teams, and executives.
For engineers, we report on costs at the monthly All Hands meeting and commend the cost optimization work that engineers have carried out. Furthermore, as noted above, when we detect a sudden increase in costs, we check with the relevant engineers whether it was intentional.
For the finance team, while providing information on the cloud cost situation, we support discussions on cloud budget formulation according to each company’s business strategy. Additionally, by proposing metrics such as cost per transaction and other Unit Economics indicators that we introduced earlier, we have been able to facilitate constructive discussions about the relationship between business growth and cloud costs.
For executives, we report on the progress of OKRs weekly and provide a monthly report on the overall cost of the group companies.
Promoting Cost Consciousness
At Mercari, we regularly host internal hackathons where engineers can work on experimental feature development and performance improvements that are difficult to do in their usual development activities. By establishing a special “FinOps Award” during these internal hackathons, we are also creating a culture that encourages cost awareness among engineers.
We started FinOps as a cross-organizational project, but it was challenging to maintain the momentum needed to keep the entire group of companies involved. To implement FinOps continuously, we established a dedicated FinOps organization. The challenge that the FinOps team wants to address is achieving a “culture shift” across the entire Mercari Group. Those who use the cloud must take responsibility for their cloud usage and cost. Moreover, teams that use the cloud should care about the ROI (Return on Investment) of the features provided by the Mercari app and of system investments. The FinOps team’s role is to work with stakeholders from each group to make this culture shift possible.
Results
The FinOps approach has yielded noteworthy benefits, achieving over 30% in cost optimization and enhancing group-wide communication on cloud costs, thereby expediting decision-making and allowing us to proactively manage cost surges. We saw a significant cultural shift, with “FinOps” becoming part of the daily lexicon among engineers, reflecting a heightened awareness of cloud cost management.
Additionally, Mercari hosted the first Japan FinOps Meetup at its office, to build a network of people in Japan who are interested in FinOps and to give them a place to learn about it.
Conclusion
This article outlined the FinOps initiatives that Mercari Group is pursuing to maximize the value of cloud services. The cultural shift described above has not yet been fully realized. As we continue to evolve our platform and foster this shift, Mercari is looking for engineers who are passionate about contributing to such initiatives to join us.