2021/12/13

Microservice Migration at Mercari: The Ideal and the Real

Author: stanaka

Introduction

Hello, this is @stanaka. Mercari has launched the “Robust Foundation for Speed” project, aimed at enhancing its business platform. The first part of this project is the migration to microservices, so in this entry I will explain our ideals for microservice migration and the reality of our progress.

The Ideal Modern Development Team

I’ll start with the ideal. Let’s look at some topics that have come up over the past few years around improving engineering organizations and increasing their productivity. Two points are often emphasized for development teams and system architectures:

- In order to accelerate development iteration, a development team should have responsibility for the services/systems it owns, from design to development and operation
- In order to keep the cognitive load within an acceptable range, systems need to be loosely coupled and designed to reduce dependence on other teams

A Development Team Should Have Responsibility from Design to Development and Operation

The book Accelerate demonstrates that there is a positive correlation between some development indicators, such as deployment frequency, and strong performance on business KPIs. The reason for this is that customer feedback can be quickly fed back into a development team. For this to happen, it’s crucial for the team to maintain ownership throughout the entire life cycle of the product.

A good example is Netflix, which gives a large amount of authority to the development teams working directly on its product so that the product can be improved faster. This has been their key to making highly accurate decisions and working at an accelerated speed. It’s also important to keep the cognitive load within an acceptable range for a given team and to reduce dependence on other teams, in order to maintain development speed.

Keeping the Cognitive Load Within an Acceptable Range

“Team Topologies” is an approach that shows the necessity of keeping the cognitive load within an acceptable range for a team and of reducing dependence on other teams, in order to maintain development speed. A Philosophy of Software Design argues that easily understandable high quality software can be designed by reducing complexity of code, architecture, and interfaces.

In order to make this happen, the areas of responsibility for each development team must be designed to minimize dependencies on other teams. Where dependencies do exist, they should be defined in a clear and easily understandable manner.
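As a rough illustration of what a clearly defined dependency between teams can look like in code, here is a minimal Python sketch. It is not taken from Mercari's systems: the names `ItemCatalog`, `ItemSummary`, `InMemoryItemCatalog`, and `shipping_fee_yen`, as well as the fee rule, are all hypothetical and exist only for this example. The idea is that the consuming team's code depends on a small published contract rather than on the owning team's internals.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ItemSummary:
    """The only data the owning team promises to expose (hypothetical)."""
    item_id: str
    price_yen: int


class ItemCatalog(Protocol):
    """Hypothetical contract published by the team that owns item data."""

    def get_item(self, item_id: str) -> ItemSummary: ...


class InMemoryItemCatalog:
    """Stand-in implementation; a real one might call another team's service."""

    def __init__(self, items: dict[str, ItemSummary]) -> None:
        self._items = items

    def get_item(self, item_id: str) -> ItemSummary:
        return self._items[item_id]


def shipping_fee_yen(catalog: ItemCatalog, item_id: str) -> int:
    """Consumer-side logic that depends only on the ItemCatalog contract."""
    item = catalog.get_item(item_id)
    # Illustrative rule only: free shipping at or above an assumed threshold.
    return 0 if item.price_yen >= 5000 else 500


catalog = InMemoryItemCatalog(
    {"m1": ItemSummary("m1", 6000), "m2": ItemSummary("m2", 1200)}
)
print(shipping_fee_yen(catalog, "m1"), shipping_fee_yen(catalog, "m2"))
```

Because `shipping_fee_yen` only knows about the `ItemCatalog` contract, the owning team can change its storage or implementation without the consuming team having to understand or track those changes, which is one way to keep cross-team cognitive load down.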

At the same time, it’s vital for organizational designs (such as team composition) to be in harmony with system architecture designs, as stated by Conway’s Law and the Inverse Conway Maneuver. Finally, it’s also important to continuously update organizational structures to suit changing business environments, advances in technology, and the maturity of the teams.

Goals of Microservice Migration at Mercari

Now that we know what the ideal is, let’s look back on how microservice migration has progressed at Mercari so far.

Mercari began its microservice migration project in 2017 in order to implement the practices mentioned above. In addition to changing the system architecture, three goals were set during this project:

- Put each team in charge of the entire process from design to operation, so that they can function independently
  - Once migration has made some progress, reorganize into cross-functional teams that include client engineers responsible for the mobile and frontend areas
- Have the Microservices Platform Team provide the functionality required for microservice development
- Execute the data center migration project to bring technological innovation to monolithic environments (migration from on-premises to cloud)

We began our microservice migration efforts by migrating the listing feature—the feature that involves our users the most. Once we released the listing feature, we switched over to migrating the whole system. We split the entire monolith into three categories and continued working on the project. These categories are customer-facing features under active feature development, complicated business core logic (such as transactions), and long-tail features not under active feature development.

As we continued to work on the project and made good progress in migrating to microservices, we were careful to do these two things to improve development team productivity:

- Transfer technical decision-making authority to teams, and allow each team to determine technical designs (such as selecting a data store)
- Once a certain amount of progress has been made with migration, begin reorganizing into cross-functional teams and provide these teams with the ability to have ownership over the entire service life cycle

In designing loosely coupled systems to reduce the cognitive load on teams, we began by loosely assigning domains to each team, since we were not able to perform a comprehensive domain analysis of the monolith initially. We then had each team drill down into its own domain. This is an area with a wide range of methodologies to use and discussions to have. Some might argue that it would be better to form teams and design architectures only after conducting a more thorough domain analysis.

Current Status in 2021

So, how are we doing in 2021? We’ve made especially good progress in migrating customer-facing features under active feature development to microservices. Nearly all teams are now using microservices for development. The Camp system was introduced during this process, and cross-functional teams are also being formed. We have reached the “Future” state I wrote about two years ago.

We also made a lot of progress in platform development, and our resource management, authority management, and documentation are all much better now compared with when the project first started. Some of these efforts are covered here, if you want to read more. As for infrastructure, we continue to make progress with data center migration and are now taking steps toward our next milestone, set at the beginning of the project: bringing all systems to the cloud.

One major portion that still needs work is the complicated business core logic, which worried us from the very beginning. Although we’ve made improvements such as refactoring and adding tests, we haven’t yet reached a state that’s easy to work with. Components in this area have strong dependencies on one another, and we haven’t yet performed sufficient domain analysis either. No progress has been made in separating data stores by domain, so the code and data remain deeply tangled.

Toward Achieving a “Robust Foundation for Speed”

We’re approaching the end of 2021. As I discussed above, we’ve made progress in some areas in migrating to microservices, but aren’t fully there yet. There’s still plenty of work to do. However, we’re now trying to avoid using the term “migration,” and are instead describing this as foundation development. This is the “Robust Foundation for Speed” project I mentioned at the beginning of this entry. This project covers not only updating business core logic including transactions (C2C transactions), but also our ID platform responsible for authentication and authorization, and our customer support tools. Mercari will continue to use this project to make technical investments in these areas so that we can take on future business initiatives quickly.

We’ve recently posted two entries discussing the project background in more detail and what the project itself entails:

If you’re an engineer or engineering manager interested in solving complicated and difficult problems in this area, why not consider working with us? We’re looking forward to receiving your application.

Hiring information

Mercari is currently hiring. If you are at all interested in joining us, please take a look at our career page and the special page for Robust Foundation for Speed.