Hello! My name is @kentan, and I work as an Engineering Manager here at Mercari.
Our company’s Customer Reliability Engineering Team has developed what we call the Moderation System, which uses machine learning and rules to detect listings that violate Mercari’s rules. The Moderation System is integrated with the Mercari marketplace and provides a web-based UI that Mercari Customer Support uses in their work cracking down on illegal activity. The system spans frontend, backend, and machine learning technology, and its features continue to grow in sophistication.
Historically, the backend, frontend, and machine learning engineering teams at Mercari have worked separately, divided by team and job function. As a result, when we started development on the Moderation System, we had to work as a project team made up of several teams and multiple job functions.
From the get-go, four existing teams took part in the development of the Moderation System. We did things this way because the technical specialties of each team were indispensable to the system.
- Backend A Team
- Backend B Team
- Machine Learning A Team
- Machine Learning B Team
From the time we first broke ground on the project, we anticipated that getting four teams to communicate would be difficult. To keep everyone on the project on the same page, we created cross-team milestones and laid out a format for communication between teams. This worked very effectively, but as one would expect, we could not ignore the problems that came from having so many teams involved in developing a feature of this size. In addition, after some members left the company, one of the teams faced a shortage of engineers and had to restructure.
What’s more, as other members were either transferred to different teams or resigned from Mercari, the product owner roles for three teams were consolidated in the hands of a single product manager (PdM). Because this one PdM was taking part in the scrum events of three teams, it became necessary to reduce their workload.
Team structure and ownership of microservices
To solve the problems we encountered, we considered two options: restructuring the teams, or keeping the same team structure and re-examining who held ownership of each microservice. We considered the second option because we expected that, even with the existing four-team structure, we could solve or at least reduce our problems by limiting development of the Moderation System’s microservices to two or three teams.
Through our discussions, we concluded that we should first look at restructuring our teams rather than transferring ownership. There were two reasons for this:
- Each team only had a few members, and therefore they could not be expected to be in charge of developing any more microservices than they already were.
- Because the teams were divided by job function, they could not inherit a different team’s microservices (i.e., backend teams could not inherit the microservices of machine learning teams).
Therefore, we decided not to touch the ownership of microservices and instead looked at restructuring our teams.
Restructuring the teams
In restructuring our teams, we first got to work on the areas that we thought would benefit the most from being reshuffled. As a result, we ended up merging the Backend A and Machine Learning A teams. Broadly speaking, we chose to merge these two teams for the following reasons:
- Backend A Team was in charge of features for registering, changing, and applying the rules used to detect listings that violate Mercari’s rules. Machine Learning A Team, meanwhile, developed and operated machine learning models that identify items that our rules prohibit from being listed. Although the two fields of expertise differ from an engineering standpoint, they are extremely close as business domains, so we had high expectations for the synergy that merging the two teams would generate.
- Engineers from Machine Learning A Team had expressed interest in expanding their expertise into backend engineering. Software development relies heavily on the ability and motivation of the people involved, so we anticipated that our members’ motivation would help them work effectively.
- Backend A Team and Machine Learning A Team shared the same PdM, and we anticipated that merging the two teams would also significantly reduce the PdM’s workload.
How we merged the teams
Phase 0: Checking the motivation of our members
We explained the purpose behind the merger to each team member. When we then asked them how they felt about it, every last member said they were ready to take on the challenge of a new field. You could tell that they were driven. The merger would allow machine learning engineers to expand their expertise into backend engineering, and backend engineers to expand theirs into machine learning.
With that confirmed, we merged the teams in four phases.
Phase 1: EM learns domain knowledge
As the Engineering Manager of Backend A Team, I was put in charge of the merged team as the EM. Personally, my experience was exclusively as a full-stack engineer working mainly on backend engineering, and I had absolutely no job experience working on machine learning. I didn’t have any domain knowledge regarding the development that Machine Learning A Team worked on either. For these reasons, acquiring this knowledge was a top priority.
Goals
- The EM understands the projects of Machine Learning A Team.
- The EM understands Machine Learning A Team’s development processes and scrum practices.
- The EM knows the Machine Learning A Team members’ personalities, skill sets, strengths and areas for improvement, motivations, and career plans.
- The EM learns the basics of machine learning so that they can communicate with their team members.
Non-goals
- No goals were set for team members in this phase.
Actions
- Participated in scrum events such as sprint planning sessions and retrospectives
- Participated in the EM meetings for the division to which ML engineers belong
- Read past design docs and pull requests
- Looked through the past year of Slack posts to get a general sense of what had been discussed
- Read the onboarding documents of Machine Learning A Team (Because some materials were lacking content or out of date, we created and updated documents as needed.)
- Created pull requests for the projects that Machine Learning A Team developed
- Read the reports on experiments conducted when creating and improving ML models
- Worked to build trust through 1-on-1 sessions with team members
- Studied machine learning through books, online courses, and other such sources
Phase 2: Members learn domain knowledge
Next, team members of each team learned domain knowledge.
Goals
- Backend engineers understand the projects of Machine Learning A Team.
- Backend engineers understand Machine Learning A Team’s development processes and scrum practices. They also learn Python, the programming language that Machine Learning A Team uses.
- Machine learning engineers understand the projects of Backend A Team.
- Machine learning engineers understand Backend A Team’s development processes and scrum practices. They also learn Go, the programming language that Backend A Team uses.
Non-goals
- Contributing to development was not one of the goals. (It was okay for the engineers to just focus on learning.)
Actions
- Members created an open channel on Slack and used it for Q&A and for studying machine learning and backend engineering.
- The two teams held a joint study group to learn each other’s domain knowledge.
- The Tech Lead of the machine learning team took part in the backend team’s sprint and carried out some development tasks.
Phase 3: Unify scrums
After every member had acquired domain knowledge, the two teams unified their scrum processes. They combined their sprint refinement and planning, so that the merged team operated as a single development team.
Goals
- Bring together the separate scrum processes of the two teams.
- Backend engineers perform development on ML projects.
- Machine learning engineers perform development on backend projects.
Non-goals
- Backend engineers create and improve machine learning models.
Actions
- As the Scrum Master, the EM led the scrum operations of the unified scrum team. The EM also defined the scrum process so that other members could take over the Scrum Master role later.
- We held a study session to help members understand the background of Jira development tickets. Because development tickets are inevitably high-context, it is hard for someone new to look at a ticket and understand all of its contents. To help with this, we explained what each development ticket involved and why it was necessary, including the business reasons behind the development.
Phase 4: Unify on-call rotation systems
Last but not least, we adopted a single on-call rotation for the entire team. For our members, this meant being on call half as often, which drastically reduced their workload. The allowances paid for on-call duty were also cut in half, which improved the cost efficiency of development for the company.
Goals
- Backend Team members can be on call for Machine Learning Team projects.
- Machine Learning Team members can be on call for Backend Team projects.
Non-goals
- None, since Phase 4 was the final phase.
Actions
- With the Tech Lead coordinating, we discussed what all members would need in order to feel at ease when on call. We then worked out what we had to do to get there and compiled our ideas in a strategy document.
- Members held drills in which they role-played how to respond to specific incidents.
- Members created on-call handbooks for each microservice.
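As a back-of-the-envelope check on the "half as often" figure for Phase 4, here is a minimal sketch. It assumes two teams of equal size, weekly shifts, and a simple round-robin rotation. The member names, team sizes, and the `shifts_per_member` helper are all hypothetical; this post does not describe how the actual rotation is scheduled.

```python
from collections import Counter
from itertools import cycle, islice


def shifts_per_member(members, weeks):
    """Count the weekly shifts each member takes in a round-robin rotation."""
    return Counter(islice(cycle(members), weeks))


# Hypothetical rosters: the actual team sizes are not given in this post.
backend = ["be1", "be2", "be3"]
ml = ["ml1", "ml2", "ml3"]
weeks = 12

before = shifts_per_member(backend, weeks)      # backend rotation on its own
after = shifts_per_member(backend + ml, weeks)  # single merged rotation

print(before["be1"], after["be1"])  # prints "4 2"
```

Under these assumptions, each person's on-call load halves exactly. If the two teams were different sizes, members of the larger team would see a smaller reduction than members of the smaller team.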
As the newly formed Feature Team, we have started frontend development. Customer support work involves a very complex set of features, and the UI for them falls within the team’s scope. Frontend work makes up a substantial share of Moderation System development, and given the Feature Team concept of carrying out all necessary development within a single team, bringing frontend development into the team was a natural next goal.
What’s more, for us to work on full-scale frontend development, we naturally must have frontend engineers with robust skills. To this end, we are hiring. Interested? If the answer is yes, be sure to contact me.
It’s of course no easy task to merge two teams into one when the members’ skills differ as much as those of backend engineers and machine learning engineers. I think one reason stood out above all others as to why we were able to merge the teams smoothly under these circumstances: our team members worked proactively to make the merger happen. When we unified on-call duties in Phase 4, every step from concrete planning to implementation was led by the Tech Lead and completed with almost no involvement from me as the EM.
Although it is generally considered unusual, I think that having the EM fill the role of Scrum Master was effective. To put it bluntly, for things where almost any format will work, like day-to-day standup meetings and the writing of tickets, having the EM decide skips over a lot of fuss. For now, I make a decision, we put it into practice, and we carry on with scrum development.
Along the way, when something is no longer a good fit, we raise it at the sprint retrospective and discuss how to improve it.
One issue that we are facing is that engineers with a background in backend engineering, myself included, have not yet acquired all of the specialist knowledge that machine learning requires, and therefore we cannot create machine learning models yet. Currently, we are in charge of developing systems and applications for machine learning projects, but getting to the point where we can create machine learning models ourselves will take some further effort and ingenuity.