2022/01/17

Blog Series of Introduction of Developer Productivity Engineering at Mercari

Author:: deeeet

, 2022/01/17

Blog Series of Introduction of Developer Productivity Engineering at Mercari

Author: @deeeeeet, Engineering head of Developer Productivity Engineering

Developer Productivity Engineering Camp (“Camp” is a unit and a term we use internally at Mercari to logically group the related teams) is a division which is mainly responsible for the entire Mercari group’s base infrastructure and DevOps toolings and services. It consists of multiple teams and works on multiple initiatives to keep the Mercari group’s product reliable, secure, and scalable.

We’re working on many interesting projects but we didn’t have much chance to show them outside recently. So we decided to write a series of blog articles by all teams in the Camp and output what we are doing. In this blog series, we will introduce the teams in the Camp and recent projects teams are working on. I hope some articles catch your eyes and are beneficial to your work.

Before going to the series, in this blog post, I would like to introduce the Developer Productivity Engineering Camp itself: its responsibilities and mission, long-term directions. In addition to that, and, at the end of the post, there is a idex of all posts. You should be able to use it to find articles you want to read.

Developer Productivity Engineering Camp

Developer Productivity Engineering (DPE) Camp consists of the following 8 teams:

Platform Developer Experience (DX) team: Working on improving the developers experience by providing better abstraction and automated workflows
Platform Infra team: Working on the base infrastructure operations as the cloud (GCP & AWS) and Kubernetes admin, as well as building the observability platform
CI/CD team: Providing testing infrastructure, toolings, and the delivery system to make service delivery faster and more reliable
Network team: Responsible for end-to-end network infrastructure from the edge (CDN) to the cloud & service mesh (Istio) and physical data centers
Microservices SRE team: Providing embedded SRE support to the product team to improve service reliability. And spreading SRE practices to the entire organization so that they can work reliability work without SREs
Search Infra team: Providing search as a service to Mercari group
Web Platform team: Providing web microservice platform as a service for web products in mercari group
Core SRE team: Managing core large scale database for the monolith API and related infrastructures

(See more details on How We Reorganize Microservices Platform Team)

These teams have their own mission and responsibilities (the details of each team’s mission and responsibilities will be described in the upcoming blog posts) and work on their goals but, at the high level, as a Camp, we have the following 2 main responsibilities and missions:

Provide the reliable, secure, and scalable infrastructure
Provide the best developer experience to deliver products faster and easier

Each team has different interaction modes (see Team Topologies) (e.g., while platform DX is working with X-as-a-Service, Microservices SRE works by Collaboration), but all teams share the same main customer: the product team. By supporting the product team to develop great features, we indirectly deliver our value to the Mercari customers. We are not only providing the base infrastructure and tooling but also ensuring the developer experience and accessibility of them which is potentially very complex for the product team. As said in multiple teams’ missions, we are working not only for Mercari but also for other companies in the Mercari group like Merpay as well.

Same as mission, each team has its own roadmap and projects but, at the high level, we have the following 4 main directions to working on:

Harden platform security
Improve developer activity visibility
Developer Experience Enhancement
Scale and modernize infrastructure

Last year, we had a large security incident caused by Codecov vulnerability. From its retrospective, we found out many fundamental improvements on our platform. Even before that, security was one of the most important priorities for us but, after that, we kicked many initiatives as the highest priority and started working on many improvements: moving forward to zero-touch production and keyless authentication everywhere, rebuilding CI/CD systems from scratch, or strengthening cloud governance.

For platform engineering, its product decision i.e., deciding what problem to be solved next, is one of the most important things. To do it better, collecting developer activity metrics e.g., Deploy per Developer per Day (See Accelerate), is very important (It’s also true for the upper-level leadership to do good decision-making of the engineering direction or the product team understand its potential improvements area). We’ve been working on this for a while but now we’re enhancing it and trying to get better visibility with the ability to drill down the details.

I will explain details in the Platform DX team’s introduction post later but the importance of enhancing developer experience is increasing. By reducing the frictions to the toolings or hiding the complexity of underlining complex infrastructure, product teams can focus on the product development itself and deliver value faster to our customers. We provide the fundamental components and tooling to develop the service but they expose too many complex low-level details or some are not well integrated with each other (workflow issue). To solve these issues, we are working on introducing more abstractions and a unified workflow process and experience across the toolings.

We’ve been working on Cloud migration and infrastructure modernization for a long time. As described in this blog post, most of the core systems were migrated to Tokyo from Hokkaido (previous DC we used) to integrate well with systems on Google Cloud in Tokyo. But still, many small systems are remaining and not all systems are in the Cloud. We are continuously working on cloud-native migration. Not only that, “scalability” is very critical. To consider future business expansion, we are investigating multi-cluster and multi-regional architecture.

The details of these projects will be described in further posts! Please check the index.

Index

The following is title, and author. It starts today and ends on 2022-02-24. We will release new articles every day (excep weekend)! They will be replaced with the link to the article and this will be the index of this blog series.

Title	Team	Writer
Introduction of Web Platform	Web Platform	Hiroshi URAYAMA
Implement the dynamic rendering service	Web Platform	Wei-Bo Chen
Introduction of Web Auth Service	Web Platform	Tracy Liu
Introducing Platform Infra Team at Mercari	Platform Infra	Vishal Banthia
Securing Terraform monorepo CI	Platform Infra	Daisuke FUJITA
Observability-kit: Adventures of using CUE at scale	Platform Infra	Harpratap Singh Layal
Automation of Terramform for AWS	Platform Infra	Kenichi SASAKI
Developer Experience at Mercari	Platform DX	Taichi NAKASHIMA
Kubernetes Configuration Management with CUE	Platform DX	Hideto Miki
Shifting to Zero Touch Production	Platform DX	Dylan Lau
Promote Zero Touch Production – further features of Carrier	Platform DX	Morito Ikeda
Introduction of the CI/CD team	CI/CD	Yuji Kazama
A Platform Support Workflow	CI/CD	Nathan Essex
Towards a more stable and secure CD system replacement	CI/CD	Tomoya Tabuchi
Defense Against Novel Threats: Redesigning CI at Mercari	CI/CD	Michael Findlater
Introduction of Search	Search Infra	Reggie LAI
メルカリの検索基盤の変遷について	Search Infra	Shimpei NAKATA
検索の応答性能を維持するための Benchmarking Automation	Search Infra	Yoshinobu Fujimoto
Introduction of the Network team	Network	Raphael Fraysse
How Istio solved our problems	Network	Yusaku Hatanaka
Managing Network Policies for namespaces isolation on a multi-tenant Kubernetes cluster	Network	Yohei Kanemaru
Dynamic Service Routing using Istio	Network	Sharma RAJESH
Introduction of CoreSRE	Core SRE	Hidenori Suzuki
特殊な構成のMySQLに対するDDL適用の一例	Core SRE	Motoaki NISHIKAWA
レガシーなシステムとの向き合い方	Core SRE	Takashi Honda
Embedded SRE at Mercari	Microservices SRE	Taichi NAKASHIMA
SRE伝道師としてMicroservices SRE チームが取り組んでいる事例	Microservices SRE	Katsuyuki OGUMA
Kubernetes HPA External Metrics の事例紹介	Microservices SRE	Masahiro TOKIOKA
MicroservicesSREのEmbedded先でのお仕事	Microservices SRE	Shota Mizumoto
Elasticsearch運用ノウハウ	Microservices SRE	Yoshinobu Fujimoto

Blog Series of Introduction of Developer Productivity Engineering at Mercari

Developer Productivity Engineering Camp

Index

Related article

Good tools are rare. We should make more!

Mercari’s Seamless Item Feed Integration: Bridging the Gap Between Systems

From Good to Great: Evolving Your Role as a Quality Consultant