Introduction of the Network team

This article is part of the Developer Productivity Engineering Camp blog series.

Introduction

Hi everyone, my name is Raphael Fraysse, Tech Lead for the Network team at Mercari. In this article, I will briefly introduce the team by explaining what we want to achieve (our mission) and how we plan to achieve it (our strategy).

Mission

Our team’s mission is to make the communications between customers and our systems seamless, reliable and secure.

The network is all about the communications between the users of our applications and the systems running these applications. Our systems are fairly complex: they span different geographical locations, and the number of hops varies with the data path.

From a network perspective, a typical data path can be roughly represented as follows:

From the user device to the edge systems, we have little to no control over the networks, especially as most of our users use our mobile application.

Edge systems shorten the distance over which communications are slower and less reliable. Our principal use case is to cache the images of products listed in our marketplace. We also use them to serve certificates and to protect our infrastructure from DDoS and other malicious attacks.

Behind the edge are our application systems running in public cloud providers (mainly GCP but also AWS). Thanks to the shared responsibility model of cloud resources, we don’t have to maintain the low-level network infrastructure (L1-L2 of the OSI model) and can focus on the upper layers (L3-L7) to deliver more value to our users.

Finally, the core of our data is located in our databases, still mostly hosted on-premises in our datacenters in Tokyo. Unlike the cloud, datacenter operations require us to also maintain the low-level network infrastructure (optical lines, switches and routers, connectivity with Internet Exchanges (IX), etc.). However, we are in the middle of migrating these databases to the cloud and reducing our on-premises footprint.

As you may guess, these domains are fairly large and call for different specializations. It is not realistic to expect any one member to be strong in all of them, so the team brings together members with various network skill sets. It also means that members are free to step out of their comfort zone and take on other domains, which is a great fit for curious people!

Strategy

Our team has the following strategy to realize our mission:

  • Abstracting away network concerns from developers and application code
  • Creating the core network building blocks and guidelines for the organization to ensure its overall security, reliability and scalability

Abstracting away network concerns from developers and application code

Being part of the Platform Group, we share the same focus on developer experience. The network is a critical infrastructure primitive, but developers should not have to spend time on it; they should instead focus on delivering value to our users.

To be effective, the important attributes of our network (Seamless, Secure, Reliable) need to be embodied end-to-end in the communication path, and applications play an important role in it. The only way to save application developers’ time is to make these attributes easy to implement, by providing layers of abstraction that handle network concerns, including:

  • Security features

    • TLS termination
    • Authentication and authorization
    • Network policies
  • Resiliency features

    • Circuit breaking
    • Retries
  • Exposure to user traffic

    • Load balancer provisioning
    • Path-based routing
  • Deployment features

    • Fine-grained traffic control for canary releases
    • Dynamic service routing for QA testing
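To make the exposure and deployment features more concrete, here is a minimal Python sketch of path-based routing combined with weighted traffic splitting for a canary release. It is an illustration only, not how our platform implements it: the `Route` type, the `match_route` and `pick_backend` helpers, and the backend names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical routing rule: a path prefix and weighted backends."""
    prefix: str
    backends: Dict[str, int]  # backend name -> relative weight


def match_route(routes: List[Route], path: str) -> Optional[Route]:
    """Path-based routing: pick the route with the longest matching prefix.

    This is a plain string-prefix match, kept simple for the sketch.
    """
    candidates = [r for r in routes if path.startswith(r.prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: len(r.prefix))


def pick_backend(route: Route, rng: random.Random) -> str:
    """Canary traffic control: choose a backend in proportion to its weight."""
    names = list(route.backends)
    weights = [route.backends[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical configuration: 5% of /items traffic goes to the canary.
ROUTES = [
    Route("/", {"web-stable": 100}),
    Route("/items", {"items-stable": 95, "items-canary": 5}),
]
```

With this configuration, a request to `/items/123` matches the `/items` route and is sent to `items-canary` roughly 5% of the time. The point of the abstraction layer is that developers only declare the weights; the routing logic itself lives outside the application code.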

When we were handling a big monolithic application a few years ago, most of those concerns were not a problem as the communication between the application components was mostly done within the same server.

These concerns appeared once we started migrating to a microservices architecture, and many of them are directly related to the fallacies of distributed computing, the first of which is that the network is reliable.
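As a rough illustration of what every service would have to carry on its own without an abstraction layer, the sketch below implements retries with exponential backoff and a very simple circuit breaker in Python. The names (`CircuitBreaker`, `CircuitOpenError`, `call_with_retries`) and the default thresholds are assumptions made for this example, not our production code or any library's API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, then allows
    one trial call once reset_timeout seconds have passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Re-open after a failed trial call, or open on the threshold.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func with exponential backoff; never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller could combine the two as `call_with_retries(lambda: breaker.call(fetch))`, where `fetch` is whatever function performs the remote call. Repeating this logic consistently in every microservice and every language is exactly the kind of burden we want to move out of application code.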

Some examples of projects that put this strategy into practice:

1. Istio adoption at Mercari (presentation at IstioCon)

We use the service mesh pattern (Istio being an implementation of it) to tackle most of the above concerns. We are still in the middle of the migration process after facing several difficulties. The presentation gives a good overview of these difficulties and the status of our migration.

2. Preparing guardrails for Istio at scale (presentation at KubeFest 2020 Tokyo)

This presentation focuses on how we created the necessary guardrails to keep Istio running safely and prevent human errors from both developers and network operators.

We will also publish several articles on these projects in this blog series over the next few days, so stay tuned!

  • How Istio solved our problems by Yusaku Hatanaka (@hatappi) (2022/2/10 Thu)
  • Managing network policies for namespaces isolation on a multi-tenant Kubernetes cluster by Yohei Kanemaru (@kanemaru) (2022/2/14 Mon)
  • Dynamic Service Routing by Sharma Rajesh (@raju) (2022/2/15 Tue)

Creating the core network building blocks and guidelines for the organization to ensure its security, reliability and scalability

The above abstractions are mainly focused on inter-service communication. They are building blocks that let us scale out microservices while keeping the whole system stable, resilient and secure.

We also need to support the organization by making sure the network is not a bottleneck for its expansion. New businesses starting from scratch can lose a lot of time bootstrapping their infrastructure and network. Moreover, integrating new businesses is almost inevitable if they are to leverage the strengths of the organization, and network integration can become a major blocker if it is not designed properly.

At the same time, holding the whole organization to the same compliance requirements may slow down businesses with less stringent needs, so the network and security still need to strike a balance.

Two examples of projects that put this strategy into practice:

1. GCP Network Re-architecture to Shared VPC model

After assessing the state of the network, it was clear that its architecture would cause many blockers in our expansion plans. To ensure the scalability of our organization, we built a clean global multi-company network architecture, which allowed us to integrate entities such as Mercari US into the Shared VPC model.

2. Network security guidelines for the Mercari Group

After facing a major security incident (Codecov) last year, the network team took the initiative to inventory the networks used across our organization, enact network security guidelines, and implement them to improve the organization’s security posture.

Conclusion

This article introduced the Network team at Mercari, its mission, its strategy and some of the projects we have worked on.

The Platform Group grew a lot over the past years and is still growing fast, making a positive impact in Mercari and all its subsidiaries. The network team has very exciting challenges, such as strengthening our microservices communications or bootstrapping the networks of new businesses while making sure the organization’s network strategy supports its growth. If this interests you and you would like to know more about us, feel free to drop me a message over Twitter or LinkedIn. We sponsor work visas and the Japanese language is not a requirement for our team.