Reducing Inter-Zone Egress Costs with Zone-Aware Routing in Mercari’s Kubernetes Clusters

Introduction

This summer, I had the opportunity to join Mercari’s Network Team as an intern, focusing on reducing network costs, especially inter-zone egress costs within our Kubernetes clusters. In this blog post, I aim to outline the problem we faced, the steps we took to solve it, and the promising results we’ve seen so far.

The Problem: High Inter-Zone Egress Costs

Mercari’s microservices are all running on Kubernetes clusters, specifically on Google Kubernetes Engine (GKE). We have Production and Development clusters spanning across three different Availability Zones (AZs) in the Tokyo region. The use of multiple AZs enhances our system’s fault tolerance, ensuring that even if one zone experiences issues, our services can continue to operate smoothly.

However, this architectural choice comes with its own challenges. Incoming network traffic to our services was evenly distributed across Pods, irrespective of the AZ they were in. While this approach provides redundancy and high availability, it also incurred high costs for Mercari: data transfer between different AZs carries a financial cost, which significantly impacted our Production environment.

The Solution: Zone-Aware Routing

Zone-Aware Routing is a strategy designed to optimize network costs and latency by directing traffic to services within the same Availability Zone whenever possible. This minimizes the need for inter-zone data transfer, thus reducing associated costs.

During my internship, we had the goal of enabling zone-aware routing.
There were two features available to achieve this:

  1. Locality Load Balancing in Istio for services with Istio.
  2. Topology Aware Routing for services using Kubernetes’ Kube-Proxy.

Istio is a service mesh for managing and securing microservices. Mercari is in the process of adopting Istio, so we have a mix of services that do and do not use it. The choice between Istio’s Locality Load Balancing and Kubernetes’ Topology Aware Routing is therefore determined by whether a service uses Istio: if the communicating Pod has an Istio sidecar, Istio’s Locality Load Balancing is used; if it does not, Kubernetes’ Topology Aware Routing is used.

Topology Aware Routing and Locality Load Balancing are mutually exclusive in their activation conditions: if a Pod has an Istio proxy injected, only Locality Load Balancing is used, and if it does not, only Topology Aware Routing applies.

For Services Using Istio

Mercari utilizes Istio for its service mesh architecture. Istio comes with its own proxy and offers features like Service Discovery, Security, and Observability. To enable zone-aware routing, we adjusted the DestinationRule in Istio to include loadBalancer and outlierDetection configurations. The loadBalancer setting controls how zone-aware routing behaves, and outlierDetection determines when a zone should be considered unhealthy so that traffic fails over to a different zone.

Here is an example of the DestinationRule:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: echo
spec:
  host: echo.sample.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failoverPriority:
         - "topology.kubernetes.io/region"
         - "topology.kubernetes.io/zone"
    outlierDetection:
      # configure based on usual 5xx error rate of service
      consecutive5xxErrors: 10
      # configure based on the time taken to run up a new Pod usually.
      interval: 5m
      # configure based on HPA target utilization
      maxEjectionPercent: 15
      # configure based on HPA target utilization
      baseEjectionTime: 10m   

For Services Using Kube-Proxy

For services that rely on kube-proxy, we used Kubernetes’ Topology Aware Routing. This feature prioritizes routing to Pods within the same topology (region, AZ, etc.). Implementing it is as simple as adding an annotation: service.kubernetes.io/topology-mode: Auto.

Here is an example:

apiVersion: v1
kind: Service
metadata:
  name: http-server
  annotations:
    service.kubernetes.io/topology-mode: "auto"
spec:
  # selector and port added for completeness; values are illustrative
  selector:
    app: http-server
  ports:
    - port: 80

Handling Imbalanced Traffic

Zone-aware routing, while effective in reducing inter-zone costs, introduces its own set of challenges. One significant challenge is imbalanced traffic distribution across Pods in different zones. This discrepancy can cause localized overload or underutilization, affecting the system’s overall efficiency and potentially incurring additional costs.

Below is a simple example with two services sending and receiving requests across two zones. Before zone-aware routing is enabled, the default round-robin behavior is used and each instance of service 2 receives roughly 50% of the requests. With zone-aware routing enabled for this service, however, one instance receives most of the requests (about 2/3 of the total) while the other receives only 1/3. This creates an unfair workload, and the benefits of zone-aware routing could be lost because of this imbalance.

Example: Before zone-aware routing enabled

Example: After zone-aware routing enabled

Some of our services need to operate in specific zones. For these services, we have specialized NodePools configured with Kubernetes taints to ensure that only Pods for those particular services are scheduled there. This setup introduces an inherent imbalance in the number of Nodes across different zones.
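For illustration, here is a minimal sketch of how a Pod for such a service can be tied to its dedicated NodePool. The taint key/value, node pool name, and image below are hypothetical examples, not our actual configuration:

apiVersion: v1
kind: Pod
metadata:
  name: special-service
spec:
  tolerations:
    # Tolerate the taint configured on the dedicated NodePool (hypothetical key/value)
    - key: dedicated
      operator: Equal
      value: special-service
      effect: NoSchedule
  nodeSelector:
    # GKE labels every Node with the name of its node pool
    cloud.google.com/gke-nodepool: special-service-pool
  containers:
    - name: app
      image: sample-service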

To mitigate this, we initially considered using GKE’s location_policy: BALANCED to even out the Node count across zones. However, this policy doesn’t guarantee an always balanced distribution and doesn’t consider zones during scale-down operations, which can further exacerbate the imbalance.

Additionally, the Horizontal Pod Autoscaler (HPA) generally monitors Pods across all zones, considering their overall utilization. As a result, even if a specific zone is under heavy load, it may not trigger a scale-up if the utilization is low in other zones.

Our solution was to set up individual Deployments and HPAs for each zone, allowing for independent scaling based on the traffic within that zone. This ensures that even if traffic is concentrated in a specific zone, it will be adequately scaled to handle the load. We also created an individual PodDisruptionBudget (PDB) to limit the number of concurrent disruptions for each zone.

How we created Deployments and HPA for each zone
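For reference, here is a minimal sketch of what the per-zone resources can look like for a single zone. The names, labels, zone (asia-northeast1-a), and HPA/PDB thresholds are illustrative assumptions rather than our exact manifests; one such set is created for every zone:

# Deployment pinned to a single zone via nodeAffinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-asia-northeast1-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo
      zone: asia-northeast1-a
  template:
    metadata:
      labels:
        app: echo
        zone: asia-northeast1-a
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - asia-northeast1-a
      containers:
        - name: echo
          image: sample-service
---
# HPA that scales only this zone's Deployment, based on its own utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-asia-northeast1-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-asia-northeast1-a
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# PDB that limits concurrent disruptions for this zone's Pods
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: echo-asia-northeast1-a
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: echo
      zone: asia-northeast1-a

The Service itself continues to select only the shared app label, so all of the per-zone Deployments keep serving traffic behind the same Service.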

Choosing targets for trials

Selecting where to implement these changes was a data-driven decision. We operate in a multi-tenant Kubernetes (k8s) cluster, with multiple services in multiple namespaces. For our metrics, we used Google Cloud Monitoring metrics, specifically pod_flow/egress_bytes_count, to understand the volume of traffic between namespaces in this multi-tenant environment. This helped us identify the high-traffic service-to-service communications that could benefit most from these adjustments.

Technical Configurations

At Mercari, we operate multiple services, requiring a multi-tenant k8s cluster. In a complex ecosystem like this, managing Kubernetes configurations often turns into a labor-intensive task filled with writing multiple long manifests. This is where k8s-kit, Mercari’s internal CUE-based abstraction of Kubernetes manifests, comes into play, significantly streamlining the process.

k8s-kit is a tool designed to streamline Kubernetes configurations. It minimizes the need for manual setup and repetitive tasks, allowing developers to focus more on the logic and features of their services. The tool accomplishes this by offering various levels of abstraction, which simplify the deployment processes. Under the hood, k8s-kit uses CUE, a powerful language that aids in defining, generating, and validating data structures.
If you want to know more about k8s-kit, check out our blog post: Kubernetes Configuration Management with CUE

To enable zone-aware routing, we used k8s-kit to configure the individual Deployments and HPAs for each zone, an important step in implementing zone-aware routing effectively. By significantly reducing the manual configuration workload, k8s-kit made it simple to set up this complex yet crucial feature.

The Outcome

Kubernetes’ Topology Aware Routing

We experimented with Kubernetes’ Topology Aware Routing in one of our services that uses kube-proxy and observed excellent results. Traffic from the gateway now predominantly goes to Pods in the same zone. Below is how much traffic each Pod in zone B receives from each zone. Previously, Pods in zone B received roughly equal amounts of traffic from zones A, B, and C; now we see significantly more traffic coming from zone B and less from A and C.

How much traffic each pod in zone-b gets from each zone.

Istio’s Locality Load Balancing

Initially, we had difficulty getting Locality Load Balancing to work in our development cluster. Even with the Locality Load Balancing settings applied, traffic was distributed evenly across the zones rather than being kept within each zone.
We were able to confirm that Istio’s Locality Load Balancing worked in the same cluster for HTTP connections. However, it did not work in the namespace of the target application, which uses gRPC. We are still investigating why it was not working.

Future Plans

Our experience with zone-aware routing has been promising, but there’s room for both improvement and automation. Going forward, we aim to enhance operational simplicity and streamline the management of multiple HPAs and Deployments across different zones. Our strategy involves configuring k8s-kit to make zone-aware routing more straightforward for service developers, with a focus on automating these processes.
Below is an example of how we hope to add the zone-aware routing configuration to k8s-kit:

App: kit.#Application & {
        metadata: {
                serviceID: "sample-service"
                name:      "echo"
        }

        spec: {
                image: "sample-service"
                Network: {
                    // Add a configuration for k8s-kit to automatically make the zone-aware routing configurations and create the HPAs and Deployments.
                    routingRule: zoneAwareRouting: {}
                }
        }
}

Challenges and Learnings

This internship served as a significant learning opportunity for me, especially since it was my first time diving into several new technologies and methodologies. Below are some of the key challenges and learnings I gained from this experience:

Kubernetes: Understanding its complex orchestration capabilities and learning how to configure deployments and services were enlightening.
Datadog: Leveraging it for metrics enabled me to gauge the effectiveness of our changes in real-time.
Spinnaker: Utilizing this continuous delivery platform to deploy changes taught me the importance of automation in DevOps practices.
k8s-kit: Mercari’s internal tool introduced me to best practices in Kubernetes deployments with varying levels of abstraction.

The journey wasn’t smooth sailing all the way. One of the most challenging parts was dealing with Istio’s Locality Load Balancing feature not working as expected in the development environment. The frustration mounted as we combed through logs, configurations, and community forums without arriving at a root cause.

Conclusion

This project, focused on reducing Mercari’s inter-zone egress costs within our Kubernetes clusters, has shown promising outcomes. By implementing zone-aware routing strategies, we effectively minimized traffic across different Availability Zones, thereby reducing the associated costs.

I’m thrilled to have been a part of this project. I believe the experience and insights I’ve gained will be invaluable in my future endeavors.

In addition to the topics discussed here, Mercari’s Platform Team is also involved in various exciting projects like:

In-house CI/CD infrastructure development
Developing layers of abstraction for developers
Networking domains like Istio

If you find these challenges intriguing and want to be part of Mercari’s Platform Team, we’re actively looking for people to join us!
Engineer, Platform

Thank you for reading, and stay tuned for more updates from Mercari’s Network Team!
