Scaling Kubernetes Tenant Management with Hierarchical Namespace Controller

Author: @deeeeeeeet from Platform Developer Experience Team

Three years ago, we made the decision to break our monolithic API into microservices and to move from deploying on physical on-premise machines to deploying containers on GCP using Google Kubernetes Engine (GKE).

We architected our Kubernetes cluster for multitenancy, where a tenant (a single service) is a single Kubernetes namespace. We now have more than 300 namespaces, and as more namespaces were created, the maintenance cost for teams that own multiple services also increased (configuring RBAC, for example). To solve this problem, we introduced the Hierarchical Namespace Controller (HNC).

In this blog post, I'll explain the details of our multi-tenant Kubernetes architecture and the issues we faced, then show how we introduced HNC into our existing ecosystem to solve them.

Background

Multi-tenant Kubernetes Architecture

There are two main ways to architect Kubernetes for multiple tenants: creating a cluster per tenant (the multi-cluster pattern) or hosting multiple tenants in a single cluster (the multi-tenant pattern).

Both have pros and cons. For example, with the multi-cluster pattern you can strongly isolate workloads and access permissions at the cluster level, but the burden of managing clusters and cluster components grows with every cluster (e.g., each cluster needs to be upgraded three times a year). With the multi-tenant pattern, on the other hand, you don't pay those per-cluster management costs, but tenants share the control plane, and tenant isolation and access control require careful design.

To minimize the cost of managing clusters and cluster components, and because Kubernetes offers many features that enable multitenancy (e.g., Namespace, RBAC, and NetworkPolicy), we decided to adopt the multi-tenant pattern.

The following diagram shows how we architect our multi-tenant Kubernetes cluster and integrate it with the rest of the GCP ecosystem:

We use the Kubernetes namespace as the unit of tenancy and create one namespace per service. We also configure RBAC so that the service owners can access the namespace. We create tenants per service rather than per team so that service ownership can change easily. If tenants were managed per team, transferring a service to another team would mean migrating the service between namespaces, which is expensive. With tenants managed per service, a transfer only requires changing the access roles (RBAC), which is very cheap. Organization structures change all the time, and we designed with that in mind.

We try to avoid hosting stateful workloads on Kubernetes and use GCP's managed databases as much as possible. Since GCP resource access policies cannot be strongly isolated within a single GCP project (though this is getting better), we decided to create one GCP project per tenant (service). If a service's workloads need to store state, the database (e.g., Cloud Spanner) is created in the service's own GCP project and accessed from the workloads in the service's namespace.

Beyond GCP, we also use other infrastructure tooling, such as Spinnaker as our delivery platform, and SaaS products such as PagerDuty for on-call scheduling. In these components too, we configure tenants or create accounts per service, so the multitenancy architecture stays consistent.

(Looking back, I think adopting the multi-tenant architecture was a good decision. Since creating the cluster, we've invested a lot of time in maintaining it: upgrading the cluster itself and components like Istio, enabling new features provided by GCP, responding to CVEs, and so on. If we had many clusters to maintain, this would have been really hard. Of course, the multi-tenant design brings some complexity, but compared with that maintenance cost, it's cheap.)

Tenant Management by microservice-starter-kit

As you may have noticed, starting a new microservice requires a lot of tenancy configuration. Since new services are created all the time, it's not realistic for the platform team to configure all of this manually every time a product team wants a new one.

To solve this problem, we introduced a custom Terraform module named microservice-starter-kit (starter-kit). The module is configured for each service and bootstraps all of the tenants and configurations mentioned above: Kubernetes namespace and RBAC, GCP project and IAM, NetworkPolicy, Istio configuration, Spinnaker application, and so on. Product teams use this module to create a microservice without involving the platform team.

We've used this module since the beginning, and it is used not only by Mercari JP but also by Mercari US and Merpay. More than 300 services have now been created with it (which means we have more than 300 namespaces in a single Kubernetes cluster).

Problems

A tenant needs configuration not only at bootstrap time but also as part of ongoing maintenance. For example, when new members join the team that owns a service, they need to be added to the starter-kit configuration to get access to the tenant (e.g., via Kubernetes RBAC or GCP IAM).

The problem appeared when one team started managing multiple services. The more services a team maintains, the heavier the burden of maintaining its tenants becomes. For example, when new members join, they need to be added to every tenant the team manages. The following diagram shows this problem:

Tenant Management with Hierarchical Namespaces Controller

To solve this tenant management problem, we introduced the Hierarchical Namespace Controller (HNC), developed by the Kubernetes Multi-Tenancy Working Group. The best introduction to HNC is the Kubernetes blog post "Introducing Hierarchical Namespaces" by Adrian Ludwin. In short, HNC lets you create parent-child relationships between namespaces, and Kubernetes resources in a parent namespace are inherited by its children (HNC can do more, such as creating subnamespaces, but for now we mainly use this resource inheritance).
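
The inheritance rule can be pictured as a walk up the parent chain. The following is a toy, in-memory model written for illustration (it is not HNC's controller code): each namespace has at most one parent, and a namespace effectively holds its own objects plus copies of every ancestor's objects. The namespace and object names, including "service-1-deployer", are hypothetical.

```python
from typing import Dict, List, Optional


def ancestors(ns: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestors of ns, nearest first; raise on a cycle."""
    chain: List[str] = []
    seen = {ns}
    current = parent.get(ns)
    while current is not None:
        if current in seen:
            # HNC refuses hierarchies that would form a cycle; the toy does too.
            raise ValueError(f"cycle detected at namespace {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent.get(current)
    return chain


def effective_objects(
    ns: str,
    parent: Dict[str, Optional[str]],
    objects: Dict[str, List[str]],
) -> List[str]:
    """Objects visible in ns: its own, then copies propagated from each ancestor."""
    visible = list(objects.get(ns, []))
    for ancestor in ancestors(ns, parent):
        visible.extend(objects.get(ancestor, []))
    return visible


if __name__ == "__main__":
    # Hypothetical hierarchy: two service namespaces under one team namespace.
    parent = {"team-a": None, "service-1": "team-a", "service-2": "team-a"}
    objects = {
        "team-a": ["RoleBinding/team-a-view"],
        "service-1": ["RoleBinding/service-1-deployer"],
    }
    print(effective_objects("service-1", parent, objects))
    print(effective_objects("service-2", parent, objects))
```

Real HNC also decides which resource kinds propagate (this is configurable), handles name conflicts, and supports exceptions; the toy ignores all of that and only shows why a single object in the team namespace reaches every child.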

Multiple Tenants Management by microservice-team-kit

As with namespace management, we don't expose raw HNC resources to our product teams; instead, we wrap them in a new internal tool named microservice-team-kit (team-kit). It is also built as a Terraform module and, as the name suggests, it manages team-related resources: team memberships, Kubernetes RBAC, GCP IAM, and so on. The following is an example of the team-kit interface:

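module "team-a" {
  source    = "microservice-team-kit/v0.2.3.tar.gz"
  team_name = "team-a"

  # Members get access to every service tenant assigned to this team.
  team_members = [
    "XXX@example.com",
    "YYY@example.com",
  ]

  # Permissions granted to the team, per environment.
  team_permissions = {
    production = {
      # Kubernetes roles bound in the team namespace (inherited via HNC).
      kubernetes = [
        "view",
      ]
      # GCP IAM roles bound on the team folder (inherited by service projects).
      gcp = [
        "roles/viewer",
      ]
    }
  }
}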

The configuration exposed to product teams is essentially the team's member list and the permissions the team should have in its Kubernetes namespaces and GCP projects.

All a product team needs to do is define the team once with this module and then assign the team to the services it manages through the starter-kit (we added a new argument to the starter-kit for this). The members registered in the team-kit then get access to all of the team's service tenants (e.g., Kubernetes namespaces and GCP projects). All team-related resources are managed centrally, and individual tenant configurations don't need to be touched: when new members join the team, only the team-kit configuration needs to be updated.
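
A back-of-the-envelope comparison shows why this matters as a team's service count grows. The sketch below is our own simplified model, not starter-kit or team-kit code: it assumes that, without team-kit, a new member must be listed in every owned service's configuration once per access system (here two, for Kubernetes RBAC and GCP IAM), while with team-kit the member is added once and the hierarchy fans the access out.

```python
def edits_when_member_joins(
    services_owned: int, uses_team_kit: bool, access_systems: int = 2
) -> int:
    """Configuration entries to add when one member joins (simplified model)."""
    if uses_team_kit:
        # One edit to the team-kit member list; namespace and folder
        # inheritance carry the access to every owned service.
        return 1
    # One edit per owned service and per access system (e.g., RBAC and IAM).
    return services_owned * access_systems


if __name__ == "__main__":
    for services in (1, 5, 10):
        before = edits_when_member_joins(services, uses_team_kit=False)
        after = edits_when_member_joins(services, uses_team_kit=True)
        print(f"{services:>2} services: {before} edits before, {after} with team-kit")
```

The point of the model is the shape of the curve: per-tenant configuration grows linearly with the number of services a team owns, while the team-level definition stays constant.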

How microservice-team-kit works

Internally, the team-kit creates a team Kubernetes namespace and binds RBAC roles for the team members in that namespace. Once the team is assigned to a service, the starter-kit registers the service namespace as a child of the team namespace using HNC's HierarchyConfiguration. The following is the resulting configuration when we create "team-a" and assign it to "service-1":

apiVersion: hnc.x-k8s.io/v1alpha2
kind: HierarchyConfiguration
metadata:
  name: hierarchy
  namespace: service-1 # service namespace created by starter-kit
spec:
  parent: team-a # team namespace created by team-kit

HNC then propagates the RBAC resources created in the team namespace to the service namespace. Currently, we mainly use this for RBAC resources, but we are considering supporting other resources such as ResourceQuota.
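
To make the two halves concrete, here is a minimal sketch of the Kubernetes objects involved for "team-a" and "service-1", built as Python dicts. It is an illustration under assumptions, not team-kit's actual output: the RoleBinding name scheme `<team>-<role>` is hypothetical, and binding the built-in `view` ClusterRole simply mirrors the `view` permission in the team-kit example.

```python
import json
from typing import Any, Dict, List


def team_role_binding(team: str, role: str, members: List[str]) -> Dict[str, Any]:
    """RoleBinding created in the team namespace (hypothetical naming scheme)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-{role}", "namespace": team},
        # Grants a ClusterRole (e.g., the built-in "view") within this namespace only.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": member}
            for member in members
        ],
    }


def hierarchy_configuration(service: str, team: str) -> Dict[str, Any]:
    """HNC HierarchyConfiguration making the service namespace a child of the team."""
    return {
        "apiVersion": "hnc.x-k8s.io/v1alpha2",
        "kind": "HierarchyConfiguration",
        # HNC expects a single HierarchyConfiguration per namespace, named "hierarchy".
        "metadata": {"name": "hierarchy", "namespace": service},
        "spec": {"parent": team},
    }


if __name__ == "__main__":
    manifests = [
        team_role_binding("team-a", "view", ["XXX@example.com", "YYY@example.com"]),
        hierarchy_configuration("service-1", "team-a"),
    ]
    # kubectl accepts JSON as well as YAML, so a v1 List is enough for illustration.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Only the RoleBinding lives in the team namespace; once the HierarchyConfiguration sets the parent, HNC copies that RoleBinding into "service-1" without the service's own configuration mentioning any team member.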

We manage GCP tenants (GCP projects; as described above, we create one per service) in a similar way to Kubernetes tenants. GCP also has the concept of a resource hierarchy: resources such as GCE instances or GCS buckets belong to a GCP project, projects can be grouped into GCP folders, and all projects and folders are rooted in a single GCP organization. IAM bindings and security policies defined on the organization or on folders are inherited by the layers below them. We take advantage of this.

As with Kubernetes namespaces, the team-kit creates a GCP folder for the team and binds the team's IAM roles on that folder. Once the team is assigned to a service, the starter-kit moves the service's GCP project into the team folder, so the folder's IAM bindings are inherited by the service projects.
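
The effect of moving a project can be modeled in a few lines. In GCP, a resource's effective IAM policy is the union of the policy set on it and the policies inherited from its ancestors. The toy below (our illustration, not a GCP client; deny policies and IAM conditions are not modeled, and the resource names are hypothetical) shows that the starter-kit's move is just a parent change that pulls in the folder's bindings.

```python
from typing import Dict, Optional, Set, Tuple

# (role, member) pairs, e.g. ("roles/viewer", "user:XXX@example.com").
Binding = Tuple[str, str]


def effective_bindings(
    resource: str,
    parent: Dict[str, Optional[str]],
    policies: Dict[str, Set[Binding]],
) -> Set[Binding]:
    """Union of the bindings on resource and on every ancestor up to the org.

    Assumes the hierarchy is acyclic, as GCP's resource hierarchy is.
    """
    result: Set[Binding] = set()
    node: Optional[str] = resource
    while node is not None:
        result |= policies.get(node, set())
        node = parent.get(node)
    return result


if __name__ == "__main__":
    team_viewer = ("roles/viewer", "user:XXX@example.com")
    policies = {"folders/team-a": {team_viewer}}
    parent = {
        "projects/service-1": "organizations/example",
        "folders/team-a": "organizations/example",
        "organizations/example": None,
    }
    print(team_viewer in effective_bindings("projects/service-1", parent, policies))  # False
    # Moving the project into the team folder is a single parent change.
    parent["projects/service-1"] = "folders/team-a"
    print(team_viewer in effective_bindings("projects/service-1", parent, policies))  # True
```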

Alternatives

An alternative to hierarchy-based role management is Google Groups. You can create a Google group per team and grant that group GCP IAM roles and Kubernetes RBAC permissions. One limitation of the hierarchy-based solution is that a service (tenant) can belong to only one team; with Google groups there is no such limitation, and multiple teams can be assigned to a single service.

But there are some drawbacks. One of the benefits of microservices is independent deployment, and if a service is managed by multiple teams, we may lose this benefit: for example, we may need approvals from multiple teams to make changes. We may also lose clear ownership of the service. We think ownership is critical in a microservices ecosystem and didn't want to lose it.

In practice, we chose a hybrid approach: hierarchy-based role management as the primary mechanism and Google groups as a supplement. The hierarchy keeps each service's ownership clear, while Google groups let us also assign cross-cutting teams, such as data-platform or SRE, to a service.
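
The hybrid rule can be summarized as a small data model. This is a hypothetical sketch, not our tooling: the owning team is a single field (so the model itself cannot express two owners, mirroring the one-team-per-service constraint of the hierarchy), while supplemental Google groups are a list. The group and member addresses are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Service:
    name: str
    owner_team: str  # exactly one owner, expressed through the hierarchy
    extra_groups: List[str] = field(default_factory=list)  # cross-cutting access via groups


def principals_with_access(
    service: Service,
    teams: Dict[str, Set[str]],
    groups: Dict[str, Set[str]],
) -> Set[str]:
    """Members of the owning team plus members of any supplemental groups."""
    principals = set(teams[service.owner_team])
    for group in service.extra_groups:
        principals |= groups[group]
    return principals


if __name__ == "__main__":
    teams = {"team-a": {"XXX@example.com", "YYY@example.com"}}
    groups = {"sre@example.com": {"ZZZ@example.com"}}
    svc = Service("service-1", owner_team="team-a", extra_groups=["sre@example.com"])
    print(sorted(principals_with_access(svc, teams, groups)))
```

Keeping ownership and supplemental access in separate fields makes it easy to answer "who owns this service?" with one lookup, even when several groups have been granted access.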

Current Status & Future

We've been using team-kit and HNC in production for more than half a year, and they have worked well without any issues. We're still in the middle of migrating to this new tenant management and aim to complete the migration by the end of this year. The more services we migrate, the more we expect the maintenance cost of owning multiple services to drop.

Currently, we mainly use this for access management, but HNC can also propagate other resources such as ResourceQuota or NetworkPolicy. We are considering managing those through the hierarchy as well, to manage tenants even more efficiently.

We are also working on adding one more layer (a parent) above the team layer: "company". Mercari is growing and now consists of multiple companies, e.g., Mercari and Merpay, each with its own policies. With a company layer, we can enforce each company's policies on the teams and services beneath it. The following is the structure we are working on now:

.
├── services
│   ├── mercari-jp // company layer
│   │   ├── team1 // team layer
│   │   │   ├── mercari-a-jp // service layer
│   │   │   └── mercari-b-jp
│   │   └── team2
│   │       └── mercari-c-jp
│   └── merpay-jp
│       └── team3
│           ├── merpay-a-jp
│           └── merpay-b-jp
└── platform
    └── team5
        └── platform-a-jp
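
A company-level policy should reach every service beneath that company. As a toy sketch (not HNC or GCP code), transcribing the tree above into a child-to-parent map makes it easy to list the services a given layer's policy would apply to. The helper names are hypothetical, and the sketch assumes that every leaf in the tree is a service.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Child -> parent edges transcribed from the structure above.
PARENT: Dict[str, Optional[str]] = {
    "services": None,
    "mercari-jp": "services",
    "team1": "mercari-jp",
    "mercari-a-jp": "team1",
    "mercari-b-jp": "team1",
    "team2": "mercari-jp",
    "mercari-c-jp": "team2",
    "merpay-jp": "services",
    "team3": "merpay-jp",
    "merpay-a-jp": "team3",
    "merpay-b-jp": "team3",
    "platform": None,
    "team5": "platform",
    "platform-a-jp": "team5",
}


def children_index(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> children."""
    index: Dict[str, List[str]] = defaultdict(list)
    for child, p in parent.items():
        if p is not None:
            index[p].append(child)
    return index


def services_under(root: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Leaf nodes (assumed to be services) in the subtree rooted at root."""
    index = children_index(parent)
    leaves: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = index.get(node, [])
        if not kids and node != root:
            leaves.append(node)
        stack.extend(kids)
    return sorted(leaves)


if __name__ == "__main__":
    # Services a policy set on the "mercari-jp" company layer would reach.
    print(services_under("mercari-jp", PARENT))
```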

Conclusion

In this blog post, I explained our Kubernetes tenant management and how we reduced its maintenance burden. I hope our approach gives some insight to companies struggling with similar tenant management problems.

Special thanks to Adrian Ludwin and Yoshi Tamura for helping us to adopt HNC.

Hiring

Platform Group is hiring. If you're interested, please check the job description!