2021/02/10

Migrating Spinnaker from Halyard to Kleat

Authors: keke, m


Hi, this is @_k_e_k_e and @m from the Microservices Platform Group.

A few months ago, our team migrated to Kleat from Halyard, which we had used to deploy and manage our Spinnaker ever since Spinnaker was first introduced at Mercari. The migration brought many advantages, but it was not an easy transition, and a few concerns remain.

Today, I would like to share the whole story of our migration, both the bright side and the dark side.

Spinnaker at Mercari

Before we get into the main topic, let me explain how Spinnaker is used at Mercari.

We use a single Spinnaker instance to deploy almost all of our microservices to our Google Kubernetes Engine (GKE) cluster, and Spinnaker itself is hosted on a dedicated GKE cluster.

Fig 1. A high-level overview of Mercari’s microservices deployment workflow

Spinnaker was first introduced at Mercari about four years ago. As our business grew, our microservice architecture grew and matured along with it. We have created many microservices, each running its own development cycle and releasing features. As a result, Spinnaker has become one of the most important products used at Mercari.

In writing this article, I gathered the data again and came up with the following statistics:

I believe Mercari is one of the heaviest users of Spinnaker. How does that compare with your organization? Whatever the scale, there are maintenance costs and deployment hassles, so I hope this article will be helpful to anyone running Spinnaker.

How was Spinnaker managed?

Spinnaker had been managed by Halyard since it was first introduced. Halyard is a tool for configuring, installing, and updating Spinnaker.

We hosted Halyard on a Google Compute Engine (GCE) instance and used it to deploy Spinnaker to the GKE cluster in distributed mode, as shown below:

Fig 2: Overview of Halyard in Mercari

To perform any operation on Spinnaker, we had to log in to the instance via SSH over a VPN, and backups were taken regularly.

Difficulties with Halyard

We operated this way for almost four years, but some things were not easy to deal with:

Hard to make it “Infrastructure as Code (IaC)”: Halyard deploys Spinnaker based on a configuration file called Halconfig, which is located on the Halyard instance. This configuration includes the Spinnaker version, feature toggles, authentication and authorization settings, etc. Managing it with Git was difficult, and we had to log in to the instance to check Spinnaker’s configuration and to deploy Spinnaker. When we enabled the authorization feature, we needed to add permissions for 100+ microservices, which we had to do manually in Halconfig without any of the benefits of IaC. This made infrastructure changes hard to review, and our Spinnaker became something of a “snowflake server”.
Hard to build CI/CD for Spinnaker itself: Because every operation required an SSH connection to the instance, building CI/CD for Spinnaker itself faced a major obstacle. We tried Ansible, Puppet, Chef, etc., but decided not to adopt them, because we would then have to manage the playbooks, manifests, or recipes themselves, which is yet another set of things to maintain.
Inflexible Kubernetes Deployment configuration for each component: When Spinnaker is deployed in distributed mode, Spinnaker microservices such as Fiat, Clouddriver, and Front50 are deployed as Kubernetes Deployments. Halyard wraps kubectl and exposes only a limited API, so we could not tailor some components’ configuration to our organization. It is also not easy to change only some components after deploying; you have to reapply everything through Halyard.
Difficult to maintain the Halconfig: Halconfig is designed to be the single entry point for all Spinnaker-related settings. Halyard does have a mechanism for merging configuration patches, but basically you have to write everything in Halconfig. Because it can configure everything, it has lots of optional fields you don’t need, and Halconfig becomes huge and hard to manage!

Of course, Halyard had many advantages, but as you can see, it also had quite a few disadvantages. Still, at the time there was no other easy way to deploy and manage Spinnaker.

But then Kleat came along.

What’s Kleat?

Kleat is a lightweight tool for managing Spinnaker configuration, and it’s a replacement for Halyard. Note that Kleat by itself doesn’t deploy Spinnaker, and is often used in conjunction with Kustomize, a Kubernetes native configuration management tool.

This article does not cover Kleat in detail, so please read the RFC “Replacement of Halyard” if you are interested. Put very simply, Kleat is a tool that generates and manages configuration files for each component from the Halconfig we had been using with Halyard.

We use Kleat to create configuration files for each component and Kustomize to bundle them into Kubernetes manifests for management and deployment. Finally, the manifests are deployed with kubectl.
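As a rough illustration of that flow, the render step might look like the following Bash sketch. It assumes Kleat's generate subcommand takes a Halconfig path and an output directory (check the Kleat README for your version) and that your Kustomize overlay references Kleat's output directory; the function names, file names, and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical render step: Halconfig -> per-service configs -> one manifest.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

render_spinnaker() {
  local -r halconfig="$1"    # Halconfig kept in Git
  local -r kleat_out="$2"    # where Kleat writes one config file per service
  local -r overlay_dir="$3"  # Kustomize overlay that references ${kleat_out}
  local -r manifest="$4"     # rendered output, applied later with kubectl

  # 1. Generate per-service Spinnaker config files from the Halconfig.
  run kleat generate "${halconfig}" "${kleat_out}"
  # 2. Bundle configs and Deployments into a single manifest file.
  run kustomize build "${overlay_dir}" --output "${manifest}"
}
```

With DRY_RUN unset, the rendered file can then be applied with kubectl apply -f followed by the manifest path.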

Fig 3. The flow of how Spinnaker is configured with Kleat and Kustomize

Our Migration Plan

This is how our migration took place.

1. Port Spinnaker microservices deployment configurations into Kubernetes manifests

First of all, we ported the existing Spinnaker microservices deployment configurations, such as the number of replicas and CPU and memory resources, into Kubernetes manifests. Here is an example of a deployment configuration in Halyard:

deploymentEnvironment:
  customSizing:
    echo:
      resources:
        requests:
          cpu: 2
          memory: 1Gi
        limits:
          cpu: 4
          memory: 1Gi

We ported them into Kubernetes manifests as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  template:
    spec:
      containers:
      - name: echo
        resources:
          requests:
            cpu: 2
            memory: 1Gi
          limits:
            cpu: 4
            memory: 1Gi

We forked spinnaker/kustomization-base, an official collection of Kubernetes manifests for Spinnaker microservices, and then did the above for every Spinnaker microservice.
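Porting every component by hand is repetitive, so a patch like the echo one above can be generated from the same numbers that were in customSizing. This Bash sketch assumes each Deployment and its container are named after the component, as in the echo example; the helper name is hypothetical.

```bash
# Hypothetical helper: print a Deployment patch that sets a container's
# CPU/memory requests and limits, mirroring a Halyard customSizing entry.
# Assumes the Deployment and its container share the component's name.
resources_patch() {
  local -r name="$1" req_cpu="$2" req_mem="$3" lim_cpu="$4" lim_mem="$5"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  template:
    spec:
      containers:
      - name: ${name}
        resources:
          requests:
            cpu: ${req_cpu}
            memory: ${req_mem}
          limits:
            cpu: ${lim_cpu}
            memory: ${lim_mem}
EOF
}
```

For example, resources_patch echo 2 1Gi 4 1Gi > patches/echo-resources.yml reproduces the echo manifest above; each generated file can then be listed under patchesStrategicMerge (or patches, in newer Kustomize releases) in kustomization.yaml.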

2. Create Secrets and bind them with Halconfig

spinnaker/kustomization-base is designed to manage all of the configuration files as Kubernetes Secrets. However, Spring Boot, the Java framework that Spinnaker’s microservices are built on, resolves environment variable placeholders in application properties. Taking advantage of this, we defined only the credentials as Kubernetes Secrets and the rest as ConfigMaps. By replacing only the sensitive parts of the configuration with environment variables, we can now manage most configuration files as plain text. This means we can easily review most of our configuration in a Git hosting service like GitHub, which we could not do with Halyard.
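Because credentials in the committed configuration should only ever appear as ${...} placeholders, a simple pre-merge check can catch a token pasted in by mistake. This is a hypothetical sketch: the key names it inspects (token, password, secret) and the function name are assumptions, and a real check would need to match your own credential fields.

```bash
# Hypothetical pre-merge check: print credential-looking keys (token,
# password, secret) whose values are not a single ${UPPER_SNAKE_CASE}
# placeholder. Returns 1 if any such line is found, 0 otherwise.
find_plaintext_credentials() {
  local -r file="$1"
  local hits
  # First grep: lines with a credential key and a non-empty value.
  # Second grep: drop lines whose value is exactly one env placeholder.
  hits="$(grep -nE '^[[:space:]]*(-[[:space:]]+)?(token|password|secret)[[:space:]]*:[[:space:]]*[^[:space:]]' "${file}" \
    | grep -vE ':[[:space:]]*\$\{[[:upper:]_][[:upper:][:digit:]_]*\}[[:space:]]*$' || true)"
  if [[ -n "${hits}" ]]; then
    printf '%s\n' "${hits}"
    return 1
  fi
  return 0
}
```

Run against the Halconfig below, it stays silent because the token is only a placeholder.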

Here is a sample of our Halconfig with GitHub token credentials:

artifacts:
  github:
    accounts:
    - name: <mercari-spinnaker-bot>
      token: ${GITHUB_TOKEN}
      username: <mercari-spinnaker-bot>
    enabled: true

Then we create a Kubernetes Secret for the GitHub token and expose it to the Pod as environment variables:

apiVersion: v1
kind: Secret
metadata:
  name: artifacts
type: Opaque
data:
  GITHUB_TOKEN: xxx...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clouddriver
spec:
  template:
    spec:
      containers:
      - name: clouddriver
        envFrom:
        - secretRef:
            name: artifacts

It allows us to separate concerns and review configuration changes more easily.
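Values under data in a Secret must be base64-encoded. The following hypothetical Bash helper renders a Secret like the artifacts one above from an environment variable of the same name; kubectl create secret generic with --from-literal and --dry-run=client -o yaml produces an equivalent manifest. The function name is an assumption.

```bash
# Hypothetical helper: print an Opaque Secret manifest whose single key is
# read from the environment variable of the same name.
# Note: base64 is an encoding, not encryption; treat the output as sensitive.
secret_from_env() {
  local -r secret_name="$1" key="$2"
  if [[ ! "${key}" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "invalid variable name: ${key}" >&2
    return 1
  fi
  if [[ -z "${!key:-}" ]]; then
    echo "environment variable ${key} is not set" >&2
    return 1
  fi
  local encoded
  # tr removes the line wrapping GNU base64 adds to long values.
  encoded="$(printf '%s' "${!key}" | base64 | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
type: Opaque
data:
  ${key}: ${encoded}
EOF
}
```

For example, secret_from_env artifacts GITHUB_TOKEN prints the Secret that the clouddriver Deployment above consumes through envFrom.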

3. Add patches to support Spinnaker 1.20

This part of the process was the most difficult. Kleat does not fully support Spinnaker 1.20, so we had to configure and patch it as follows:

Custom Service Settings
- Add Deck’s Kubernetes Provider settings
- Add Redis configuration to all components that use Redis
Patch to Kleat artifacts
- Set the Kubernetes V1 provider version on Clouddriver’s Kubernetes accounts

If you are still using Spinnaker 1.20 and want to migrate to Kleat, you can follow these steps.

3.1 Custom Service Settings

Support for the Kubernetes V1 provider had been completely removed from newer Spinnaker releases, but we were still using it at Mercari. This blocked us from upgrading to a newer version of Spinnaker.

We had several options, but instead of waiting until we had gotten rid of the V1 provider altogether, we patched our Kleat setup so that it would work with Spinnaker 1.20, which Kleat does not fully support. For example, we configured the missing Kubernetes provider setting in Deck’s settings-local.js:

window.spinnakerSettings.providers.kubernetes = { defaults: {} }

We also added the Redis configuration to each component that uses Redis, for example in fiat-local.yaml:

redis:
  connection: ${services.redis.baseUrl}
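Adding the same Redis block to every Redis-using component by hand is error-prone, so it can be scripted. A minimal Bash sketch, assuming each component’s custom settings live in a <component>-local.yaml file in one directory (named like the fiat-local.yaml example above); the function name and directory layout are hypothetical, and you pass in whichever components use Redis in your setup.

```bash
# Hypothetical helper: make sure <dir>/<component>-local.yaml contains the
# Redis connection block for each component passed in.
add_redis_config() {
  local -r dir="$1"
  shift
  local component file
  for component in "$@"; do
    file="${dir}/${component}-local.yaml"
    touch "${file}"
    # Skip files that already define a top-level redis key.
    if grep -q '^redis:' "${file}"; then
      continue
    fi
    # Keep appended text on its own line if the file lacks a final newline.
    if [[ -s "${file}" && -n "$(tail -c 1 "${file}")" ]]; then
      printf '\n' >> "${file}"
    fi
    # Single quotes keep ${services.redis.baseUrl} literal so that Spring,
    # not the shell, resolves it at runtime.
    printf '%s\n' 'redis:' '  connection: ${services.redis.baseUrl}' >> "${file}"
  done
}
```

Running it twice for the same component is harmless, since files that already contain redis: are left untouched.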

3.2 Patch to Kleat artifacts

Not all of the settings required for Spinnaker 1.20 can be configured through Custom Service Settings, so in some cases we had to patch Kleat’s output. One example is the Kubernetes V1 provider configuration mentioned earlier: Kleat does not support it, so its output omits those settings.

To deal with this, we applied the following Bash patch to clouddriver.yml:

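# Set providerVersion: V1 on the given Kubernetes accounts in Kleat's
# clouddriver.yml output. Uses yq v3 syntax (yq r / yq w).
# Usage: inject_kubernetes_accounts_provider_version_v1 clouddriver.yml account-a account-b
inject_kubernetes_accounts_provider_version_v1() {
  local -r clouddriver_yml="$1"
  shift
  local -a kubernetes_v1_accounts=("$@")  # accounts still on the V1 provider
  local -a kubernetes_accounts
  # One account name per line, in the same order as kubernetes.accounts.
  mapfile -t kubernetes_accounts < <(yq r "${clouddriver_yml}" 'kubernetes.accounts[*].name')

  local i account_v1
  for i in "${!kubernetes_accounts[@]}"; do
    for account_v1 in "${kubernetes_v1_accounts[@]}"; do
      if [[ "${kubernetes_accounts[i]}" == "${account_v1}" ]]; then
        yq w -i "${clouddriver_yml}" "kubernetes.accounts[${i}].providerVersion" 'V1'
      fi
    done
  done
}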

This script sets the Kubernetes V1 provider version on the listed accounts in Kleat’s output. Of course, now that we’re using Spinnaker 1.24, these patches are no longer needed and have already been removed.

4. Recreating Spinnaker

We scheduled a maintenance window and completely removed our existing Spinnaker deployment using the Halyard CLI. Then we applied the output of the Kustomize build with kubectl.

After a few minutes of waiting, all the pods were in the Ready state and I was able to use Spinnaker normally.
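The cutover can be summarized as a short script. This is a sketch under assumptions, not the exact commands we ran: it assumes hal deploy clean is the Halyard command used for the removal, that Spinnaker runs in a namespace called spinnaker, and that the manifest was already rendered by kustomize build; the function names and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical cutover sketch for a maintenance window.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

cutover_spinnaker() {
  local -r manifest="$1"                # output of `kustomize build`
  local -r namespace="${2:-spinnaker}"  # assumed namespace

  # 1. Remove the Halyard-managed deployment. This is destructive.
  run hal deploy clean
  # 2. Recreate every component from the rendered manifests.
  run kubectl apply -f "${manifest}"
  # 3. Wait until all Pods in the namespace report Ready (or time out).
  run kubectl wait --for=condition=Ready pod --all -n "${namespace}" --timeout=10m
}
```

Doing a DRY_RUN=1 pass first makes it easy to double-check the exact commands before the maintenance window starts.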

That’s it.

What did the migration bring us?

The benefits of the migration are as follows.

Fine-grained control over each Spinnaker component

Fig 4. PR to upgrade Fiat Deployment’s CPU resource limit

Rather than managing the configuration and deployment of all Spinnaker components at once through Halyard, we can now define and apply each Kubernetes manifest independently. This lets you manage your Kubernetes configuration more flexibly and adjust it to suit each Spinnaker component.

For example, as shown in Figure 4 above, if you want to raise the CPU resource limit for Fiat, you only need to update the manifest for Fiat.

Declarative configuration

You get the benefits of Infrastructure as Code (IaC) by predefining the Kubernetes manifests with Kleat and Kustomize, instead of having Halyard generate them dynamically at deployment time. It also makes configuration changes easier to review through PRs, which makes operations much easier.

As shown in the previous figure, you can manage and update Spinnaker through GitHub PRs. Operators no longer have to SSH into the Halyard instance to update the Halconfig.

CI/CD of Spinnaker

Fig 5. Example flow of the Spinnaker update

We no longer need to SSH into Halyard’s GCE instance to operate Spinnaker; instead, we can deploy it in exactly the same way as any other microservice. We deploy Spinnaker via GitHub Actions, which makes updates very easy.

This is possible because Kleat lets us manage the configuration files on GitHub.
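A GitHub Actions job could, for example, show a diff on pull requests and apply only after merge. The Bash sketch below is one hypothetical way to wire that up, not our actual workflow: GITHUB_EVENT_NAME and GITHUB_REF are environment variables that GitHub Actions sets, while the function names, the main branch name, and the manifest path are assumptions.

```bash
# Hypothetical CI step: diff on pull requests, apply on pushes to main.
# GITHUB_EVENT_NAME and GITHUB_REF are provided by GitHub Actions.
spinnaker_ci_action() {
  case "${GITHUB_EVENT_NAME:-}:${GITHUB_REF:-}" in
    pull_request:*)       echo diff ;;
    push:refs/heads/main) echo apply ;;  # assumed default branch name
    *)                    echo skip ;;
  esac
}

spinnaker_ci_step() {
  local -r manifest="$1"  # output of `kustomize build`
  case "$(spinnaker_ci_action)" in
    diff)
      # kubectl diff exits 1 when differences exist; treat that as success.
      kubectl diff -f "${manifest}" || [[ $? -eq 1 ]]
      ;;
    apply)
      kubectl apply -f "${manifest}"
      ;;
    skip)
      echo "nothing to do for ${GITHUB_EVENT_NAME:-unknown} event"
      ;;
  esac
}
```

Keeping the decision in its own small function makes the branching logic easy to check without a cluster.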

Integration with the Kubernetes community

As with any other Kubernetes manifests, we can now use viglesiasce/kube-lint and other tools to lint them.

The Kubernetes community has powerful tools for developing and operating various Cloud Native applications, so you can now take advantage of them and change Spinnaker settings more safely.

What’s still hard?

However, it is not all good news. There is also a dark side: a few problems remain, and there is still room for improvement.

Design differences from the community kustomization-base

We run our own fork of spinnaker/kustomization-base because our setup differs significantly from the community version, for example in how Kubernetes Secrets and ConfigMaps are defined. This means we have to keep up with upstream changes, which adds to the workload of every upgrade. The community version sets credentials directly in Halconfig, but we wanted to store each Spinnaker credential in its own dedicated Kubernetes Secret, so we forked it.

Take a look at spinnaker/kustomization-base, and if it doesn’t fit your use case, you will need to fork it or start from scratch.

Hard to foresee the impact of Halconfig changes

Whether you use Kleat or Halyard, the Halconfig file is still the only entry point for configuring Spinnaker. With Kleat, however, the Kubernetes manifests are defined separately for each component, which means we have more things to manage. For example, when we enable the authorization feature, we need to expose the relevant Kubernetes Secret as environment variables to Fiat, Gate, etc. We need to know which additional changes each Halconfig change requires, which is more work than with Halyard.

Fig 6. The impact to be aware of when changing Halconfig

We often forgot to expose the credentials to the relevant Spinnaker components after enabling a feature in Halconfig. If we used the community kustomization-base as our base, this would not be a problem, but in our case we need to pay attention to it whenever we update Halconfig.
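One way to catch forgotten credentials before deploying is to compare the environment variable placeholders in each component’s generated config with the keys its Deployment actually receives. A minimal Bash sketch, assuming credentials are referenced as UPPER_SNAKE_CASE placeholders (like ${GITHUB_TOKEN} in the Halconfig sample) and that you pass the provided Secret keys in by hand; the function name is hypothetical.

```bash
# Hypothetical check: list ${ENV_VAR} placeholders used in a component's
# config that are not among the given Secret keys. Spring-style property
# references such as ${services.redis.baseUrl} are ignored because they are
# not UPPER_SNAKE_CASE.
missing_env_vars() {
  local -r config="$1"
  shift
  local -a provided=("$@")  # keys exposed to the Pod, e.g. via envFrom
  local placeholder var key found
  while IFS= read -r placeholder; do
    var="${placeholder:2:${#placeholder}-3}"  # strip leading "${" and "}"
    found=0
    for key in "${provided[@]}"; do
      if [[ "${key}" == "${var}" ]]; then
        found=1
        break
      fi
    done
    (( found )) || printf '%s\n' "${var}"
  done < <(grep -oE '\$\{[[:upper:]_][[:upper:][:digit:]_]*\}' "${config}" | sort -u)
}
```

An empty result means every credential placeholder in that config has a matching key; anything printed is a variable the Deployment still needs.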

Final Thoughts

It has been a while since the release of Spinnaker 1.21, which Kleat fully supports, and the community has completely dropped support for the Kubernetes V1 provider. Now that you know why Kleat can be a game-changer and a driving force for improving Spinnaker deployment and management, this may be a good time to consider migrating. As explained in the RFC, Halyard will eventually be deprecated, so it is worth preparing for the migration.

Continuous delivery is one of the most important aspects of software development and one of the most important indicators of competitiveness. The migration was not a straightforward process, but with Kleat, the maintenance cost of Spinnaker itself has gone down, and we can operate it more flexibly to fit our usage. As a result, we can provide a more stable and reliable platform. There is still room for improvement, but Kleat has greatly improved the quality of our platform.

In closing, I would like to say a huge thank you to the Spinnaker community.