At Mercari, we have been working on reducing the number of long-lived credentials that could have a significant impact on our systems if leaked and abused. To achieve this, we have implemented multiple systems that issue short-lived credentials. The Platform Security Team has extended an internally operated service called Token Server, which generates GitHub credentials, so that automated services running on Google Cloud can switch to short-lived credentials for accessing GitHub.
This article introduces the technologies, challenges, and solutions behind extending Token Server and migrating workloads on Google Cloud to use short-lived credentials.
Overview
Mercari primarily uses GitHub as its development platform, and we develop and operate many services that automate GitHub-related tasks.
These services typically access GitHub with a Personal Access Token (PAT) or a GitHub App private key, which can have no expiration or very long expiration periods. If such credentials are leaked (for example, through a supply chain attack), they can be misused for a long time. Also, once these long-lived credentials are created, it can be unclear which service uses which credential, and there is rarely a review of their granted permissions.
To resolve these problems, we extended an existing Token Server service (which already issues short-lived GitHub credentials inside Mercari) so that any service running on Google Cloud could also access GitHub without using long-lived credentials. This change provides the following benefits:
- Reduction of the number of long-lived credentials
- Reduction in the number of both PATs and GitHub App private keys (often managed in non-transparent ways)
- Simplified process for identifying which service uses which credential and for periodically reviewing permissions, by consolidating credential assignment and required privileges into one place
Moreover, we developed a Go library that allows existing services to migrate to Token Server with minimal changes, enabling quick adoption while avoiding major rewrites.
Token Server
At Mercari, GitHub is used in many different ways. In particular, for GitHub automation, it is common to implement changes in one repository and apply them to another repository automatically.
With GitHub Actions (our standard CI platform), there is no default way to handle automation across multiple repositories. Usually, you must store a PAT or GitHub App private key in Repository Secrets and generate tokens using, for example, the create-github-app-token action.
However, these methods require long-lived credentials (PAT or a GitHub App private key).
To address this, Mercari has been running a Token Server service that issues an Installation Access Token with certain permissions, by verifying an OIDC token that GitHub provides inside GitHub Actions workflows.
Installation Access Tokens are part of GitHub App functionality. They can be restricted to a subset of permissions (for example, read permission for contents, write permission for pull requests) and limited to certain repositories. They expire after one hour and can also be revoked via the GitHub API before they expire. This means you can provide credentials limited by the principle of least privilege, granting only the necessary scope, access range, and lifespan.
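To make the scoping concrete, the following is a minimal sketch of the JSON body sent to GitHub's REST endpoint for creating an Installation Access Token (POST /app/installations/{installation_id}/access_tokens), which accepts optional `repositories` and `permissions` fields. The repository name and the helper `buildTokenRequest` are hypothetical, and this is not Token Server's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenRequest mirrors the optional fields of the request body for
// POST /app/installations/{installation_id}/access_tokens.
type tokenRequest struct {
	Repositories []string          `json:"repositories,omitempty"`
	Permissions  map[string]string `json:"permissions,omitempty"`
}

// buildTokenRequest (hypothetical helper) returns the JSON body for a token
// limited to the given repositories and permissions.
func buildTokenRequest(repos []string, perms map[string]string) ([]byte, error) {
	return json.Marshal(tokenRequest{Repositories: repos, Permissions: perms})
}

func main() {
	body, err := buildTokenRequest(
		[]string{"example-repo"}, // hypothetical repository name
		map[string]string{"contents": "read", "pull_requests": "write"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The request itself must be authenticated as the GitHub App, which is why only Token Server needs to hold the App's private key.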
Token Server creates Installation Access Tokens from a pre-configured GitHub App, based on permissions for each repository and branch, and provides these tokens to GitHub Actions jobs in that repository. To identify which repository and branch to associate, the Token Server uses the OIDC token available inside the GitHub Actions job. The job obtains the OIDC token, sends it to the Token Server, which verifies the token, looks up the permissions set for that repository and branch, and then creates and issues an Installation Access Token.
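The lookup step in this flow can be sketched as follows. This is an illustration under assumptions, not the real implementation: the `claims` and `policy` types, the "repository@ref" key format, and the sample entries are all hypothetical. The sketch also assumes the OIDC token's signature has already been verified; GitHub's OIDC tokens do carry `repository` and `ref` claims.

```go
package main

import (
	"errors"
	"fmt"
)

// claims holds the two fields this sketch reads from an already-verified
// GitHub Actions OIDC token. Signature verification is out of scope here.
type claims struct {
	Repository string // e.g. "example-org/example-repo"
	Ref        string // e.g. "refs/heads/main"
}

// policy describes what an issued token may access (hypothetical shape).
type policy struct {
	Repositories []string
	Permissions  map[string]string
}

// policies is keyed by "repository@ref"; the key format is an assumption.
var policies = map[string]policy{
	"example-org/example-repo@refs/heads/main": {
		Repositories: []string{"another-repo"},
		Permissions:  map[string]string{"contents": "read"},
	},
}

var errNoPolicy = errors.New("no policy configured for repository and ref")

// lookupPolicy returns the policy for the caller, or errNoPolicy so that
// unknown callers are denied by default.
func lookupPolicy(c claims) (policy, error) {
	p, ok := policies[c.Repository+"@"+c.Ref]
	if !ok {
		return policy{}, errNoPolicy
	}
	return p, nil
}

func main() {
	p, err := lookupPolicy(claims{Repository: "example-org/example-repo", Ref: "refs/heads/main"})
	fmt.Println(p.Repositories, p.Permissions, err)
}
```

Denying by default when no entry exists keeps a misconfigured or unknown repository from receiving any token.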
Installation Access Tokens issued by Token Server are used for a wide range of activities, such as multi-repository automation (adding commits, automatically creating issues, creating pull requests) and downloading private libraries during builds.
(Note) In April 2024, Chainguard released Octo STS. Its core principle is similar to Token Server. However, Token Server provides more unified permission management and also integrates with Google Cloud workloads and GitHub App load balancing. This makes it well suited for enterprise environments.
Token Server’s Extension to Google Cloud
At Mercari, many services run on Google Cloud. This includes not only customer-facing microservices but also internal services for automation. These services accessed GitHub using PATs or GitHub App private keys.
Google Cloud resources run as a Service Account, which can be granted privileges to operate other resources. When that Service Account is granted the roles/iam.serviceAccountTokenCreator role, the resource can obtain an OIDC token signed by Google via an API. We decided to extend Token Server to verify these Google-signed OIDC tokens in the same way as GitHub’s OIDC tokens, so that it can issue an Installation Access Token with predefined permissions.
With this approach, a service running on a given Google Cloud resource can send an OIDC token to the Token Server, receive an Installation Access Token, and then use it to access GitHub – eliminating the need for previously stored PATs or GitHub App private keys in Google Cloud.
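As an illustration of the first step, the sketch below builds a request for a Google-signed OIDC token. It is not necessarily the API Mercari's services call: it assumes the metadata server's identity endpoint, which is one documented way for workloads on Compute Engine, GKE, or Cloud Run to get such a token. The audience URL is a placeholder for wherever Token Server is hosted.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// metadataIdentityURL is the metadata server endpoint that returns an ID
// token for the Service Account attached to the workload.
const metadataIdentityURL = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity"

// newIdentityTokenRequest builds the request for an OIDC token whose "aud"
// claim is set to audience. It does not send the request.
func newIdentityTokenRequest(audience string) (*http.Request, error) {
	q := url.Values{}
	q.Set("audience", audience)
	req, err := http.NewRequest(http.MethodGet, metadataIdentityURL+"?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	// The metadata server rejects requests that lack this header.
	req.Header.Set("Metadata-Flavor", "Google")
	return req, nil
}

func main() {
	// Placeholder audience; the real Token Server URL is not given in this article.
	req, err := newIdentityTokenRequest("https://token-server.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// Sending req from inside Google Cloud returns the signed JWT as the
	// response body, which the service would then forward to Token Server.
}
```

Binding the token to a specific audience means it cannot be replayed against a different service that also accepts Google-signed tokens.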
Applying Token Server to Workloads on Google Cloud
By extending Token Server, services on Google Cloud can now switch their GitHub access credentials to a short-lived token.
It is relatively easy to apply these new features to newly created services on Google Cloud. However, for many existing services that have already been using a PAT or GitHub App private key, implementing the process of requesting an Installation Access Token from Token Server and then using it can be difficult.
Moreover, GitHub Apps have a rate limit on API usage: 15,000 requests per hour per GitHub App on GitHub Enterprise Cloud. Exceeding this rate limit causes API requests to fail. Because Token Server can serve multiple Google Cloud workloads and multiple repositories, it is critical to reduce the total number of requests.
It is also important to note that the rate limit covers not only the number of token issuance requests to the Token Server but also all API traffic made using each issued Installation Access Token. Instead of requesting a new Installation Access Token for every single GitHub API call, the approach is to reuse the same token within its one-hour validity period, thus reducing the overall requests.
To avoid major rewrites in existing services and to automatically obtain and reuse an Installation Access Token within its validity period, we developed a library. Because Mercari mostly uses Go, we built this library on top of the google/go-github library, which is widely used in Go-based GitHub automation. If an existing service already uses go-github, the service can migrate to Token Server simply by configuring the Service Account and replacing the library.
Library Structure for Token Server
When you initialize the go-github library, you can specify any http.Client. An http.Client sends requests through its Transport, which can be any implementation of the http.RoundTripper interface, so a custom implementation can attach credentials to each request before it is sent. We use the RoundTrip method of our implementation to check whether the cached Installation Access Token is still valid. If it has expired, we request a new Installation Access Token from Token Server; otherwise, we reuse the existing one.
With this design, existing services only need to change a single line of code to migrate to Token Server (if they already use go-github).
GitHub App Load Balancing
As mentioned before, each GitHub App has a rate limit of 15,000 requests per hour. Token Server will potentially handle a large number of API requests from multiple Google Cloud workloads and multiple GitHub repositories. We also expect an increase in automated services over time, so we must be prepared for traffic that could exceed these limits.
To handle this, we considered creating multiple GitHub Apps and distributing requests among them to avoid hitting a single GitHub App’s rate limit. However, if a load balancer randomly distributes requests to multiple Token Server pods, each loaded with a different GitHub App, a single user might receive tokens from more than one GitHub App.
This becomes an issue for a service that writes commit statuses. In GitHub, you can record statuses (error, failure, pending, success) for a single commit, and these statuses are tracked per GitHub App. In a workflow where the first step posts a failure status and a later step posts a success status, both statuses must come from the same GitHub App for the later one to overwrite the earlier one. If the failure status comes from GitHub App 1, a subsequent success status from GitHub App 2 cannot overwrite it. The commit then carries mixed statuses, which can block merging if branch protection requires all statuses to pass.
To solve this, we assign the same GitHub App consistently for each target. One Token Server pod can load multiple GitHub Apps, then choose which GitHub App to use based on the repository and branch name (on GitHub) or the Service Account (on Google Cloud).
By mapping GitHub Apps according to repository, branch name, or Service Account, we ensure that the same GitHub App is always used for the same repository, branch, or Service Account.
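One way to get such a stable assignment is to hash the target's identity and use the hash to pick an App. The sketch below does that, but the article does not say how Token Server implements its mapping, so the hashing approach, the function `pickApp`, and the App IDs and keys shown are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickApp deterministically maps a key (repository plus branch, or a
// Service Account email) to one of the configured GitHub App IDs.
// It returns false when no Apps are configured.
func pickApp(key string, appIDs []int64) (int64, bool) {
	if len(appIDs) == 0 {
		return 0, false
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return appIDs[h.Sum32()%uint32(len(appIDs))], true
}

func main() {
	apps := []int64{1001, 1002, 1003} // hypothetical GitHub App IDs

	// Hypothetical keys: a repository/branch pair and a Service Account.
	for _, key := range []string{
		"example-org/example-repo@refs/heads/main",
		"automation@example-project.iam.gserviceaccount.com",
	} {
		id, _ := pickApp(key, apps)
		fmt.Println(key, "->", id)
	}
}
```

A hash spreads targets across Apps without per-target configuration, but adding or removing an App changes many assignments. An explicit mapping table avoids that at the cost of more configuration.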
Summary
By extending Token Server to Google Cloud, more services can use short-lived credentials for GitHub, reducing the need for long-lived credentials. We also developed a library that lets existing services migrate to Token Server with minimal changes. Through these efforts, we solved issues discovered during real-world operations, supporting more secure and efficient GitHub automation at Mercari.
The Mercari Security Team will continue working on replacing long-lived credentials with short-lived ones.
For information on careers in the Security Team, please see Mercari Careers.