Abstract
Retiring a legacy image resizing path sounds straightforward until you realize how many “invisible” callers exist: long-lived app versions, embedded clients, partner integrations, and bots. In our case, the Mercari Platform Network team migrated a legacy image transformation pipeline to Cloudflare Images while keeping existing URLs working.
This article focuses on the Cloudflare setup and safe rollout and explains the tradeoffs we made to reduce risk. It intentionally skips backend service details so the story stays centered on edge configuration, traffic estimation, and operational safety.
What the migration looked like
This migration had one non-negotiable constraint: existing image URLs had to keep working throughout the transition. That single requirement shaped almost every design decision, because it forced us to run old and new paths side by side and prove safety with production traffic.
The diagram below shows the simplified request flow. The important point is that the “easy” part is adding a new provider, while the hard part is understanding how many request patterns exist in the wild and how they interact with caching.

At first glance, this kind of migration can look like a straightforward origin swap. That can be true for a small system with one caller and short cache lifetimes, but it is rarely true for a long-lived public image pipeline.
In practice, we had to account for compatibility with existing request patterns, unexpected side effects, operational costs, zero-downtime rollout requirements, and monitoring that could catch regressions quickly. Those concerns became concrete challenges once we started mapping dependencies.
Migration challenges
A legacy image pipeline tends to sit at the boundary between many systems. Even if the official callers have moved on, old patterns can survive for years through caches, bookmarks, copy-pasted snippets, and client code that is hard to update.
That creates a specific failure mode: the traffic volume may look small, but the blast radius can still be large. When something breaks, it often breaks in places that are difficult to reproduce in staging.
In practice, this means migrations like this are less about “switching a backend” and more about dependency discovery. If we miss a dependency, we learn about it in production, and usually at the worst time.
Direct Amazon S3 access from Cloudflare
We expected that “just point Cloudflare to the bucket” would be the simplest approach, but a seemingly minor naming choice became a TLS constraint. Our goal was to keep an HTTPS (HTTP over TLS) path from the edge to the origin while preserving a legacy bucket name.
Amazon Simple Storage Service (Amazon S3) provides several ways to access a bucket and its contents. In many systems, any of these options can work, but the details matter once you require HTTPS end-to-end.
| Type | How to access | Restrictions |
|---|---|---|
| Path-style | https://s3-ap-northeast-1.amazonaws.com/your-bucket-name/your-bucket-contents | Not recommended by AWS. A deprecation plan was announced, but the style is still available. |
| Virtual host-style | https://your-bucket-name.s3-ap-northeast-1.amazonaws.com/your-bucket-contents | If the bucket name contains dots (e.g. xxx.xxx.com), HTTPS breaks because the wildcard certificate does not cover it. |
| Host header-style | https://s3-ap-northeast-1.amazonaws.com/your-bucket-contents -H "Host: your-bucket-name" | You must inject the Host header somewhere along the request path. |
Mercari’s S3 bucket for images has a long history, and it uses a legacy naming convention based on a domain name like xxx.mercari.com.
Cloudflare also provides Cloud Connector, but it did not work cleanly for this scenario. With a bucket name like xxx.mercari.com, the HTTPS handshake failed with error 526 (invalid SSL certificate).
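Before touching the edge configuration, it helped to confirm the host header-style behavior by hand. The sketch below is a minimal check, assuming the regional endpoint from the table and a hypothetical object key: it connects to the regional S3 endpoint, so the TLS certificate matches, while presenting the bucket name in the Host header.
import requests

# Minimal check of host header-style access for a legacy,
# dot-containing bucket name. The object key is hypothetical.
S3_ENDPOINT = "https://s3-ap-northeast-1.amazonaws.com"
BUCKET_HOST = "xxx.mercari.com"  # bucket named after a domain

# TLS is negotiated against the regional endpoint, so the certificate
# matches; S3 resolves the bucket from the injected Host header.
resp = requests.get(
    f"{S3_ENDPOINT}/some/object/key.jpg",
    headers={"Host": BUCKET_HOST},
    timeout=10,
)
print(resp.status_code, resp.headers.get("Content-Type"))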
Egress traffic, image quality, and cost
Even if every request succeeds, changes in compression and resizing behavior can change egress volume and affect image quality. If we shipped that kind of change blindly, we could create a cost regression or a customer experience regression without any obvious outage.
With Cloudflare Images, we expected changes in output size and compression behavior, which directly affects egress traffic. For example, if the average image size increases by 50%, egress typically increases by roughly 50% as well.
At the same time, if images look different or lose important details compared to the legacy image provider, the migration can negatively impact customer experience. That meant we had to measure both size and perceptual similarity before increasing rollout.
Zero-downtime rollout
Even with correct edge configuration, rollout mechanics can still break production if we ramp too quickly or if we misjudge cache behavior. We assumed that unknown legacy access patterns would exist, and we designed the rollout so failures would stay small, measurable, and reversible.
Several factors could block or slow down the release process, including S3 rate limits, cache rebuilding, and unknown legacy access patterns. As a result, we treated rollout design as a first-class engineering problem rather than a final deployment step.
How we resolved it: S3 access from Cloudflare
This section explains the concrete edge configuration that allowed Cloudflare to fetch from S3 over HTTPS even with our legacy bucket name. The key idea was to use an origin override so we could keep the request URL stable while controlling the origin host and headers.
Because virtual host-style access did not support HTTPS for our bucket name and AWS discourages path-style access, we chose the host header-style approach for this migration. That allowed us to connect to the regional S3 endpoint while presenting the legacy bucket name in the Host header.
Using our internal Terraform module cdn-kit, we implemented this by routing S3 origin access through an origin override. We also separated the “real” public endpoint from a placeholder endpoint used only for origin modifications, so we could keep the rules explicit and auditable.
module "cdn_kit" {
# ...
endpoints= {
"@" = {
backend = {
host = "legacy-image-provider-endpoint"
}
}
"s3" = { # placeholder endpoint for origin modification
backend = {
host = "s3-ap-northeast-1.amazonaws.com"
}
}
}
request = {
origin_modifications = [
{
host = "xxx.mercari.com" # bucket name
expression = <<EOC
(not starts_with(http.request.uri.path, "/prefix/xx/"))
EOC
origin = "s3.${var.domain}"
}
]
}
# ...
}
This approach kept the migration reversible. If key signals regressed, for example edge error rate, origin error rate, or origin response time, we could disable the origin modification rule and fall back to the legacy provider without changing client behavior.
How we resolved it: image quality, egress, and cost
In this section we explain how we turned the “it might be expensive” fear into measurable signals and concrete guardrails. Instead of guessing, we validated behavior under controlled traffic and used the results to set rollout pacing.
Two areas could be impacted by the migration. First, we needed to confirm whether Cloudflare Images behaved like the legacy provider in terms of resizing and compression.
Second, we needed a way to estimate cost. Cloudflare Images uses a different billing model, so we had to validate how to measure and forecast usage with enough confidence to proceed.
Image quality
Availability metrics cannot detect a silent quality regression, so we validated outputs directly. The goal was to ensure that “successful” responses still delivered images that looked the same to customers.
Cloudflare Images uses a slightly different compression algorithm than the legacy provider. We randomly sampled thousands of image IDs from access logs and compared outputs across parameters like quality and width.
We found that Cloudflare Images often produced larger files, especially for WebP. This could increase egress traffic and cost by up to ~50% in some cases, even though JPEG outputs were sometimes smaller than the legacy provider's.
Beyond file size, we compared similarity and pixel-level differences between the legacy outputs and Cloudflare outputs. Pixels differed slightly after resizing, but similarity stayed almost unchanged. Based on this, we chose a lower quality setting than Cloudflare’s default to reduce file size while keeping high visual similarity.
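As an illustration, a comparison along these lines can be scripted with SSIM. The sketch below uses placeholder URL templates, not our real prefixes, and the exact parameters we compared differ.
import io
import numpy as np
import requests
from PIL import Image
from skimage.metrics import structural_similarity

# Placeholder URL templates; the real prefixes and parameters differ.
LEGACY = "https://images.example.com/old-prefix/q85,w720/{image_id}.jpg"
CF_IMAGES = "https://images.example.com/cdn-cgi/image/quality=85,width=720/{image_id}.jpg"

def fetch(template: str, image_id: str) -> bytes:
    resp = requests.get(template.format(image_id=image_id), timeout=10)
    resp.raise_for_status()
    return resp.content

def compare(image_id: str):
    old_bytes = fetch(LEGACY, image_id)
    new_bytes = fetch(CF_IMAGES, image_id)
    old = np.asarray(Image.open(io.BytesIO(old_bytes)).convert("RGB"))
    new = np.asarray(Image.open(io.BytesIO(new_bytes)).convert("RGB"))
    if old.shape != new.shape:
        raise ValueError("outputs have different dimensions")
    # SSIM near 1.0 means the outputs look the same to customers even
    # when individual pixels differ slightly after resizing.
    score = structural_similarity(old, new, channel_axis=-1)
    return len(old_bytes), len(new_bytes), score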
Egress and cost
Cloudflare Images pricing is based on unique transformations, which means the long tail can matter more than raw request volume. We needed a method that matched Cloudflare's 30-day counting model; otherwise, our estimates would drift.
Because Cloudflare Images uses a 30-day window to count unique transformations, it is hard to estimate monthly usage from per-day or per-hour samples. The safest approach is to run a 30-day query and use the result as the baseline.
SELECT
  APPROX_COUNT_DISTINCT(ClientRequestURI) AS unique_transformation
FROM
  `...access_logs`
WHERE
  EdgeStartTimestamp BETWEEN TIMESTAMP("YEAR-MONTH-01") AND TIMESTAMP("YEAR-NEXT_MONTH-01")
  AND ClientRequestSource = 'eyeball'
  AND EdgeResponseStatus = 200
  AND REGEXP_CONTAINS(ClientRequestURI, r'^(/prefix-01|/prefix-02|...)')
Cloudflare recently added analytics to the dashboard and changed the rolling 30-day window to a calendar-month window, which makes ongoing monitoring much easier.
The first day’s number is usually high because the system starts counting unique transformations from a cold state. It typically drops over subsequent days because many accesses have already been counted within the current month window.
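Once you have the unique-transformation count, turning it into a monthly forecast is simple arithmetic. The sketch below assumes a per-1,000-transformations unit price; confirm the figure against the current Cloudflare Images pricing before relying on the forecast.
# Unit price is an assumption for illustration; confirm against the
# current Cloudflare Images pricing page before forecasting.
PRICE_PER_1000_TRANSFORMATIONS = 0.50  # USD, assumed

def monthly_cost(unique_transformations: int) -> float:
    return unique_transformations / 1000 * PRICE_PER_1000_TRANSFORMATIONS

# e.g. feed in the APPROX_COUNT_DISTINCT result from the query above
print(f"${monthly_cost(12_000_000):,.2f}")  # -> $6,000.00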
How we resolved it: zero-downtime rollout details
In this section we describe implementation details that made the rollout operationally safe.
How do we roll out from 0% to 100%?
The key idea was to add an abstract path layer that could route to both the legacy provider and Cloudflare Images while keeping client-facing URLs stable.
That abstraction also made the rollout easier to reason about. By standardizing request patterns early, we could focus measurement on the most important traffic and track what remained for migration.
To build the abstract URL path layer, we used Cloudflare URL rewrite rules to map an abstract prefix to provider-specific paths. This let us switch a controlled slice of requests without requiring clients to adopt a new URL format.
rewrites = [
  # before migration: route to the legacy image provider
  { prefix = "/abstract-01/", target = "/old-prefix/settings/" },
  # after migration: route to Cloudflare Images
  { prefix = "/abstract-01/", target = "/cdn-cgi/image/settings/" },
]
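To make the mapping concrete, here is a small sketch of what these rules do to a client-facing URL. The prefixes are the placeholders from the snippet above, not our production values.
# Illustrates the effect of the rewrite rules on a client-facing URL.
def rewrite(path: str, migrated: bool) -> str:
    abstract = "/abstract-01/"
    target = "/cdn-cgi/image/settings/" if migrated else "/old-prefix/settings/"
    if path.startswith(abstract):
        return target + path[len(abstract):]
    return path

print(rewrite("/abstract-01/imageid_720.jpg", migrated=False))
# -> /old-prefix/settings/imageid_720.jpg
print(rewrite("/abstract-01/imageid_720.jpg", migrated=True))
# -> /cdn-cgi/image/settings/imageid_720.jpg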
To avoid a random split, we ramped traffic by matching the trailing digits of image IDs with regex rules. This made the rollout deterministic, which improved debuggability and reduced the risk of an inconsistent customer experience.
path_regex="/\\d+(000[0-9]{1})_\\d+\\.jpg" # 0.1%
path_regex="/\\d+(0[0-9]{3})_\\d+\\.jpg" # 10%
path_regex="/\\d+([0-1][0-9]{3})_\\d+\\.jpg" # 20%
# ...
path_regex="/\\d+([0-7][0-9]{3})_\\d+\\.jpg" # 80%
path_regex="/\\d+([0-8][0-9]{3})_\\d+\\.jpg" # 90%
Cache rebuilding
Cache rebuilding was the main operational risk during rollout. If we rebuilt too quickly, we could overload S3 and create widespread errors that would look like a CDN outage even though the edge configuration was correct.
We used a three-phase rollout. First, we ran a canary at less than 0.1% of traffic to measure cache rebuild behavior and the impact of S3 rate limits, with a focus on non-200 responses.
Next, we gradually increased to 1%, 5%, and 10% while confirming rebuild time, S3 request patterns, and error rates. Once those signals stayed stable, we moved to full rollout and increased by 10% per release until reaching 100%.
Cache purging
Cloudflare Images also uses a different cache purge mechanism. You cannot always purge by the transformed URL. Instead, you purge by prefix using the origin path.
For example, if a resized image is served via /cdn-cgi/image/quality=85/somewhere/imageid, you purge using a prefix that targets /somewhere/imageid on the origin. That means the cache purging system must implement the same mapping.
During our migration, we updated the cache purging system first and only then increased rollout percentage. Since the percentage-based rollout triggers cache rebuilding, it does not rely on per-URL purges during the ramp-up.
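As an illustration of the mapping the purge system needs, the sketch below extracts the origin path from a transformed URL. It assumes the /cdn-cgi/image/<options>/<origin-path> convention shown above; our actual purge tooling differs.
from urllib.parse import urlsplit

def origin_path_for_purge(transformed_url: str) -> str:
    """Map a /cdn-cgi/image/<options>/<origin-path> URL back to the
    origin path that a prefix-based purge targets."""
    path = urlsplit(transformed_url).path
    prefix = "/cdn-cgi/image/"
    if path.startswith(prefix):
        # Drop the options segment (e.g. "quality=85") after the prefix.
        _, _, origin_path = path[len(prefix):].partition("/")
        return "/" + origin_path
    return path  # legacy URLs already use the origin path

assert origin_path_for_purge(
    "https://example.com/cdn-cgi/image/quality=85/somewhere/imageid"
) == "/somewhere/imageid"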
Conclusion
We migrated from a legacy image provider to Cloudflare Images by treating dependency discovery as the main work and using traffic estimation to decide when it was safe to take the next step. We avoided the common trap of switching traffic first and learning about breakage later.
If you take one idea from this story, make it this: do not start by switching traffic. Start by learning what you would break, and design your rollout so failures are small, measurable, and reversible.


