* This article is a translation of the Japanese article written on December 20, 2020.
Day 20 of Merpay Advent Calendar 2020 is brought to you by nozaq, VP of Engineering on the Merpay Product Engineering Team.
This year, 2020, was a pretty busy year for us. In addition to my regular work on the Merpay Engineering Team, I also worked on planning and implementing a system shutdown that began at the start of the year: terminating Origami Pay, a QR code payment service. I haven’t seen many detailed write-ups of how a service actually gets terminated, so in this article I’d like to describe what goes into shutting down a service with a significant external impact (in this case, a payment service).
What we did
A payment service involves many stakeholders, including the general customers who use it to make payments, the merchants who accept those payments, and the financial institutions and partners linked to the system. Because this kind of service provides a payment method at stores, terminating it suddenly could have a significant direct and indirect impact on the businesses of the stores that accept it. That means we had to proceed very carefully when closing it.
When planning the system shutdown, we had a few requirements to meet.
- Notify people well in advance and terminate the service only after providing sufficient time to prepare, in order to minimize the impact on customers, merchants, and other partners
- Minimize system risks, such as failures and unauthorized use
- Minimize running system costs up until the service is terminated
We estimated from the beginning that this project would take at least six months, so we settled on a plan of breaking it down into multiple phases and shutting down features in stages.
There’s no going back once you’ve made a public announcement or started shutting down features, so we were very careful during advance planning. I’ll discuss each step in more detail below. (I list these as sequential steps for simplicity; in reality, we finalized the plan only after work had already begun, and we repeated some tasks as new information came to light.)
- Identify features/tasks with an external impact
- Identify legal/compliance requirements
- Create a service termination plan
- Create a system shutdown plan
- Shut down the system in accordance with the plan, and confirm impact
1. Identify features/tasks with an external impact
When shutting a service down in stages, you first break the project down into individual features and tasks. It’s important to consider external impact, possible system risks, and running costs.
When we started planning the termination of Origami Pay, we listed the following features and tasks.
- Payment via barcode
- Payment via QR code
- Payment via credit card
- Payment via linked bank account
- Payment via digital wallet
- Refunds and amount adjustments from merchants
- Payment history confirmation by merchants
- Transfer of proceeds to merchants
- Handling of inquiries from general customers
- Handling of inquiries from merchants
To decide how finely to break down features and tasks, we used one key criterion: whether the shutdown timing differs. For example, Origami Pay accepted payment both by credit card and by linked bank account. Because the risk of unauthorized use or fraud differs between the two methods, we scheduled the shutdown of each separately instead of lumping them together as "payment at merchants".
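The granularity rule can be sketched as a small grouping step: features that share a stop timing collapse into one phase, and any difference in timing keeps them apart. This is a minimal illustration, not Origami’s actual tooling; the dictionary uses only the milestone months from Table 1 (no day is assumed), and the name `group_into_phases` is hypothetical.

```python
from collections import defaultdict

# Planned stop month (year, month) per feature, taken from Table 1.
# Only months are known, so no specific day is assumed.
PLANNED_STOP = {
    "Transfer of funds between digital wallets": (2020, 2),
    "Payment via digital wallet": (2020, 3),
    "Payment via credit card": (2020, 4),
    "Payment via barcode": (2020, 4),
    "Services for general users": (2020, 6),
    "Services for merchants": (2020, 9),
}


def group_into_phases(planned_stop):
    """Group features sharing a stop month into one phase, ordered by month.

    Features whose timing differs always end up in different phases,
    which is the rule for deciding how finely to split the list.
    """
    phases = defaultdict(list)
    for feature, month in planned_stop.items():
        phases[month].append(feature)
    return [(month, sorted(features)) for month, features in sorted(phases.items())]


if __name__ == "__main__":
    for (year, month), features in group_into_phases(PLANNED_STOP):
        print(f"{year}-{month:02d}: {', '.join(features)}")
```

Credit card and barcode payments share April 2020, so they form one phase, while the digital wallet features stay in separate earlier phases.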
2. Identify legal/compliance requirements
Depending on the service, there may be legal and compliance requirements to meet in order to terminate the service.
Origami had registered in accordance with industry laws as a fund transfer business, third-party prepayment-type payment method provider, digital payment proxy, and business contracted to handle credit card numbers. It was also compliant with PCI-DSS.
In closing down each service, we worked closely with the Legal/Compliance Team to maintain our risk management setup and protect consumers. We went through every feature and task identified in Step 1 so that none were overlooked, investigated what needed to be considered when shutting down each one, and identified the requirements.
For example, when closing down a fund transfer business, you are required to have a communication plan (email, website notices, official gazette announcements, etc.) to ensure that any remaining funds are returned to customers.
3. Create a service termination plan
The next step is to determine a schedule for each of the features and tasks identified in Step 1. This would include important dates such as shutdown dates and advance notice dates.
In addition to satisfying legal/compliance requirements, we worked with the Business/Operations Development Team (to coordinate with external stakeholders) and with the Operations Team and Corporate Team (to coordinate internally), ultimately creating the following nine-month schedule.
Figure 1. Service termination plan
It would be unrealistic to cover the whole plan in detail here, so I’ve listed below only the milestones with a significant impact on the system shutdown.
Table 1. Important system shutdown milestones
|Date|Milestone|
|---|---|
|Feb. 2020|Stop transfer of funds between customers using digital wallets|
|Mar. 2020|Stop payment via digital wallet|
|Apr. 2020|Stop payment via credit card, stop payment via barcode|
|Jun. 2020|Stop services for general users, stop allowing users to download the app|
|Sep. 2020|Stop services for merchants, stop allowing merchants to download the app|
The stop date differs for each payment method. We wanted to reduce system risks and the risk of unauthorized use as early as possible, while also weighing whether each feature could realistically be closed quickly without excessive external impact; balancing the two is why we stopped payment methods in stages. We also set a 90-day buffer between stopping new payments and terminating the service for merchants, to allow for refunds and amount adjustments that could still occur after general users had stopped using the service.
At this stage, we also identified actions to take besides shutting down features and notifying users. For example, because phishing scams could increase once the termination was publicly announced, we planned to strengthen the automatic detection and monitoring of unauthorized logins and transactions.
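The 90-day buffer reduces to simple date arithmetic. A minimal sketch, assuming a hypothetical last day for new payments (only months are given here); `earliest_merchant_shutdown` and `respects_buffer` are illustrative names, not part of any real system.

```python
from datetime import date, timedelta

# Buffer between stopping new payments and ending merchant services,
# leaving room for refunds and amount adjustments.
REFUND_BUFFER = timedelta(days=90)


def earliest_merchant_shutdown(last_payment_day: date) -> date:
    """Earliest date merchant services may end, given the last day of new payments."""
    return last_payment_day + REFUND_BUFFER


def respects_buffer(last_payment_day: date, merchant_shutdown: date) -> bool:
    """True if the planned merchant shutdown leaves at least the full buffer."""
    return merchant_shutdown >= earliest_merchant_shutdown(last_payment_day)


if __name__ == "__main__":
    last_payment = date(2020, 6, 30)  # hypothetical date, for illustration only
    print(earliest_merchant_shutdown(last_payment))  # 2020-09-28
```

With a hypothetical last payment day of June 30, 2020, the earliest merchant shutdown lands on September 28, 2020, which is consistent with the June and September milestones in Table 1.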
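How the detection was implemented isn’t described here, so the following is only a sketch of one common rule such monitoring might include: flagging an account when too many failed logins occur within a short window. The class name, threshold, and window length are all hypothetical, and the code assumes each account’s events arrive in timestamp order.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flags an account once it has too many failed logins within a sliding window.

    Assumes events for each account are recorded in non-decreasing timestamp order.
    """

    def __init__(self, max_failures: int = 5, window_seconds: float = 300.0):
        self.max_failures = max_failures        # hypothetical threshold
        self.window_seconds = window_seconds    # hypothetical window length
        self._failures = defaultdict(deque)     # account_id -> recent failure timestamps

    def record_failure(self, account_id: str, timestamp: float) -> bool:
        """Record a failed login; return True if the account should be flagged."""
        recent = self._failures[account_id]
        recent.append(timestamp)
        # Drop failures older than the window, measured from this event.
        while timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures
```

A real setup would combine several signals (device, IP, transaction patterns), but a per-account sliding window is a cheap first line of defense.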
4. Create a system shutdown plan
The next step was to identify which sub-systems could be shut down during each phase of the service termination plan and to plan a schedule describing which systems would be shut down when.
Origami Pay consisted of a backend system running on AWS and iOS and Android mobile apps for customers and merchants. Features implemented on the backend could be shut down in real time simply by updating the server side, but the mobile apps posed a harder challenge: even after we uploaded new versions to the app stores, it would take time for many customers to update. So before we began shutting down features, we distributed a new version of the app that could switch its screens and behavior for each shutdown phase based on flags served by the backend. This let us change what the app displayed as each feature was actually shut down.
Besides shutting down our own systems on AWS, we also planned when to stop using (or cancel contracts for) the external services connected to the system and the cloud services used to operate it.
Along with identifying the related services that became unnecessary as features were shut down in each phase, we needed to make sure we hadn’t missed any external services that had to be discontinued. Conveniently, our schedule coincided with our annual inventory of external services, so we used our external service ledger to plan the termination of each service (the following figure is a sample ledger created for Origami, shown for reference).
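The backend-driven switching can be sketched as follows: the backend serves a phase flag, and the app, shipped ahead of time with logic for every phase, decides which features to show. This is written in Python for illustration (the real logic would live in the iOS and Android apps); the `Phase` values, feature keys, and the `{"phase": n}` payload shape are assumptions loosely mirroring Table 1.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical shutdown phases, loosely following Table 1."""
    NORMAL = 0
    WALLET_TRANSFER_STOPPED = 1
    WALLET_PAYMENT_STOPPED = 2
    CARD_AND_BARCODE_STOPPED = 3
    USER_SERVICE_STOPPED = 4


# First phase in which each (hypothetical) feature key is hidden in the app.
HIDDEN_FROM = {
    "wallet_transfer": Phase.WALLET_TRANSFER_STOPPED,
    "wallet_payment": Phase.WALLET_PAYMENT_STOPPED,
    "card_payment": Phase.CARD_AND_BARCODE_STOPPED,
    "barcode_payment": Phase.CARD_AND_BARCODE_STOPPED,
}


def visible_features(flags: dict) -> list:
    """Decide which features to show from a backend flag payload such as {"phase": 2}."""
    phase = Phase(flags.get("phase", Phase.NORMAL))
    if phase >= Phase.USER_SERVICE_STOPPED:
        return []  # the app shows only a "service has ended" screen
    return sorted(name for name, hidden_from in HIDDEN_FROM.items() if phase < hidden_from)
```

Because the app only compares an ordered phase value, moving to the next phase needs a backend change, not a new app release, which is the point of shipping the logic in advance.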
Figure 2. Sample external service ledger
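A ledger-driven plan can be derived mechanically: a service becomes cancellable in the first phase by which every feature depending on it has stopped, and anything left unassigned signals something that may have been missed. A minimal sketch; the `LedgerEntry` fields, service names, and phase numbers are hypothetical and don’t reflect the actual ledger format in Figure 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    service: str                 # hypothetical external service name
    used_by: Tuple[str, ...]     # feature keys that depend on this service


def plan_cancellations(
    ledger: List[LedgerEntry], stopped_in_phase: Dict[int, Set[str]]
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Assign each service to the first phase after which none of its users remain.

    Returns (plan, unassigned): plan maps phase -> services that can be cancelled,
    and unassigned lists services still needed at the end, which deserve a second look.
    """
    stopped: Set[str] = set()
    remaining = list(ledger)
    plan: Dict[int, List[str]] = {}
    for phase in sorted(stopped_in_phase):
        stopped |= stopped_in_phase[phase]
        ready = [entry for entry in remaining if set(entry.used_by) <= stopped]
        if ready:
            plan[phase] = sorted(entry.service for entry in ready)
            remaining = [entry for entry in remaining if entry not in ready]
    return plan, sorted(entry.service for entry in remaining)
```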
Once we decided on our plan for shutting down internal systems and terminating external services, we used this to approximate system costs up to Origami Pay’s termination.
It is very difficult to precisely predict how structural changes to an AWS environment will affect costs. However, AWS Billing and Cost Management lets you analyze past billing in detail, broken down by service, size, region, and so on. We used it to estimate the costs remaining after the systems were shut down in each phase.
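As a rough model of this kind of estimate, take past monthly billing per service and scale each line by the share expected to remain after a phase. A sketch only; the figures in the example are made up, and treating services without an explicit estimate as unchanged is an assumption that errs on the conservative side for hard-to-predict items.

```python
from typing import Dict


def estimate_monthly_cost(
    baseline: Dict[str, float], retained_share: Dict[str, float]
) -> float:
    """Estimate monthly cost after a phase.

    baseline: past monthly cost per AWS service, e.g. read off a billing breakdown.
    retained_share: fraction of each service's cost expected to remain (0.0 to 1.0).
    Services without an entry are assumed unchanged, a conservative default for
    costs that are hard to predict, such as networking.
    """
    return sum(cost * retained_share.get(service, 1.0) for service, cost in baseline.items())


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    baseline = {"EC2": 1000.0, "RDS": 800.0, "VPC": 200.0}
    print(estimate_monthly_cost(baseline, {"EC2": 0.25, "RDS": 0.5}))  # 850.0
```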
5. Shut down the system in accordance with the plan, and confirm impact
The next step was to shut down features according to the schedule we created in Step 4. Some features were to be shut down well before the service as a whole was terminated, so to prepare for any unexpected impact, we split the work into small tasks and prepared rollback procedures just in case. We pushed forward with multiple people checking the work, and as a result, many people volunteered to get involved in the project. It was thanks to this solid setup that we were able to accomplish so much.
After each shutdown and infrastructure change, we monitored AWS costs for several days to confirm that daily costs had dropped to the levels estimated in our system shutdown plan. If they hadn’t, we investigated why and worked to resolve the issue.
As shown below, Cost Explorer provides a daily cost breakdown by service, which makes it easier to analyze causes. Compute and database costs (such as ECS, EC2, and RDS) are easy to estimate, but networking and governance costs can be harder to predict, so we also had to make additional adjustments during each phase to meet our cost targets.
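The split-and-rollback approach can be sketched as a list of steps, each paired with a check and a revert action; if any check fails, everything applied so far is reverted in reverse order. This is a hypothetical illustration: `Step` and `run_shutdown` are not from the original project, and in practice `verify` might be a manual confirmation by a second person rather than code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    apply: Callable[[], None]    # the change itself, e.g. flipping a flag
    verify: Callable[[], bool]   # check, possibly a second person's sign-off
    revert: Callable[[], None]   # prepared rollback for this change


def run_shutdown(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Apply steps in order; on a failed check, revert all applied steps newest-first.

    Returns (succeeded, names of reverted steps).
    """
    applied: List[Step] = []
    for step in steps:
        step.apply()
        applied.append(step)
        if not step.verify():
            reverted = []
            for done in reversed(applied):
                done.revert()
                reverted.append(done.name)
            return False, reverted
    return True, []
```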
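The post-shutdown check amounts to comparing several consecutive days of actual cost against the estimate. A minimal sketch; the 5% tolerance and three-day window are hypothetical defaults, not values from the project.

```python
from typing import Sequence


def cost_on_target(
    daily_costs: Sequence[float], estimated_daily: float, tolerance: float = 0.05, days: int = 3
) -> bool:
    """True if each of the last `days` daily costs is within `tolerance` above the estimate."""
    recent = list(daily_costs)[-days:]
    if len(recent) < days:
        return False  # not enough data yet to judge
    limit = estimated_daily * (1 + tolerance)
    return all(cost <= limit for cost in recent)
```

Requiring several consecutive days filters out one-off charges, so a "not on target" result points to a persistent cost worth investigating.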
Figure 3. Daily cost breakdown using Cost Explorer
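The same daily, per-service breakdown can also be pulled programmatically through the Cost Explorer API, for example with boto3’s `get_cost_and_usage`. A sketch assuming boto3 is installed and AWS credentials with Cost Explorer access are configured; the helper names are hypothetical, and the parser is kept separate so it can be tested without calling AWS. Note that the `End` date is exclusive.

```python
from collections import defaultdict
from typing import Dict


def parse_cost_response(response: dict) -> Dict[str, Dict[str, float]]:
    """Convert get_cost_and_usage results into {day: {service: cost}}."""
    by_day: Dict[str, Dict[str, float]] = defaultdict(dict)
    for period in response["ResultsByTime"]:
        day = period["TimePeriod"]["Start"]
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            by_day[day][service] = float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(by_day)


def daily_cost_by_service(start: str, end: str) -> Dict[str, Dict[str, float]]:
    """Fetch daily unblended cost per service for [start, end) via Cost Explorer."""
    import boto3  # imported here so the parser can be used without AWS access

    ce = boto3.client("ce")
    params = {
        "TimePeriod": {"Start": start, "End": end},  # dates as "YYYY-MM-DD"
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }
    results = []
    while True:  # follow pagination until all periods are collected
        response = ce.get_cost_and_usage(**params)
        results.extend(response["ResultsByTime"])
        token = response.get("NextPageToken")
        if not token:
            break
        params["NextPageToken"] = token
    return parse_cost_response({"ResultsByTime": results})
```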
The figure below shows how costs changed for the AWS account that actually ran the service. We suspended all features by September and made the corresponding infrastructure changes in October, and you can see that costs dropped significantly in November. We kept costs at our planned levels throughout the process. Following our cost plan not only kept financial costs down but also reduced the number of resources to manage, which in turn reduced system risks and the operational load on the people involved.
Figure 4. Final changes in AWS costs
6. (Bonus) Check Twitter for closure
There’s one more thing I’d like to mention, though it has little to do with the topic of this article. For the payment feature shutdowns, our advance notice included the exact date and time of each shutdown. When I searched Twitter around those dates and times, I found users posting their thoughts about the situation. Checking Twitter and other social media while shutting down the system made the experience feel not so bad after all. I even saw people tweeting that they had made one final payment just before the service ended, as a sort of commemoration.
This was the first time I had ever been involved in terminating a service so carefully and over such a long period. I learned a lot during the planning and implementation stages. That said, terminating a running service is not exactly a fun job, and I’m very grateful to everyone involved in getting this work done.
Terminating a service is not something you encounter very often, but I hope this article will serve as a reference for anyone who ends up working on such a project.