2022/03/17

Reconciliation in microservices

Author: foghost

* This article is a translation of the Japanese article written on December 23, 2021.

This article is for day 23 of Merpay Advent Calendar 2021. It was written by @foghost, of the Merpay Payment Platform Team.

This article explains the risks in payment system development that reconciliation helps to avoid. It then introduces some of the reconciliation practices used at Merpay.

What is reconciliation?

Searching for the term “reconciliation” brings us to a Wikipedia article that describes it as a risk management technique. Humans will always make mistakes. Reconciliation can be used to confirm whether we are getting the expected results of whatever we do, which can help to detect inconsistencies and avoid risk.

It might seem like a vague concept, but in reality, there are many situations that require reconciliation in our daily lives. For example, imagine someone is doing some shopping.

The customer brings two 1,000 yen cups to the register.
The clerk calculates the total and asks for 2,000 yen.
The customer pays 2,000 yen.
The clerk hands over the two cups and a receipt.

Reconciliation occurs several times even in this simple example.

When the clerk asks the customer for an amount of money, the customer can reconcile whether the amount is what they expected.
When the customer pays 2,000 yen, the clerk can reconcile whether the amount matches what they requested.
The customer can reconcile whether the cups they received are the cups they chose, and whether the quantity is correct.
The customer might need to return the products, and they can reconcile whether the products and amount on the receipt match what they purchased.

What would happen without reconciliation?

There is the risk of the clerk miscalculating the total and asking the customer for the wrong amount of money.
There is the risk of the customer paying only 1,000 yen and the clerk mistakenly recording payment of 2,000 yen.
There is the risk of the clerk handing over the wrong products to the customer.
If the clerk mistakenly records payment of 1,000 yen on the receipt, there is the risk of trouble occurring if the customer returns the products at a later date.

There is therefore a risk of monetary loss or trouble occurring, for both the clerk and the customer. Why might this happen? Because humans will always make mistakes. Automation using machines or systems is one valid way to reduce the risk of human error in many cases. However, machines and systems are made and operated by humans, so they can also produce mistakes. Reconciliation (determining whether final results are as expected) is therefore a very effective way of deliberately guarding against the risk of human error.

Reconciliation in payment systems

Unlike the example presented earlier, most payment transactions can be processed automatically in a backend system without human interaction. You might think that there would be hardly any need for reconciliation, as long as the system is of sufficient quality. However, the opposite is true. Payment systems are extremely susceptible to risk, and reconciliation plays an important role as a final check to avoid these risks.

Payment system risks, and using reconciliation to avoid them

The following are just some of the risks that must be considered even for payment systems.

System risk
- There is the risk of confirming inaccurate processing results due to an implementation bug or unexpected system incident.
Trust risk
- There is the risk of inaccurately calculating payment for a customer or merchant partner, which would significantly impact users’ trust in the company.
Financial risk
- If figures on financial statements do not match those confirmed on the system, it could cause mistakes in financial decisions, which could significantly impact company management.
Legal risk
- Like other industries, the payment field is subject to the legal requirements of industry laws. For example, when a funds transfer specialist has received funds from a customer (outstanding debt management), the funds transfer specialist is required to guarantee a performance bond worth at least 100% of the funds received from the customer. It constitutes a legal violation if the records required to manage this outstanding debt are not accurate.

In order to avoid these risks, Merpay does a lot of work toward ensuring quality during system development. We also use various methods to reconcile final results in order to more strictly manage risk.

There are two major categories of reconciliation for payment systems.

Reconciliation between internal books and actual financial flows
- With this method, balances and amounts recognized on both accounting and statutory books that are processed within the system are checked against actual financial flows (such as through a bank) to detect inconsistencies.
Reconciliation between system process flows and the final changes on internal books (including external processes)
- With this method, reconciliation is performed to determine whether each payment process on the system was performed properly, or whether the results of processes are correctly recorded in internal books.

Reconciliation between internal books and actual financial flows requires knowledge of the actual financial flow. For this reason, various corporate teams (finance, legal, accounting, etc.) reconcile internal record data, and then build processes to reconcile the internal books with actual balance changes at the bank.

In contrast, during system development, we must ensure reconciliation between system process flows and internal books. This means that any system process that affects the results recorded on internal books must be reconciled across all participant services. I’ll mainly focus on this type of reconciliation from the development perspective later in this article.

Finally, in order to confirm that reconciliation is being performed properly, an internal audit team will periodically check system process logs, records on internal books, and financial flows as a part of an internal control and reporting process. In other words, they monitor system processes and reconciliation processes to ensure they are functioning properly.

Reconciliation between system process flows and internal books

Merpay has adopted a microservices architecture and uses two main types of reconciliation to verify that each system payment process is accurately recorded on internal books.

Process flow-based reconciliation of results between all services participating in a process.
Comparison of final results between internal books as a double check. For example, we confirm that deposits and balances on the accounting book match changes on the balance book.
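The second type, comparing final results between books, can be illustrated with a minimal sketch. It assumes each book can be exported as a list of (account ID, signed amount) pairs. The function names, the account IDs, and the amounts are all hypothetical and are not taken from Merpay's actual systems.

```python
from collections import defaultdict
from decimal import Decimal


def sum_by_account(entries):
    """Total the signed amounts per account ID."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing keys start at zero
    for account_id, amount in entries:
        totals[account_id] += amount
    return dict(totals)


def compare_books(accounting_entries, balance_changes):
    """Return {account_id: (accounting_total, balance_total)} for every mismatch."""
    accounting = sum_by_account(accounting_entries)
    balance = sum_by_account(balance_changes)
    mismatches = {}
    # Check every account that appears in either book.
    for account_id in sorted(accounting.keys() | balance.keys()):
        accounting_total = accounting.get(account_id, Decimal(0))
        balance_total = balance.get(account_id, Decimal(0))
        if accounting_total != balance_total:
            mismatches[account_id] = (accounting_total, balance_total)
    return mismatches


# Hypothetical sample data: user-2 differs between the two books.
accounting_book = [("user-1", Decimal("2000")), ("user-2", Decimal("500"))]
balance_book = [
    ("user-1", Decimal("1000")),
    ("user-1", Decimal("1000")),
    ("user-2", Decimal("300")),
]
mismatches = compare_books(accounting_book, balance_book)
```

Here only "user-2" is reported, because its totals differ between the two books. Decimal is used instead of float so that amounts are compared exactly.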

Next is process flow-based reconciliation.

Each microservice is aware of its own dependencies and expected results (balance amount and process status). During reconciliation, we therefore perform a final verification between local results and dependent service results for each microservice, either as a batch or using the result notifications from other services.
If a dependent service is an external service, you can often obtain finalized sales reports from the service vendors. You can therefore also reconcile the details in sales reports with internal process results to perform the final verification.

Handling approaches when an inconsistency is detected during reconciliation

When performing reconciliation, we also need to consider how to handle any inconsistencies that might occur. Reconciliation inconsistencies with dependent services can be categorized into three major types.

Successful process data is available locally, but not in the dependent service.
- If this is due to a bug, it can be handled by canceling the local process or retrying the calls to dependent services.
- This can also happen when reconciling processes that span a day boundary, because of time lags between services. To resolve this, we can reconcile by comparing results against a single process time (either the local one or the dependent service's), or reconcile again on the following day if we can’t unify the process time used for reconciliation.
Successful process data is not available locally, but is available in the dependent service.
- This occurs when network errors such as a network timeout are not accounted for in error handling, so the process data already handled in the dependent service is never canceled. This can be resolved by canceling the process in the dependent service.
- This can also occur when reconciling processes that span a day boundary, as with the first type.
Successful process data is available both locally and on the dependent service, but the amount or other details do not match
- This is often handled on a case-by-case basis. From the customer's perspective, if the local process data is correct, we can correct the process data in the dependent service by canceling the original request and sending a correct request instead.
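The three types above can be expressed as a small classification routine. This is only a sketch under stated assumptions: the Record fields, the "success" status value, and the idea that both sides share a request ID are hypothetical, and real reconciliation logic would depend on each service's domain.

```python
from dataclasses import dataclass
from enum import Enum


class Inconsistency(Enum):
    MISSING_IN_DEPENDENT = "successful locally, missing in dependent service"
    MISSING_LOCALLY = "successful in dependent service, missing locally"
    DETAIL_MISMATCH = "present on both sides, details do not match"


@dataclass(frozen=True)
class Record:
    request_id: str  # hypothetical key shared by both services
    amount: int
    status: str


def reconcile(local, dependent):
    """Compare successful records by request ID; return {request_id: Inconsistency}."""
    local_by_id = {r.request_id: r for r in local if r.status == "success"}
    dependent_by_id = {r.request_id: r for r in dependent if r.status == "success"}
    findings = {}
    for request_id in sorted(local_by_id.keys() | dependent_by_id.keys()):
        mine = local_by_id.get(request_id)
        theirs = dependent_by_id.get(request_id)
        if theirs is None:
            findings[request_id] = Inconsistency.MISSING_IN_DEPENDENT
        elif mine is None:
            findings[request_id] = Inconsistency.MISSING_LOCALLY
        elif mine.amount != theirs.amount:
            findings[request_id] = Inconsistency.DETAIL_MISMATCH
    return findings


# Hypothetical sample data covering all three types.
local_records = [
    Record("r1", 1000, "success"),
    Record("r2", 500, "success"),
    Record("r4", 700, "success"),
]
dependent_records = [
    Record("r1", 1000, "success"),
    Record("r3", 300, "success"),
    Record("r4", 900, "success"),
]
findings = reconcile(local_records, dependent_records)
```

In this sample, "r2" falls into the first type, "r3" into the second, and "r4" into the third, while "r1" matches and is not reported. A finding for a record close to the day boundary might simply be rechecked on the following day before any cancellation or retry is attempted.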

Microservice reconciliation challenges and our practices

Finally, I’d like to introduce a challenge we’ve encountered during microservice reconciliation, as well as our efforts to resolve it.

How to verify overall payment process consistency across multiple microservices

When using a system with a monolithic architecture, we can manage process flows and record data within the same service. That means we can verify the overall consistency of a single process, as long as we can reconcile it with an external system. However, this is not the case with a system using a microservices architecture. Multiple microservices participate in a single process, and process data management is distributed. We face the following issues when we want to verify overall consistency across multiple services, even if individual reconciliation is performed.

There is no system or mechanism to force all services to perform proper reconciliation. If each service adopts reconciliation processing on its own terms and reconciliation is missed anywhere (as shown above), reconciliation would be incomplete in terms of overall payment process result consistency when we look at the entirety of the system.
Even if reconciliation is performed for all services, there is no “single trustable source” that can be used to verify that a single payment process was performed correctly across multiple services. In order to confirm overall process result consistency, we need to check the reconciliation result of each service, either through its API or by inspecting DB records directly.

Our Solution

We wondered whether it would be possible to resolve these issues while still retaining the unique characteristics of a microservices architecture. We are now considering a new system for use in our Payment Platform, as shown below. I’ll cover this in detail in a future article. For now, I’ll just describe it on a conceptual level.

If you’re familiar with distributed request tracing, the concept should be easy to understand. When a single request is processed over multiple microservices, you can visualize the overall processing of the request by reporting required tracing information without knowing the business logic of each individual service.

The concept here is similar. Suppose we abstract all processes of a payment request across multiple microservices as a single “processing” and have each service report its reconciliation results, similar to how distributed tracing is handled. (Reconciliation logic requires domain knowledge, so it is left as the responsibility of each service.) We can then build “True of Processing” as a single data source that allows us to verify which services are participating in a particular payment processing and whether overall consistency has been confirmed.

Concepts

Processing
- We refer to processes that cross multiple microservices and require consistency reconciliation as “processing.”
- A ProcessingID is required to identify a single processing.
Participant service
- This is the service participating in a single processing. If a service is reported as participating, a consistency reconciliation results report is required, and the service will be recorded as a participant service.
Consistency report
- A consistency reconciliation results report provided by a service participating in a single processing.

Simplified process flow

1. The entry service that triggers the payment process issues a ProcessingID, and this ID is propagated during processing to every dependent service that is required to reconcile results.
- Although we can create a dedicated function to issue ProcessingIDs, we can also assign IDs locally at the entry service if we want to reduce the number of dependent components for availability and performance reasons.
2. The entry service reports processing information that includes the ProcessingID and ServiceID.
- At this point, the only participant service is the entry service.
3. Once processing information is registered, we asynchronously run a command to detect processings for which consistency reconciliation results have not been reported for a certain period of time, and then send an asynchronous participant notification to all remaining participant services already registered in the processing.
4. Each participant service starts a worker that receives and handles the participant notifications. The worker needs to find the associated local records from the ProcessingID and use them to perform reconciliation with dependent services. Once reconciliation is complete, a consistency reconciliation report including the reconciliation results and dependent services is created and collected.
- Any new dependents that are reported are registered as new participant services of the processing.
5. Steps 3 and 4 are repeated until reports have been received from all participant services. With this, we can confirm overall processing consistency. If any unreported services are found, the system can trigger alerts to help with detection and recovery, so we can ensure that reconciliation is performed at all times, at least with regard to the issues discussed above.
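The flow above can be sketched as a small in-memory simulation. This is not the system under consideration at Merpay, only an illustration of the concept. The ProcessingTracker class, the run_until_complete function, the service names, and the worker signature are all hypothetical, and a real system would use asynchronous notifications and persistent storage rather than direct function calls.

```python
import uuid


class ProcessingTracker:
    """Hypothetical in-memory store of processings, participants, and reports."""

    def __init__(self):
        self.participants = {}  # processing_id -> set of service IDs
        self.reports = {}       # processing_id -> {service_id: consistent?}

    def register(self, processing_id, service_id):
        """Record a service as a participant of the processing."""
        self.participants.setdefault(processing_id, set()).add(service_id)
        self.reports.setdefault(processing_id, {})

    def pending(self, processing_id):
        """Participants that have not yet sent a consistency report."""
        return self.participants[processing_id] - set(self.reports[processing_id])

    def report(self, processing_id, service_id, consistent, dependents=()):
        """Store a consistency report and register newly reported dependents."""
        self.reports[processing_id][service_id] = consistent
        for dependent in dependents:
            self.register(processing_id, dependent)

    def is_consistent(self, processing_id):
        """True only when every participant has reported a consistent result."""
        if self.pending(processing_id):
            return False
        return all(self.reports[processing_id].values())


def run_until_complete(tracker, processing_id, workers, max_rounds=10):
    """Repeat notification and reporting; return the services still unreported."""
    for _ in range(max_rounds):
        pending = tracker.pending(processing_id)
        progressed = False
        for service_id in sorted(pending):
            worker = workers.get(service_id)
            if worker is None:
                continue  # this service never answers its notification
            consistent, dependents = worker(processing_id)
            tracker.report(processing_id, service_id, consistent, dependents)
            progressed = True
        if not progressed:
            break
    return tracker.pending(processing_id)


# The entry service assigns the ProcessingID locally and registers itself.
processing_id = str(uuid.uuid4())
tracker = ProcessingTracker()
tracker.register(processing_id, "payment")

# Each worker returns (reconciliation result, dependent services it called).
workers = {
    "payment": lambda pid: (True, ["balance", "ledger"]),
    "balance": lambda pid: (True, []),
}
unreported = run_until_complete(tracker, processing_id, workers)
```

In this run the hypothetical "ledger" service is discovered as a dependent of "payment" but never reports, so it remains in the unreported set. That is the situation in which the system would raise an alert, and the processing is not considered consistent until every participant has reported.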

Summary

In this article, I discussed how reconciliation is done at Merpay. I hope this will be of use even to the many people outside of the payment field who are encountering similar issues.

Thank you for reading!