This article is part of the series “The Present and Future of the RFS Project for Strengthening the Technical Infrastructure” and is a translation of the Japanese article published on January 12th, 2023.
I’m @AHA_oretama from Mercari JP’s CS Tool team.
Our team has been developing the CS Tool (Customer Service Tool) for many years. In order to eliminate the debt accumulated in the process, we launched a project to replace the frontend, and have been working on replacing one particular screen in a period of about three months as a PoC (Proof of Concept). In this article, I would like to introduce the origins of the CS Tool frontend replacement project, and the development of the replacement as well as its results so far.
If you are interested, please take a look at the following articles about the CS Tool team’s other initiatives (articles are in Japanese).
- “Efforts toward loosely coupled DB in the Mercari CS tool” by @hukurou
- “How we solved the system problems we discovered during the GKE migration” by @monkukui
Issues to be solved in the existing CS Tool
Technical Debt in the CS Tool Frontend
Development of the CS Tool began before the launch of the Mercari Application, and many functionalities were added over time. On the other hand, there was a long period of time when there were few engineers in charge of the CS Tool, and not much maintenance and improvement was done. As a result, the CS Tool accumulated a lot of technical debt and became very difficult to maintain, especially the frontend.
The main liability was that multiple frameworks were used in the frontend depending on when each part was developed, and they were all mixed together: Twig, a PHP template engine; tupai.js, an OSS frontend framework; and React, which was introduced several years ago as part of a previous replacement effort. The mix of these three has increased the amount and breadth of knowledge required to work on the codebase, making modifications more difficult.
The use of tupai.js is especially painful, as the OSS itself has not been maintained for many years, making it impossible not only to update the surrounding libraries but also to install new ones (e.g. TypeScript, Testing Library, etc.). In addition, there is not much documentation or knowledge available on the library, making it even harder for new members to work on screens that use tupai.js.
The poor local development experience is also a major liability: the CS Tool frontend and backend are very tightly coupled, and the frontend cannot be run unless the backend is started. The backend also depends on several other services and cannot be started locally. Therefore, frontend modifications are uploaded to a shared development server, which significantly reduces the Developer Experience (DX).
One final and major liability is that we cannot write tests. As mentioned earlier, tupai.js is currently not maintained, so the Testing Library (React Testing Library in the case of React) cannot be installed. This makes it very difficult to write tests for the tupai.js code. Also, the React component, which was partially replaced a few years ago, can work with the React Testing Library, but the React component in the CS Tool itself was not designed to be testable, so it is also difficult to write tests for this component.
Unlifted CodeFreeze and problems with Microservice Migrations
One of the major issues other than technical debt is the continuing CodeFreeze. In recent years, Mercari has been promoting a company-wide shift to microservices, and the CS Tool has been migrating to microservices as well. If we worked on the microservice migration while functionalities were still being added or modified, the scope of the migration would keep growing with every change to the existing CS Tool, and there was even a possibility that the migration would never be completed. To avoid such a situation, we decided at the start of the microservice migration to prohibit, in principle, modification of all code in the CS Tool. We call this CodeFreeze.
We have been in CodeFreeze for several years now, and during that time we have been working on the microservice migration, but at present we have migrated only a few functionalities out of the dozens that exist in the CS Tool. If we continue the migration at this pace, it is expected to take 10 years, and it is unrealistic to continue the CodeFreeze for that long.
In the past, the CS Tool microservice migration was done by carving out the domain into a single application and redefining the screens and specifications from scratch. However, there were several problems with this approach.
- Redefining specifications incurs significant development costs.
- This workflow required changes in customer service operations using the CS Tool, and the transition would have been very costly.
- Customer service personnel would have to work with multiple applications, each carved out on its own.
- By definition, most of a microservice migration is concerned with the backend, but our workflow involved both frontend and backend, which necessarily increased the scale of the problem.
As you can see, this approach was one of the major factors that prolonged the Microservice migration.
The Choice of Frontend Replacement
As we have described, we still faced some major problems. In the end, we decided to tackle them by replacing the whole frontend. We will explain some of the reasons behind this decision.
The first is that all of the technical liabilities listed above could be eliminated by the replacement. Most importantly, the three mixed frontend frameworks can be unified into one modern framework, which in turn should lower the learning curve and improve DX and testability.
The second is the establishment of new CodeFreeze release criteria. Up to this point we required ourselves to finish the microservices migration before lifting the CodeFreeze. Since we expected this to take around 10 years, it was no longer realistic, and thus we needed a new set of requirements. We decided to lift the code freeze upon the completion of the frontend replacement, which we expected to finish in a relatively short amount of time.
Third, the replacement of the frontend would be a way to address most of the problems of the Microservice migration of the CS Tool to date.
- The replacement effort would maintain most of the previous specifications and designs, and thus development costs would be relatively low.
- Customer service personnel would not need to work with multiple, separate applications.
- We would be able to focus on work on the backend once the frontend replacement was done.
And last but not least, we also found that, in general, the cost of replacing frontend components was lower than expected. The GroundUp Web project replaced Mercari’s Web service over a year of development, as described in “The Four-Year history to migrate Mercari Web to Microservices,” and this showed us how long a large-scale frontend replacement would take. The CS Tool’s frontend code footprint was only about 1/10 of the code that GroundUp Web replaced. Although the comparison is not straightforward because of differences in staffing and original functionality, we could see the possibility of finishing in less than six months if we were replacing a codebase only 1/10 the size.
How I launched the Project
This section describes the origins of the CS Tool frontend replacement project.
I joined the CS Tool team last year, and after an onboarding period of about 1-2 months, I began to develop as part of the team in earnest. During the six months or so of development, I began to feel that the development experience and quality were poor. It was easy to see that this was caused by the liabilities mentioned above. From there, I spent my spare time searching for ways to alleviate or resolve the debt issues. After some research, I came to the conclusion that replacing the frontend was a realistic solution, so I compiled my findings and suggestions for replacement into a Design Doc.
The Design Doc was first shared with the team and then with Camp (the organization above the team), and we were able to get general support for the replacement proposal. I think that explicitly stating the project’s short-term goals (which made it possible to forecast how much actual work would be necessary) was one of the factors that made it easier to gain support. The short-term goal was to replace one screen in about three months as a Proof of Concept (PoC); based on the results of the PoC, we would decide whether to expand the replacement to all screens. I also feel that Mercari’s "Go Bold" culture encouraged this kind of proposal to go forward.
This is how the project began, and to date we have been building the infrastructure (implementing the Strangler pattern described below) and replacing one screen in a period of 3 months, give or take, as a PoC. At the time of writing this blog, the actual development work is almost complete, with only a release and PoC review to be done.
Actual Development process
We started out by deploying a proxy server in front of the existing CS Tool service so that traffic to the existing CS Tool service could be maintained and only selected requests could be proxied to the frontend service that was being replaced. This is known as the Strangler Pattern, a technique for gradually replacing legacy services with new services.
- admin-gateway: Service to manage access to Mercari’s internal services such as CS Tool.
- CS Tool: Existing CS Tool, providing frontend and backend.
- new proxy: Proxy server newly introduced this time.
- new frontend: Frontend after replacement.
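The routing decision made by the new proxy can be sketched roughly as follows. This is only an illustration of the Strangler Pattern, not our actual proxy configuration: the `Upstream` names, the `chooseUpstream` function, and the `/users` path are all hypothetical.

```typescript
// Hypothetical upstream names; they stand in for the two services above.
type Upstream = "new-frontend" | "legacy-cs-tool";

// Path prefixes of screens that have already been replaced.
// "/users" is a made-up example, not an actual CS Tool route.
const migratedPrefixes: readonly string[] = ["/users"];

// True when `path` equals `prefix` or lies beneath it, so "/users"
// matches "/users/42" but not "/users-archive".
function isUnder(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

// Strangler-style routing: replaced screens go to the new frontend,
// everything else keeps flowing to the existing CS Tool.
function chooseUpstream(
  path: string,
  prefixes: readonly string[] = migratedPrefixes
): Upstream {
  // Ignore the query string when matching.
  const queryStart = path.indexOf("?");
  const pathname = queryStart === -1 ? path : path.slice(0, queryStart);
  return prefixes.some((prefix) => isUnder(pathname, prefix))
    ? "new-frontend"
    : "legacy-cs-tool";
}
```

With a rule like this, replacing another screen only means adding its prefix to the list, and rolling back means removing it again.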
The replacement frontend service is a Single Page Application (SPA) whose assets are delivered from a pod on Kubernetes (K8s). We deployed the new frontend service on K8s so that we could restrict it to accept traffic only from admin-gateway. We also had prior experience implementing a similar architecture.
These are the main tools that we used in frontend development.
- Tailwind CSS
I will discuss some of the results and impressions of using these in the next section.
Effects after the replacement
As originally intended, the replacement has yielded significant improvements in the first two areas of liability. In particular, the improvements in development productivity and quality were significant.
In terms of development productivity, we have introduced two mechanisms that allow the frontend to be launched in a local environment, enabling local development and dramatically increasing development speed. The first is to launch the frontend locally and connect it to the API server of a shared development environment, although some preparation (cookie settings, VPN settings, etc.) is required beforehand. The second is to use Mock Service Worker (MSW) so that the frontend communicates with a mocked API. The advantage of the latter is that everything is done locally, so no preparation or connection to the outside world is needed.
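The difference between the two mechanisms can be sketched as below. Please note that this is a library-free illustration and not the Mock Service Worker API (MSW intercepts real network requests instead); the mode names, the base URL, the `/api/inquiries/1` path, and the payload are all assumptions made for the example.

```typescript
// NOTE: a library-free illustration, not the Mock Service Worker API.
// All names, paths, and the base URL below are hypothetical.
type ApiMode = "shared-dev" | "mock";

interface MockResponse {
  status: number;
  body: unknown;
}

type RequestPlan =
  | { kind: "remote"; url: string }
  | { kind: "mock"; response: MockResponse };

// Placeholder for the API server of the shared development environment.
const SHARED_DEV_BASE_URL = "https://cs-tool-dev.example.com";

// Canned responses keyed by "METHOD path".
const mockHandlers: Record<string, () => MockResponse> = {
  "GET /api/inquiries/1": () => ({
    status: 200,
    body: { id: 1, subject: "Example inquiry" },
  }),
};

// Decides how a request is served in each local-development mode.
function planRequest(mode: ApiMode, method: string, path: string): RequestPlan {
  if (mode === "shared-dev") {
    // Mechanism 1: forward to the shared environment (needs VPN and cookies).
    return { kind: "remote", url: SHARED_DEV_BASE_URL + path };
  }
  // Mechanism 2: answer locally without touching the network.
  const handler = mockHandlers[`${method.toUpperCase()} ${path}`];
  if (handler === undefined) {
    return {
      kind: "mock",
      response: { status: 404, body: { error: "no mock registered" } },
    };
  }
  return { kind: "mock", response: handler() };
}
```

In the mocked mode, a request without a registered handler gets a 404 rather than silently reaching a real server, which makes missing mocks easy to notice during development.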
In terms of quality, CI was introduced from the early stages of development, and a system was established to ensure quality by running unit tests and linting. Although most of the team members did not have much knowledge of frontend testing, they made sure to write unit tests, which was very important.
In addition, the replacement had several other positive effects. Introducing TypeScript clarified which API fields the frontend actually uses, which we can use to improve the backend API in the future. Introducing SWR reduced the number of requests and improved performance. The development work itself also deepened our understanding of the specifications of the screens being developed.
The most difficult part of this project was our own lack of knowledge and experience in frontend development. Since most of the team members had no experience in frontend development, it was very difficult to catch up on that part, and it took longer than expected. To address this problem, the team held 30-minute study sessions twice a week to share knowledge and experiences. It started with a study session on Tailwind CSS, and we plan to continue these sessions, covering testing and React in the near future.
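As a rough sketch of how types make the used API fields explicit, the frontend can narrow an untyped payload down to just the fields a screen reads. The `InquirySummary` type, its field names, and the `toInquirySummary` function are hypothetical and are not taken from the actual CS Tool API.

```typescript
// The subset of fields the screen actually reads (illustrative names).
interface InquirySummary {
  id: number;
  subject: string;
  status: "open" | "closed";
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows an untyped API payload to the fields the frontend uses and
// throws if any of them is missing or has an unexpected type.
// Any other fields in the payload are dropped.
function toInquirySummary(payload: unknown): InquirySummary {
  if (!isRecord(payload)) {
    throw new Error("payload is not an object");
  }
  const { id, subject, status } = payload;
  if (typeof id !== "number") {
    throw new Error("id must be a number");
  }
  if (typeof subject !== "string") {
    throw new Error("subject must be a string");
  }
  if (status === "open" || status === "closed") {
    return { id, subject, status };
  }
  throw new Error("status must be 'open' or 'closed'");
}
```

Because the interface lists only what the screen needs, it doubles as a record of which backend fields are still in use and which might be candidates for removal.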
The CS Tool has been in development since before the launch of the Mercari application, and it has accumulated a lot of debt, especially in the frontend. In response, we proposed replacing the frontend and have replaced one screen as a PoC. Developing on the new frontend is a much better experience than on the current CS Tool, and development productivity and quality have greatly improved.
At the time of this writing, the only tasks left are to release the results of the PoC and to review our work. Based on the findings of that review, we will adjust our schedule and expectations in preparation for replacing the entire CS Tool frontend. We do not yet know how far we will go in replacing the CS Tool frontend or how many months it will take, but we hope to share the status of the frontend replacement through blogs such as this one.
Currently, Mercari is looking for colleagues to work with us on a company-wide cross-functional project called "Robust Foundation for Speed" to solve the technical challenges of Mercari’s common business infrastructure! If you are interested, please see the following link.
Software Engineer, Backend Foundation (PHP/MySQL) – Mercari
Software Engineer – Mercari