This article is part of the Developer Productivity Engineering Camp blog series, brought to you by Yuji Kazama from the CI/CD team.
Hello, I am Yuji Kazama, the Engineering Manager of the Continuous Integration (CI)/Continuous Delivery (CD) team. The CI/CD team is one of the teams in the Developer Productivity Engineering Camp. In this article, I will briefly introduce the team by answering the following questions.
- What is the team trying to achieve? (Mission)
- How does the team plan to realize its mission? (Strategy)
Our mission is:
“Offer developers the most productive development experience through CI/CD”
Our team’s primary customers are internal developers across the Mercari Group. Why does our team exist for these customers? To answer this question, let me explain our background.
When our application was monolithic, more issues appeared as it grew more complex. From the CI/CD point of view, the test and release processes became harder. Deliveries became slow because we needed to wait for changes across the whole service to be merged, tested, and deployed. Furthermore, if a bad update got deployed, it could bring the entire service down.
To solve the above issues, we decided to migrate from a monolithic architecture to a microservices architecture. After migrating, we could no longer test the way we did in the monolithic era. For example, a microservice calls other services via remote network calls instead of local function calls. This means that our tests have to account for extra failure points. Retrying calls to other services can overload them, in effect a self-inflicted DDoS, so we have to provide a proper protection mechanism, and that mechanism should also be tested.
For historical reasons, there were some differences between our development environment and production environment. For example, a service could be hosted in one data center in development and in a different one in production. The differences in infrastructure and provisioning made testing harder.
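One common protection mechanism for retries is capped exponential backoff with jitter, which spreads retries out instead of letting every client hammer a struggling service at once. The following is a minimal sketch of that idea, not a description of Mercari's actual implementation; the names `call_with_backoff` and `RetryError` and all default values are assumptions made for illustration. The `sleep` and `rng` parameters are injectable so that the mechanism itself can be tested without waiting.

```python
import random
import time


class RetryError(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def call_with_backoff(func, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on ConnectionError with capped backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ConnectionError as exc:
            if attempt == max_attempts - 1:
                # Stop retrying: an unbounded retry loop is what overloads callees.
                raise RetryError(f"gave up after {max_attempts} attempts") from exc
            # Full jitter: wait a random time up to min(max_delay, base_delay * 2^attempt).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
```

In a test, passing a fake `sleep` that records its argument makes it possible to assert on the delays without slowing the test suite down.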
Last year, we had a large security incident caused by a supply chain problem. Any software can introduce vulnerabilities into a supply chain. As a system gets more complex, it’s critical to have checks and best practices in place to guarantee artifact integrity.
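As one small illustration of an artifact integrity check, a pipeline can compare the SHA-256 digest of an artifact with a digest recorded at build time. This is only a sketch under assumptions: the function names are hypothetical, and a digest check is only as trustworthy as the place the expected digest comes from, so real pipelines typically combine it with signatures and provenance data.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True if data matches the expected digest (case-insensitive hex)."""
    actual = sha256_hex(data)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A deploy step could refuse to proceed whenever `verify_artifact` returns `False` for the artifact it is about to ship.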
To solve these issues, we need to rely on a CI/CD platform. Our team exists to provide this, in order to achieve high release velocity and high reliability. We can minimize risk and make testing and deploying easier for the internal developers.
Strategy is how the team plans to realize its mission. The following are the strategies that our team follows:
- Making data-driven decisions
- Providing useful tools and infrastructure
- Fostering healthy testing culture
Making data-driven decisions
A well-designed CI/CD pipeline allows development teams to have fast feedback loops and release software with confidence. We can’t improve on what we don’t measure. We must define measurable metrics so that we can make concrete strides towards a better product.
Our team defined high-level CI/CD metrics that show us where issues are hurting development productivity.
However, monitoring alone is not enough to maintain productivity. We also need alerts: automated notifications, sent to developers and to our team, about monitoring results that require action. We should also review our current processes and outputs regularly to make sure they remain effective.
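To make the step from monitoring to alerting concrete, here is a minimal sketch that computes two metrics from recent builds and reports any that cross a threshold. The metrics chosen (build success rate and median build duration), the threshold values, and every name in the code are assumptions for illustration; the article does not specify which metrics the team actually tracks.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BuildResult:
    succeeded: bool
    duration_sec: float


def summarize(builds):
    """Return (success_rate, median_duration_sec) for a non-empty list of builds."""
    success_rate = sum(b.succeeded for b in builds) / len(builds)
    return success_rate, median(b.duration_sec for b in builds)


def check_alerts(builds, min_success_rate=0.9, max_median_sec=600.0):
    """Return alert messages for metrics outside the (hypothetical) thresholds."""
    success_rate, median_sec = summarize(builds)
    alerts = []
    if success_rate < min_success_rate:
        alerts.append(f"success rate {success_rate:.0%} is below {min_success_rate:.0%}")
    if median_sec > max_median_sec:
        alerts.append(f"median duration {median_sec:.0f}s exceeds {max_median_sec:.0f}s")
    return alerts
```

An empty list means nothing needs action; a non-empty list would be forwarded to whatever notification channel the team uses.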
Providing useful tools and infrastructure
We provide the tools used in the development process through CI/CD because they are at the heart of our efforts to realize our mission. We need to consider how easy the tools are to use and how useful they are in accomplishing internal developers’ goals.
Test infrastructure is a critical component of high-productivity software development. It should be stable and easy to use while meeting the various requirements of developers and QA engineers. Delivery (deployment) infrastructure is equally critical. It should be stable, easy to use, and secure by default so that developers can set up and use their deployment pipelines on top of it without hassle.
Fostering healthy testing culture
Our fundamental goal is to automatically catch problematic changes as early as possible so that we can release products frequently. However, we cannot run all tests on every commit. It’s too expensive for developers to be blocked on every commit by failures arising from test instability or flakiness that has nothing to do with their code changes. We need to define which tests to run in our development and production workflows. The types of testing we need to cover include the following:
- Functional testing
- Security testing
- Performance, load and stress testing
- Deployment configuration testing
- Canary testing
- Chaos engineering
The more frequently and quickly we need to change our services, the more we need a fast way to test them. A healthy automated testing culture encourages developers to share the work of writing tests. Such a culture also ensures that tests are run regularly. An emphasis on fixing broken tests quickly maintains high confidence in the process. To get there, we need to act as evangelists who promote test automation and best practices.
In this article, I gave a brief introduction to the team. We are going to publish more articles about the team’s activities. We hope you look forward to them!
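Fixing broken tests quickly starts with telling flaky tests apart from genuine failures. One simple heuristic, sketched below, treats a test as flaky when the same commit produced both a pass and a failure. The function name and the shape of the input records are assumptions for illustration, not part of any tool described in this article.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return sorted names of tests that both passed and failed on one commit.

    runs: iterable of (test_name, commit_sha, passed) tuples (hypothetical format).
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    # The code did not change within a commit, so mixed results point at the test.
    flaky = {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)
```

A test that fails on one commit and passes on another is not reported, because the code change itself may explain the difference.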
Mercari is looking for engineers. The platform described above has not been fully realized yet, but if you would like to work with us at a company that aims to create such a platform, please apply for a position that interests you!