Threat Modeling at Mercari

Hello, this is @gloria from the Product Security Team. Previously, I wrote about security testing as a part of last year’s Advent Calendar event.

This time, as a part of our Security Tech Blog series, I would like to introduce threat modeling, a fundamental part of every secure software development lifecycle (SSDLC). I will explain the basic process, why it’s important, and how we do it here at Mercari.

What is threat modeling?

Threat modeling is the process of analyzing a system to find all potential vulnerabilities that can stem from insecure design. In other words, it involves deeply analyzing the design of systems and applications to find vulnerabilities before they are even implemented. By doing so, we can preemptively prevent attackers from gaining a foothold on our systems, making this a form of proactive security.

Proactive security means that instead of simply responding to attacks after they have occurred, we actively work to prevent attacks from occurring in the first place. Proactive security is generally preferable to reactive security because it saves the work and money spent on handling a security incident that could have been prevented. According to a 2021 study conducted by IBM, the average data breach costs about US$4.24 million, so it is in our best interest to act proactively and prevent security incidents before they occur.

In addition to preventing vulnerabilities from being introduced into the system, threat modeling also serves as a way to uncover weaknesses that currently exist in the system. We can reduce overall risk by analyzing issues that are uncovered, determining how to handle them, and assigning priority to fix them as necessary.

Why should we do threat modeling?

Threat modeling benefits both the systems and software applications being analyzed and the teams that perform it.

Systems and software that have gone through threat modeling tend to resist security threats and attacks far better than systems that haven’t. This is because threat modeling effectively reduces the attack surface by uncovering design flaws and decreasing system complexity, while also identifying single points of failure and other critical areas that require additional security consideration.

Since it is a highly collaborative team activity, it also provides a place for deep discussion of system design and features. Through these discussions, teams may uncover previously unknown technical debt or even unknown features. Additionally, it nurtures a security mindset in all team members who participate, because threat modeling requires them to think about their own features from the viewpoint of a potential attacker.

Overall, threat modeling leads to a resilient product that is less prone to incidents, allows us to find areas of our applications and systems that are potentially vulnerable, and helps create a culture of security awareness among development teams. For these reasons, it should be a part of any secure software development lifecycle.

Our threat modeling process

At Mercari, we began introducing threat modeling into our development processes over a year ago. Our process involves five steps:

Step 1: Building the model

Before we can begin threat modeling, we must first understand the system. In order to understand the system, we need to identify the following things:

  • Assets: the individual components in the system (microservices, applications, critical components, databases, etc.)
  • Data flow: the movement of data within the system
  • Trust boundaries: places where the level of trust changes
  • Entry points: places where external input enters the system (user input, input from other internal systems, etc.)
  • Privileged actions: admin user flows, etc.
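To make the list above more concrete, here is a minimal Python sketch of how these items could be recorded as data. It is only an illustration, not the tooling we use: the class names, the trust zone labels, and the sample assets (`mobile-app`, `api-gateway`, `admin-console`, `orders-db`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    name: str
    trust_zone: str               # hypothetical labels, e.g. "internet" or "internal"
    is_entry_point: bool = False  # does external input enter the system here?


@dataclass(frozen=True)
class DataFlow:
    source: str
    destination: str
    data: str
    privileged: bool = False      # e.g. part of an admin user flow


@dataclass
class SystemModel:
    assets: dict[str, Asset] = field(default_factory=dict)
    flows: list[DataFlow] = field(default_factory=list)

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.name] = asset

    def add_flow(self, source: str, destination: str, data: str,
                 privileged: bool = False) -> None:
        # Reject flows that reference assets missing from the model.
        for name in (source, destination):
            if name not in self.assets:
                raise KeyError(f"unknown asset: {name}")
        self.flows.append(DataFlow(source, destination, data, privileged))

    def boundary_crossings(self) -> list[DataFlow]:
        """Flows whose two endpoints sit in different trust zones."""
        return [
            f for f in self.flows
            if self.assets[f.source].trust_zone
            != self.assets[f.destination].trust_zone
        ]

    def entry_points(self) -> list[str]:
        return [a.name for a in self.assets.values() if a.is_entry_point]

    def privileged_flows(self) -> list[DataFlow]:
        return [f for f in self.flows if f.privileged]


model = SystemModel()
model.add_asset(Asset("mobile-app", trust_zone="internet"))
model.add_asset(Asset("api-gateway", trust_zone="internal", is_entry_point=True))
model.add_asset(Asset("admin-console", trust_zone="internal"))
model.add_asset(Asset("orders-db", trust_zone="internal"))
model.add_flow("mobile-app", "api-gateway", "order request")
model.add_flow("api-gateway", "orders-db", "order record")
model.add_flow("admin-console", "orders-db", "refund update", privileged=True)

crossings = model.boundary_crossings()
for flow in crossings:
    print(f"Crosses a trust boundary: {flow.source} -> {flow.destination}")
```

Flows that cross a trust boundary, entry points, and privileged flows are natural places to start when looking for threats, which is why the sketch exposes them as separate queries.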

We gather this information from anywhere that we can: documentation of API endpoints (sample requests and responses), design documents, feature specifications, etc.

We also collect information from two common types of models that are used at Mercari:

  • Data flow diagrams: Focus on visualizing trust boundaries and the flow of data between different components of the system
  • Sequence diagrams: Focus on the sequence of actions that occur when using the system

Sample data flow diagram (left) and sequence diagram (right). Image Source: Information Security University of Florida and Stack Overflow

After piecing together all of the information we have gathered about the system, our model looks something like this.

A sample threat modeling diagram at Mercari

Step 2: Brainstorming

After building the model, we put on our hacker hats and brainstorm everything that can go wrong! This step is often the most difficult for development team members because they usually have little experience in the security field, so at Mercari we use two different frameworks to help us brainstorm.

The first one is Microsoft’s STRIDE framework. It is commonly used for threat modeling and we chose it because it is simple and easy to use. It evaluates systems for potential threats based on six categories:

  • Spoofing: Can an attacker falsify an identity to gain additional access?
  • Tampering: Can an attacker modify data as it flows through the application?
  • Repudiation: If an attacker does something and then denies they did it, can we prove it?
  • Information disclosure: Can an attacker gain access to data that they shouldn’t be able to access?
  • Denial of service: Can an attacker crash or bring down the system?
  • Elevation of privilege: Can an attacker gain privileges or perform actions that they are not authorized for?
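One way to keep a brainstorming session systematic is to pair every element of the model with every STRIDE category and note which pairs have no ideas recorded yet. The Python sketch below illustrates this under stated assumptions: the worksheet structure, the function names, and the sample elements (`login API`, `payments DB`) are hypothetical and are not part of STRIDE itself.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


# Maps (element, category) to the threat ideas recorded for that pair.
Worksheet = dict[tuple[str, Stride], list[str]]


def new_worksheet(elements: list[str]) -> Worksheet:
    """Create one empty cell per element and STRIDE category."""
    return {(element, category): [] for element in elements for category in Stride}


def cells_without_ideas(worksheet: Worksheet) -> list[tuple[str, Stride]]:
    """Cells where nobody has written down a potential threat yet."""
    return [cell for cell, ideas in worksheet.items() if not ideas]


worksheet = new_worksheet(["login API", "payments DB"])
worksheet[("login API", Stride.SPOOFING)].append(
    "Session token reused from another device"
)

for element, category in cells_without_ideas(worksheet)[:3]:
    print(f"No ideas recorded yet: {category.value} against {element}")
```

An empty cell does not mean that a threat exists. It is only a reminder that the group has not yet asked that question about that element.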

The second framework that we use is the 5Ws + 1H. While this is not actually a threat modeling framework, we found that this technique helps engineers to question various aspects of their system and see things from a different perspective. It involves asking questions about their system based on six categories:

  • Who is sending the data? Who can access the data? (This helps us identify any external or third parties involved)
  • What data is being transmitted or stored? (Does it contain sensitive information?)
  • Where is the data being stored? Where is it being used? (Is it secure?)
  • When is the data transmitted or stored? When will it be used? (Is authentication and authorization required?)
  • Why is the data being transmitted or stored? (Only transmit and store data that is necessary)
  • How is the data being transmitted or stored? (Is the connection encrypted?)

A common way of brainstorming involves whiteboard discussions as pictured below. Since the beginning of the pandemic, Mercari employees have been working fully remotely, so we use Google’s Jamboard instead when brainstorming.

Traditional brainstorming using a whiteboard. Image Source: MartinFowler.com

Jamming our brains onto Google’s Jamboard!

Step 3: Identifying valid issues

After brainstorming all potential issues, we discuss them as a group to identify the issues that are unmitigated and are therefore valid concerns. During this step, we often refer to the code or ask senior engineers to confirm whether the potential issue can actually occur.

Step 4: Performing risk analysis

After valid issues have been identified, we perform risk analysis on each issue to determine its priority and how it will be handled. There are many frameworks for performing risk analysis, but we like to use Microsoft’s DREAD framework because it is easy to use.

  • Damage: How bad would an attack be if it occurred?
  • Reproducibility: How easy is it to reproduce the attack?
  • Exploitability: How much work is it to launch the attack?
  • Affected Users: How many people will be impacted by a single attack?
  • Discoverability: How easy is it to discover the vulnerability?

As a group, we discuss each category and assign a rank: High (3), Medium (2), or Low (1). Using these values and the following formula, we then calculate a risk score for each issue:

  • Risk Score = Probability of Occurrence x Impact on Business

Where:

  • Probability = Reproducibility + Exploitability + Discoverability
  • Impact = Damage + Affected Users

Finally, using the risk score we have calculated, we assign a priority of Low (Risk Score 0-20), Medium (21-40), or High (41-60). This is then used to make our final decision on how to handle the issue.
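The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration of the formula and priority bands as described, not our actual tooling. The class name `Dread` and the three sample ratings are hypothetical. Note that with ranks from 1 to 3, the probability ranges from 3 to 9 and the impact from 2 to 6, so the risk score always falls between 6 and 54.

```python
from dataclasses import dataclass

HIGH, MEDIUM, LOW = 3, 2, 1


@dataclass(frozen=True)
class Dread:
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def __post_init__(self) -> None:
        # Every category must be ranked Low (1), Medium (2), or High (3).
        for name, value in vars(self).items():
            if value not in (LOW, MEDIUM, HIGH):
                raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")

    @property
    def probability(self) -> int:
        return self.reproducibility + self.exploitability + self.discoverability

    @property
    def impact(self) -> int:
        return self.damage + self.affected_users

    @property
    def risk_score(self) -> int:
        return self.probability * self.impact

    @property
    def priority(self) -> str:
        # Bands from the text: Low (0-20), Medium (21-40), High (41-60).
        if self.risk_score <= 20:
            return "Low"
        if self.risk_score <= 40:
            return "Medium"
        return "High"


# Hypothetical ratings, chosen only to exercise each priority band.
worst = Dread(HIGH, HIGH, HIGH, HIGH, HIGH)
mild = Dread(LOW, LOW, LOW, LOW, LOW)
mixed = Dread(damage=HIGH, reproducibility=MEDIUM, exploitability=MEDIUM,
              affected_users=HIGH, discoverability=LOW)

for label, issue in (("worst", worst), ("mild", mild), ("mixed", mixed)):
    print(f"{label}: {issue.probability} x {issue.impact} = "
          f"{issue.risk_score} / {issue.priority}")
```

For the `mixed` rating, the probability is 2 + 2 + 1 = 5 and the impact is 3 + 3 = 6, which gives a risk score of 30 and a priority of Medium.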

To provide a concrete example, the risk analysis for a potential attack that is easy to launch but only affects a single user each time (such as phishing) may look like the following:

  • Damage: 2 / Medium (Since it only affects a single user each time, damage is fairly limited)
  • Reproducibility: 3 / High (The attack is easy to launch once the method is known)
  • Exploitability: 3 / High (The exploit is easy to develop)
  • Affected Users: 1 / Low (Only a single user is affected)
  • Discoverability: 3 / High (The vulnerability is easy to find)

Therefore:

  • Probability = 3 + 3 + 3 = 9
  • Impact = 2 + 1 = 3
  • Risk Score = 9 x 3 = 27 / Medium

Step 5: Handling issues

Issues that are identified don’t always need to be fixed. There are four ways in which we can handle identified risks:

  • Risk mitigation: We can reduce or eliminate risk by fixing issues, making changes to the design of our system or feature, etc.
  • Risk transfer: We can transfer risk to a third party, for example by buying insurance.
  • Risk avoidance: We can eliminate the risk by completely removing the component that introduces the risk.
  • Risk acceptance: When the cost of each of the above methods outweighs the potential damage from the issue, it is better to do nothing and accept the risk. This is commonly applied to issues that are not high priority and are difficult to mitigate, such as the phishing example above.
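The trade-off behind risk acceptance can be written down as a simple comparison. The Python sketch below is an assumption-heavy illustration and not a policy: the function `suggest_treatment`, the rule of preferring the cheapest option, and all cost and damage figures are hypothetical. In practice the decision is made through discussion, and these numbers are rough estimates at best.

```python
from enum import Enum


class Treatment(Enum):
    MITIGATE = "mitigate"
    TRANSFER = "transfer"
    AVOID = "avoid"
    ACCEPT = "accept"


def suggest_treatment(expected_damage: float,
                      option_costs: dict[Treatment, float]) -> Treatment:
    """Suggest ACCEPT when every other option costs more than the damage.

    Otherwise suggest the cheapest option (an assumption of this sketch).
    """
    active = {t: cost for t, cost in option_costs.items()
              if t is not Treatment.ACCEPT}
    if not active:
        return Treatment.ACCEPT
    cheapest = min(active, key=active.get)
    if active[cheapest] > expected_damage:
        return Treatment.ACCEPT
    return cheapest


# Hypothetical estimates in an arbitrary currency unit.
cheap_to_fix = suggest_treatment(
    expected_damage=10_000,
    option_costs={Treatment.MITIGATE: 2_000, Treatment.AVOID: 8_000},
)
not_worth_fixing = suggest_treatment(
    expected_damage=1_000,
    option_costs={Treatment.MITIGATE: 5_000, Treatment.TRANSFER: 3_000},
)
print(cheap_to_fix.value, not_worth_fixing.value)
```

The sketch only captures the cost comparison. Factors such as legal obligations or user trust can justify fixing an issue even when the estimated cost is higher than the estimated damage.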

Rinse and repeat

It should be noted that threat modeling is not a one-time activity. As new features are added and the product continues to evolve, it is natural that new security concerns and potential attack vectors will emerge. Threat modeling should be done on a regular, ongoing basis as a part of the design phase of the SSDLC.

Image Source: Big Water Consulting

Conclusion

Rather than taking a reactive approach to security incidents, threat modeling allows us to take a proactive and preventative approach by ensuring that our applications and systems are designed with security in mind. In addition to directly improving the security of our products, threat modeling also has the side effects of encouraging deep, constructive discussion and debate on systems. Sometimes this can even uncover previously unknown technical debt! It also helps instill a sense of security awareness into the people creating, implementing, and managing our systems and applications. Although it may take some time to master the art of threat modeling, its benefits no doubt outweigh the costs and it should be an integral part of any secure software development lifecycle.

The Product Security Team is looking for talented engineers to join our team! If you are interested in working with us, please see the job posting here.