LLMs at Work: Outsourcing Vendor Assessment Toil to AI

This post is for the December 15th installment of Mercari’s Advent Calendar 2024, brought to you by Daniel Wray (Security Management) and Simon Giroux (Security Engineering). Banner illustration: DALL·E 3

TL;DR

As Mercari scales, its Security Management Team faces increasing demands for third-party service evaluations. Traditional vendor reviews rely on cumbersome, manual processes (a.k.a. toil), which often involve lengthy questionnaires. To streamline this, Mercari is experimenting with using code and Large Language Models (LLMs) to automate the information-gathering phase, significantly reducing review time. By extracting and analyzing publicly available data, the AI-assisted solution provides faster, more consistent assessments while minimizing manual intervention. This approach enhances efficiency, allowing security teams to focus on managing actual risks rather than administrative tasks.

Introduction

Why are we doing these checks in the first place?

Question: Why do companies conduct reviews before authorizing the use of new third party services (i.e. cloud services such as SaaS)?

How this question should be answered, and how deep such checks should go, often depends on the compliance requirements and risk appetite of the organization. Ultimately, though, it boils down to gaining a sufficient level of confidence, or trust, in the security posture of the external service or vendor, and documenting evidence of the checks performed to reach that conclusion.

Efforts to establish that trust can often explode into a long list of bureaucratic processes, and seemingly endless spreadsheets of compliance checkboxes to tick, in an attempt to ensure consistent and auditable criteria.

Question: Why is it important to establish that trust?

There are very few, if any, companies that can build all the tooling they need internally; reaching out for external assistance with some part of the business will always be necessary, and doing so means trusting a third party with that work. When businesses work with outside partners, or use external processors or service providers, they gain much-needed support, but also face risks inherent in the use of that specific service.

By the nature of using an external service, whatever internal information the service might handle, such as internal communications, intellectual property, or user data, ends up being stored or processed on someone else’s servers, which opens the door to the potential risk of data leaks from those servers. Moreover, when integrating these external services with other company systems, there’s another layer of risk—if the vendor’s systems are compromised or if a malicious insider is at play, it could lead to a breach that impacts the company’s data and systems beyond the scope of however the external service is being used.

This is where security teams may start to get nervous about third party and supply chain risk.

The Security Management Team at Mercari, which is in charge of reviewing applications to use external services, receives a significant number of such requests per year. As the company continues to grow, this number is sure to increase.

As a team we want to encourage other teams’ innovation and experimentation with new tools and technologies that could improve employee productivity, provide new insights, or improve our application’s user experience. At the same time, we need to balance this against the challenges and risks involved in managing tool sprawl, and find ways to make our security checks scale to this number of requests.

What might this check process look like?

Coming back to the original issue: teams that want to use a new external service consult with the Security Management Team on the associated risks and seek approval for implementing it. The Security Management Team wants to understand the service and evaluate the extent of the risks it could entail based on its functionality, its use case, the information it will handle, how it connects to our environment, and so on.

The assessment process for a new external service might look like this:

  • Ask for the name of the service
  • Ask for links to some documentation about the service
  • Ask the applicant to describe what the service will be used for (i.e. what problem will it solve?)
  • Ask the applicant to describe what kind of data will be stored or processed by the service
  • Ask who will be the owner of the service if it is approved and onboarded

Then the Security Management Team would take that information and begin an investigation. The goal is to see if we can trust this external service and its vendor.

  • Could using this service expose our infrastructure or data to unreasonable risks?
  • Are the vendor’s security controls sufficient for us to trust them to keep our data safe?
  • Do the vendor’s controls meet security standards and compliance requirements for the data that they may be responsible for processing for us?
  • Are there any other potential security risks inherent in the use of this service, or its vendor?
  • And so on…

While the Security Management Team leads the review process with a focus on information security risk, other teams such as the Privacy Office and Product Security Team may also be involved in the review and approval process depending on the nature of the service, the data it will handle, and how the applicant intends to use it.

Below is a high-level representation of what our process used to look like. While there were numerous issues with this process, including the number of times we had to reach out to the applicant, one of the key issues was the amount of information we had to search for manually on the Internet.
Image 1: Simplified representation of a manually executed vendor assessment process

Legacy and emergent risk assessment tools

The traditional way of conducting an evaluation like this would be to take a spreadsheet with a few pages of questions, send it to the vendor, ask them to fill it in, evaluate their answers, then approve or reject the use of the service—depending on the risks identified, one’s level of risk tolerance, and the necessity of the service. With the back and forth involved in answering and clarifying questions, this process can become quite heavy and take a significant amount of time to complete.

Trust Centers have recently emerged as a more modern alternative to this questionnaire-based approach, and are becoming more common at European and American companies. These pages publicly list the compliance standards, laws, and regulations that a company claims to follow, often alongside details of their security and privacy controls. An interested party can then request evidence of this compliance directly from the portal (such as certifications or audit reports) and confirm for themselves that the vendor is doing what they claim.

Despite their growing popularity, Trust Centers are yet to be universally adopted (even Mercari has yet to publish its own). Without a Trust Center to review, sending the vendor a questionnaire remains the best approach. Even when there is a Trust Center, a company might still choose to send a questionnaire: it allows them to ask their own custom set of questions based on their specific risk appetite and points of concern, and may be necessary to meet certain regulatory requirements that demand answers a Trust Center may not cover.

To help vendors answer these questionnaires, some modern governance, risk, and compliance (GRC) tool providers offer AI-assisted functionality for handling incoming questionnaires. Questions are automatically answered from a knowledge base of previously-given answers and documentation, with the help of Large Language Models (assuming the spreadsheet isn’t formatted too artistically for the tool to understand). A requester that also uses a similar GRC tool could then automatically review the answers against their internal questionnaire and highlight any points that might be missing. These functionalities streamline the process of checking boxes, identifying findings, asking stakeholders to handle them, and finally authorizing (or refusing) the use of a new external service.

GRC Engineering is slowly establishing itself as the obvious next stage of evolution. Bringing Agile, DevOps, CI/CD, and paved-road practices into GRC should help security teams scale better with their company. This means making assessments and controls part of the development process, and providing guidance as early as possible, not just before release. A precursor to this idea was partially implemented in Google’s Vendor Security Assessment Questionnaires (VSAQ). The questionnaire is in JSON format, allowing the interface to adapt dynamically based on the answers, and to provide just-in-time guidance when a given answer is already known to be insufficient. The format also makes the questionnaire machine-readable, removing some of the need to manually interpret answers.
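To illustrate the idea (this is a schematic example, not VSAQ’s actual schema), a machine-readable question with built-in guidance might look like this:

question = {
    "id": "encryption_at_rest",
    "text": "Is customer data encrypted at rest?",
    "type": "yesno",
    # Shown immediately when the respondent answers "no",
    # instead of waiting for a reviewer to flag it later.
    "guidance_if_no": "Data at rest must be encrypted (e.g. AES-256) "
                      "before this service can be approved.",
    # Machine-readable, so code can route or score the answer directly.
    "followups_if_yes": ["key_management", "encryption_algorithms"],
}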

Leveraging LLMs to assess vendors

Sending questionnaires back and forth consumes a lot of everyone’s time and can significantly delay the implementation of a service if the check criteria are not clear.

What if we could reduce some of the pain of doing third party risk reviews this way, by creating clearer criteria to highlight the specific areas that a reviewer should focus on, while enabling the auto-collection and analysis of information and evidence on the specific security control requirements we care about?

Internally, we identified a large number of vendors for which, based on the inherent risk of their service, a more lightweight semi-automated approach could be appropriate. For these, the Security Management Team decided to leverage code and Large Language Models to enable us to move fast, and evaluate using clearer and more codified criteria against publicly available information from the vendor, while still appropriately managing risk and maintaining a reasonable level of confidence and trust in the vendor.

Many mature business-to-business (B2B) vendors already extensively publicize their security practices, which laws and regulations they are subject to, and which compliance standards they have been certified against. Vendors are already openly signaling what level of security and compliance maturity we should expect from them. We just have to find a way to read, interpret, and understand the endless pages of legalese and jargon in their Privacy Policies, Terms of Service, certificates, white papers, and Trust Centers.

If successful, this approach could allow us to reduce the need for more time and resource-intensive manual reviews where sufficient information was already publicly available. It would also allow us to focus on those where information could not be obtained, services with a higher inherent risk (e.g. those involving significant system integration or access to large amounts of highly sensitive information), and those requiring additional custom questions or checks for regulatory compliance.

Mercari took inspiration from these emergent approaches, while trying to find a balance that makes sense for us to ensure faster and more efficient review of external services.

Third party website review as code

To be able to learn about the service and its vendor, the risk assessment process requires the analyst to read about the product, understand what it will do, and what information it will store or process. This traditionally involves a lot of searching the internet and reading web pages.

To make this information-gathering easier, the Security Management Team collaborated with the Security Engineering Team, who leveraged open source frameworks, Google’s powerful search engine, and Large Language Models to create a solution.

Supplemented with this automation, the new review process looks like this:
Image 2: Simplified representation of the vendor assessment process for external services

In particular, the introduction of LLMs to this stack is what makes this approach possible. LLMs (we use OpenAI’s GPT-4o in this case, but models that support tool calling, such as Google’s Gemini or Anthropic’s Claude, would work too) can read any documentation given to them and provide short answers to any question we might ask.
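As a minimal sketch of what tool calling looks like with the OpenAI API (the tool definition below is illustrative, not our production configuration), the model can be offered a search function and decide for itself when to call it:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Describe a search tool the model may choose to call instead of answering directly.
tools = [{
    "type": "function",
    "function": {
        "name": "google_search",
        "description": "Search the web and return page snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What compliance standards does this vendor follow?"}],
    tools=tools,
)

# If the model needs more information, it returns a tool call rather than an
# answer; our code then runs the search and feeds the results back.
print(response.choices[0].message.tool_calls)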

The challenge is that our review process involves a lot of questions, and follow-up questions based on the answers to these questions, and so on. We can’t simply write a long prompt and hope that the LLM’s answers will tell us everything we want to know and be grounded in reality.

One approach is to use Retrieval Augmented Generation (RAG): feed documents to an LLM, then ask questions and get answers based specifically on those documents. This is the approach we have taken at Mercari, as it enables us to focus the LLM’s attention on documentation we know is relevant, and reduces the likelihood of both hallucinations and answers based on irrelevant information.
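In its simplest form, RAG amounts to retrieving relevant text and placing it in the prompt ahead of the question. The sketch below (the function name and prompt wording are ours for illustration, not our production code) shows the core idea:

from openai import OpenAI

client = OpenAI()

def answer_from_documents(question: str, documents: list[str]) -> str:
    """Answer a question grounded only in the supplied documents."""
    context = "\n\n---\n\n".join(documents)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer using ONLY the provided documents. "
                "If the answer is not in them, say so."
            )},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content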

Below is a simplified overview of our approach, which aims to gather the necessary information while minimizing the time and effort required by the applicant, the reviewers, and the vendor.

Image 3: Simplified representation of the role of LLM-powered information gathering in the review process for vendors

It’s time to get hands-on and demonstrate how we can use this automation. For the purposes of this article, we will demonstrate using a fictitious service “PayQuick Cloud Pro”, provided by the fictitious vendor “PaySmooth Solutions”.

The Python code below demonstrates the basic concepts implemented in our AI agent. First, we take note of the current time; the last code cell in this demonstration prints the total execution time.

import time
start = time.time()

Setting details about the external service and vendor

from llm_code import Profile

profile = Profile(
    company="PaySmooth Solutions",  # Enter the company name here
    product="PayQuick Cloud Pro",  # Enter the product name here
    url="https://www.paysmooth.com/payquick",  # Enter the product's URL here
)

Customizing questions

The questions themselves are defined as a function in a Python library. The script passes the ‘profile’ of the external service as a parameter, and a custom questionnaire comes out. This allows us to better control the flow and ask follow-up questions dynamically based on the answers received.
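A simplified sketch of what such a function might look like (our real library defines many more questions and follow-up rules; the dictionary keys below mirror those shown in the demonstration output):

def prepare_questions(profile) -> list[dict]:
    """Build a questionnaire tailored to one external service."""
    name = f"{profile.company} {profile.product}"
    return [
        {
            "label": "General",
            "goal": "Understand what the product is supposed to do.",
            "main": f"What is the purpose of '{profile.product}' by {profile.company}?",
            "expected": "A brief description",
        },
        {
            "label": "Compliance",
            "goal": "Identify the compliance standards the vendor claims to follow.",
            "main": f"What compliance standards is {name} following?",
            "expected": "A list of standards",
        },
    ]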

Here are some examples of questions for demonstration.

from questions_code import prepare_questions
from IPython.display import Image, display, Markdown

questions = prepare_questions(profile)
for i, question in enumerate(questions):
    if i > 2:
        break
    display(Markdown(f"## Question {i+1}: {question.get('label', 'General')}"))
    for key in question.keys():
        display(Markdown(f"**({key})**n{question[key]}n"))
Question 1: General

(goal) The team performing the assessment isn't necessarily aware of what this service is doing. This question will tell them what the product is supposed to do, how it is supposed to be used, and what kind of data it is supposed to process.
(main) What is the purpose of ‘PayQuick Cloud Pro’ by PaySmooth Solutions? Which problem is it promising to solve? Why would a customer consider using it?
(expected) A brief description

Question 2: General
(goal) A service can be used by different types of users, such as administrators, end-users, or developers. This question will help the team understand who is the target market, operators, and users of the service.
(main) Who is the target market, operators and users of PaySmooth Solutions PayQuick Cloud Pro?
(expected) A brief description

Question 3: General
(goal) The team needs to understand the key features of the service to assess the risks associated with it. This question will help the team understand what the service is supposed to do.
(main) What are the key features of PaySmooth Solutions PayQuick Cloud Pro?
(expected) A list of features

Using LangGraph to configure an AI agent

The LangGraph library provides a nice framework for controlling the execution flow of an AI agent. The agent can use tools to perform some of its tasks, and an LLM to produce the final response to a question.

As described by the graph below, the agent:

  1. receives the question from the script,
  2. decides if it needs to use Google Search to find relevant documents,
  3. hands the recovered content back to the LLM to decide what to do with it,
  4. searches the internet again if the content isn’t good enough, or gives up after too many attempts,
  5. asks the LLM to answer the question.

from llm_code import build_graph
from langchain_core.runnables.graph import MermaidDrawMethod

graph = build_graph()
display(
    Image(
        graph.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API,
        )
    )
)

Image 4: Visual representation of the agent’s workflow
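For reference, a minimal LangGraph graph with this shape might be wired up as follows. The node implementations and state fields here are illustrative placeholders, not our actual build_graph:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    documents: list
    attempts: int
    answer: str

def search(state: AgentState) -> dict:
    # Placeholder: query Google Search and collect page content here.
    return {"documents": [], "attempts": state["attempts"] + 1}

def evaluate(state: AgentState) -> dict:
    # Placeholder: ask the LLM whether the content can answer the question.
    return {}

def answer(state: AgentState) -> dict:
    # Placeholder: ask the LLM to write the final answer from the documents.
    return {"answer": "..."}

def route(state: AgentState) -> str:
    # Search again if the content isn't good enough; give up after 3 attempts.
    if state["documents"] or state["attempts"] >= 3:
        return "answer"
    return "search"

builder = StateGraph(AgentState)
builder.add_node("search", search)
builder.add_node("evaluate", evaluate)
builder.add_node("answer", answer)
builder.set_entry_point("search")
builder.add_edge("search", "evaluate")
builder.add_conditional_edges("evaluate", route, {"search": "search", "answer": "answer"})
builder.add_edge("answer", END)
graph = builder.compile()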

Asking the agent to answer each question

With the agent defined, we can then pass all our questions and ask it to search for answers.

from llm_code import perform_assessment
answers = perform_assessment(questions, profile, graph)
Searching the internet for answers about PaySmooth Solutions - PayQuick Cloud Pro
* Q. What is the purpose of ‘PayQuick Cloud Pro’ by PaySmooth Solutions? Which problem is it promising to solve? Why would a customer consider using it?
* Q. Who is the target market, operators and users of PaySmooth Solutions PayQuick Cloud Pro?
* Q. What are the key features of PaySmooth Solutions PayQuick Cloud Pro?
    ! truncated to 7456 tokens
* Q. What category of product is PaySmooth Solutions PayQuick Cloud Pro in?
* Q. What is the list of companies or customers who are using PaySmooth Solutions PayQuick Cloud Pro?
* Q. According to the Trust Center page, or the official site, what laws and regulations is PaySmooth Solutions PayQuick Cloud Pro compliant with?
    ! truncated to 9832 tokens
    ! truncated to 9832 tokens
* Q. According to the Trust Center page, or the official site, what compliance standards is PaySmooth Solutions PayQuick Cloud Pro following?
* Q. According to the Trust Center page, or the official site, what security standards is PaySmooth Solutions PayQuick Cloud Pro compliant with?
    ! truncated to 7334 tokens
    ! truncated to 7334 tokens

After asking all the questions and follow-up questions, the answers are returned in JSON format, which allows us to easily manipulate them.
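Each entry in the returned list might look something like this (a hypothetical example; the exact field names in our implementation may differ):

answer = {
    "label": "Compliance",
    "question": "What compliance standards is PaySmooth Solutions PayQuick Cloud Pro following?",
    "answer": "ISO/IEC 27001, PCI DSS, and Privacy Mark.",
    "confidence": 1.0,  # the LLM's self-assessed confidence in its answer
    "sources": ["https://www.paysmooth.com/security"],  # pages the answer was grounded in
}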

Producing the report

With the answers collected, we can ask the LLM to produce an executive summary and a detailed report.
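A prompt builder like make_summary_prompt can be as simple as combining instructions with the collected answers. The sketch below is hypothetical, not our actual prompt:

import json

def make_summary_prompt(answers: list[dict], profile) -> str:
    """Assemble an executive-summary prompt from the collected answers."""
    return (
        f"You are summarizing a vendor assessment of {profile.product} "
        f"by {profile.company} for a security review.\n"
        "Write a concise executive summary covering the goal of the product, "
        "its compliance status, the data it processes, and any risks or "
        "countermeasures. Base the summary ONLY on the answers below.\n\n"
        f"{json.dumps(answers, indent=2)}"
    )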

from llm_code import ask_llm
from prompt_code import make_summary_prompt
from reporting_code import summary_markdown, report_markdown

summary_prompt = make_summary_prompt(answers, profile)
summary = ask_llm(summary_prompt)
report = report_markdown(answers, profile)

display(Markdown(summary_markdown(summary, profile)))
Executive Summary Report
- Company: PaySmooth Solutions
- Product: PayQuick Cloud Pro
- URL: `https://www.paysmooth.com/`
- Date: 2024-11-10

Goal of the product, why are we deploying it, how will it help us solve issues we are facing?
- Goal: Streamline payment processes, enhance security, and support business growth.
- Deployment Reason: Manage multiple payment methods efficiently.
- Solution: Secure transaction processing and financial services support.

What are the laws and regulations that this product is compliant with?
- Specific laws and regulations are not clearly listed, but it complies with ISO27001, PCI DSS, and Privacy Mark.

What are the compliance standards that this product is compliant with?
- ISO/IEC 27001: Information security management.
- PCI DSS: Credit card industry security standard.
- Privacy Mark: Personal information protection standard in Japan.

What are the security standards that the company is following?
- ISO/IEC 27001
- PCI DSS
- Privacy Mark

What kind of data this service is meant to process or store?
- Payment data, including credit card information, digital wallet transactions, and bank transfers.

Are there risks that were highlighted that the Risk and Security team should be made aware of?
- Risk: Potential for data breaches or fraud.
- Impact: Financial loss, reputational damage, and regulatory penalties.

Are there any countermeasures that should be implemented to mitigate risks of using this service?
- Implement robust security measures like EMV 3D Secure and regular vulnerability assessments.
- Ensure compliance with PCI DSS and ISO/IEC 27001 standards.
- Conduct regular security audits and employee training.
display(Markdown(report))
Report for PaySmooth Solutions PayQuick Cloud Pro (2024-12-15)
URL: https://www.paysmooth.com/payquick

Answers
1 (General) What is the purpose of ‘PayQuick Cloud Pro’ by PaySmooth Solutions? Which problem is it promising to solve? Why would a customer consider using it?

Answer (100.0% confidence): PaySmooth Solutions PayQuick Cloud Pro provides comprehensive online payment services, offering a wide range of payment methods including credit cards, carrier payments, and various digital wallets like PayPay, AmazonPay, and ApplePay. It aims to solve the problem of managing multiple payment methods for businesses, ensuring secure and efficient transaction processing. Customers would consider using it to streamline their payment processes, enhance security with measures like EMV 3D Secure, and support business growth through financial services and consulting.

… snip …

6 (Compliance) According to the Trust Center page, or the official site, what laws and regulations is PaySmooth Solutions PayQuick Cloud Pro compliant with?

Answer (0.0% confidence): The specific list of laws and regulations that PaySmooth Solutions PayQuick Cloud Pro is compliant with is not clearly found on the official website or related pages. The site mentions compliance with ISO27001, PCI DSS, and the Privacy Mark, but does not provide a detailed list of specific laws and regulations such as GDPR, CCPA, APPI, etc.

7 (Compliance) According to the Trust Center page, or the official site, what compliance standards is PaySmooth Solutions PayQuick Cloud Pro following?

Answer (100.0% confidence): PaySmooth Solutions PayQuick Cloud Pro follows the following compliance standards:
1. ISO/IEC 27001: This is a global standard for information security management, and PaySmooth Solutions PayQuick Cloud Pro has obtained conformity certification for all of its business sites.
2. PCI DSS: PaySmooth Solutions PayQuick Cloud Pro's services are fully compliant with PCI DSS version 3.2.1, which is a global security standard for the credit card industry.
3. Privacy Mark: This certification indicates compliance with the Japanese Industrial Standard for personal information protection, JIS Q15001:2017.

… snip …

Reviewing the report

The Security Management Team (and any other teams involved in the review for the service) will then evaluate the reports to quickly gain a broad understanding of the service to guide their decision-making. To use their time as efficiently as possible, in most cases, they will read just the Executive Summary and only refer to the more detailed report if needed to confirm any specific concerns.

Following a simple manual and based on established, defined criteria, the team then carries out their review. In some cases, such as when there isn’t much information available about the service online, the team may decide to perform a deeper analysis (and perhaps bring out the spreadsheets). In most cases, though, particularly for services and vendors with a high level of compliance maturity, the information from the application form and the LLM’s report is enough to determine whether (or not) the service meets all our basic requirements and the information security risk is at an acceptable level. If so, the team gives their blessing by approving the service and adding it to our List of Approved External Services (with appropriate restrictions on how it may be used).

We can gauge whether sufficient information was available online to answer each question from the ‘confidence score’ that the LLM assigns to each of its answers. If the confidence score is low, there was likely little information available. If the score is zero, the LLM found nothing it thought it could use.

If there are many low- or zero-confidence scores in the report, we can disregard the report and resort to the old-fashioned method of sending a questionnaire to the vendor. If there are just a few, we can reach out to the vendor and ask them those few specific questions directly; we may have an answer in hours, or within minutes during a call, rather than the weeks (or longer) it typically takes to complete a full questionnaire.

from reporting_code import report_confidence
confidence_report, improvements = report_confidence(answers, profile)

display(Markdown(confidence_report))
Confidence Report
- Percentage of answers collected from the vendor's web pages: 100.0%
- Average confidence score: 62.5%
- Number of answers with low confidence scores: 2

Answers with low confidence scores:
- (0% confidence) What is the list of companies or customers who are using PaySmooth Solutions PayQuick Cloud Pro?
- (0% confidence) According to the trust center page, or the official site, what laws and regulations is PaySmooth Solutions PayQuick Cloud Pro compliant with?

Some questions might fail, especially if the website isn’t automation-friendly, because the information isn’t where we expect to find it, or because the context window wasn’t big enough to read all the pages. For these questions, a manual check is likely to be necessary. We could also ask the vendor to improve their pages to cover these questions; see below for more on this.

How much does executing this script cost?

Performing a manual assessment can take several hours, and the results are likely to be inconsistent. Let’s say that each assessment takes a total of six hours to complete (total people-hours spent by the applicant and all reviewers) and assume (for ease of calculation, not based on actual figures) that the average salary of those involved in the review is 10 million yen per year (equivalent to roughly 5,000 yen per hour). Each review would then cost on average 30,000 yen, mostly spent searching the internet, reading web pages, and collating information into a report. At 250 reviews per year, this represents an annual cost of around 7.5 million yen, as the short calculation below makes explicit.
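A quick sanity check on that arithmetic, using the illustrative figures above (not actual salary data):

hours_per_review = 6  # total people-hours across the applicant and all reviewers
hourly_rate_jpy = 10_000_000 / 2_000  # ~10M JPY/year over ~2,000 working hours
cost_per_review_jpy = hours_per_review * hourly_rate_jpy
annual_cost_jpy = 250 * cost_per_review_jpy

print(f"{cost_per_review_jpy:,.0f} JPY per review")  # 30,000 JPY per review
print(f"{annual_cost_jpy:,.0f} JPY per year")  # 7,500,000 JPY per year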

Using automation and LLMs can greatly reduce this time spent searching the internet looking for answers, as well as the time spent writing down every detail along the way and summarizing it in a report at the end.

from reporting_code import calculate_token_counts, token_count_markdown

token_report = calculate_token_counts(profile)
display(Markdown(token_count_markdown(token_report)))
Token Usage Report
- Total costs: 1.29$

Model: gpt-4o-2024-08-06
- Total calls: 32
- Total tokens: 248369 (1.29$)
- Input tokens: 243518 (1.22$)
- Output tokens: 4851 (0.07$)

In this example, we asked just 8 questions for a total of $1.29, but a normal assessment of 32 basic questions plus follow-ups can involve up to 100 questions in total, putting the actual token cost closer to $10.

If running this report reduces the people-hours required for a review by just 25%, this translates to a hypothetical saving of 7,500 yen (~$50) in personnel costs against roughly $10 in token costs: a return on investment of about 500%.

It’s not just the financial benefit—by streamlining the process and reducing the people-hours required to carry out the review, we reduce the length of the period the applicant has to wait for their external service to be approved. This helps the business to move faster. It is clear that using automation to conduct the initial assessment helps significantly.

Asking the vendor to provide additional details on their website

We are now done with our assessment. This was a one-way process: our script searched the internet and collected answers to the questions we were interested in. Bonus: the vendor didn’t have to do anything, assuming all the information we needed was already published somewhere on their website.

But what if not all the information we needed was on their website? For information that is necessary for us to move forward, we will have to reach out to the vendor. One day, security teams across companies might talk to each other through APIs and secure handshakes. In the meantime, we can also let the vendor know what we couldn’t find by signaling them through their corporate website.

The following step lists the questions for which our agent couldn’t find answers, and performs a GET request on [vendor.domain]/compliance.txt for each one, passing the question as a query parameter.

Unlike robots.txt or security.txt, compliance.txt isn’t a standard (to date), so the query is likely to fail. However, a vendor that monitors for errors on their corporate website is likely to notice the hits on /compliance.txt and see the question. The user-agent configured for this request points back to this blog post.

The compliance.txt file can be nearly empty, especially if everything is already documented in the web pages; for example, it could simply contain the URL of the vendor’s Privacy Policy and any statements of evidence regarding their compliance. If those pages are hard to process through automation (for example, heavy JavaScript), populating this file with plain-text terms of service, privacy policies, and other details about the company’s compliance status could actually simplify the overall review process. Protecting the agent against prompt injection attacks is, however, important.

from reporting_code import request_for_improvement

for answer in improvements:
    request_for_improvement(answer, profile)
Requesting `https://paysmooth.com/compliance.txt?question=What+is+the+list+of+companies+or+customers+who+are+using+PaySmooth+Solutions+PayQuick+Cloud+Pro%3F`
Requesting `https://paysmooth.com/compliance.txt?question=According+to+the+trust+center+page%2C+or+the+official+site%2C+what+laws+and+regulations+is+PaySmooth+Solutions+PayQuick+Cloud+Pro+compliant+with%3F`
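A helper like request_for_improvement might boil down to a single GET request with a descriptive user-agent. This is a hypothetical sketch; the answer["question"] field name and user-agent string are assumptions:

import requests

def request_for_improvement(answer: dict, profile) -> None:
    """Signal an unanswered question to the vendor via /compliance.txt."""
    domain = profile.url.split("/")[2]  # e.g. www.paysmooth.com
    try:
        requests.get(
            f"https://{domain}/compliance.txt",
            params={"question": answer["question"]},
            # Point curious site operators back to an explanation of these hits.
            headers={"User-Agent": "mercari-vendor-assessment (see blog post)"},
            timeout=10,
        )
    except requests.RequestException:
        # The request is only a signal; a failure here is expected and harmless.
        pass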

Certified doesn’t mean secure

“How come we were hacked? We are ISO 27001 compliant!” – Some CEO somewhere…

Wait, all you’ve done is demonstrate that you could use an AI agent to read the internet. This is not proving that a vendor is secure!

Indeed, no matter what is written on a vendor’s website (what standards they claim to be compliant with, what certificates and audit reports they are willing to share, what security controls they claim to have implemented), none of it ‘proves’ that the service or its vendor is secure or trustworthy. Performing an assessment isn’t about proving that a company or service is secure. That would require our security engineers to thoroughly assess the vendor’s technical environment, which, lacking infinite time and resources, would not be practical or realistic given the number of applications we receive per year. Even that wouldn’t be enough to say we’ve “proven” anything; it would be a point-in-time check at best (not to mention that most vendors would never agree to the burden of being assessed so heavily by us in the first place).

At some point, we have to decide how much time and effort should be invested in reviewing an external service for us to trust it enough to use it: to allow it to store or process the information the applicant wants to use it for, to integrate with whatever other systems they want it to integrate with, or to be part of whatever (potentially critical or user-facing) operation it will support.

Which brings us back to what third-party risk management actually is, and the role that certification against standards plays in it. The expectation is that a vendor will not claim compliance with standards unless they are confident they have put in the work and actually achieved it. Even if we were to ask a vendor to fill out a security checklist for us, the trustworthiness of their answers would be no different from what they have written, or would write, on their website.

The vendor’s compliance team already spent a significant amount of time sharing details about their internal practices on their website. The greatest service we can do for them is trusting that information. The second greatest service we can do for them is to only request information from them that we actually need, and that isn’t already available on their website.

Once all teams involved in the review have given the thumbs-up, the ticket is approved, the service is added to our List of Approved External Services, and the applicant is informed that they are good to go (and given relevant advice and warnings on using and managing the service securely). This leaves the Security Management Team free to move on to follow-up tasks, such as:

  • Registering the service in our Information Asset Register, along with the data it will store and process
  • Ensuring that any integrations between the service and other company systems are done securely
  • Ensuring that the new service is integrated with our internal access provider for Single Sign-On
  • Ensuring that logging and backups are configured appropriately for the system, in line with our policies
  • Working with our Threat Detection and Response Team to ensure that appropriate monitoring is in place for the new service, particularly if it is expected to handle a critical function or highly sensitive information

By simplifying the review process and keeping it toil-free, we also free up time and maintain the momentum and energy of our Security Management Team to focus on these important next steps, which might otherwise be delayed or fall through the cracks.

Freeing up the time spent on the review process allows us to invest it where it can be used more effectively: addressing and treating the risks associated with actually using the new service in our environment.

Conclusion

Using a variant of the script above, together with numerous other improvements to our review process and decision-making criteria, the Security Management Team was able to reduce the average total amount of people-hours necessary to review an external service by approximately 50%. Furthermore, our new process produced multiple other benefits:

  1. The reviewer’s overall understanding of the service increased
  2. Our assessments are now more thorough and consistent
  3. Less mature companies can be easily identified (due to the lack of publicly available information)
  4. The average time from application to approval (during which the applicant can’t use the service) has been greatly reduced
  5. Reviewer morale has improved since the process is less demanding and involves less manual, tedious work

Because we are using an LLM to read human-readable pages, there is no need to establish yet another documentation standard for reporting compliance (as opposed to a YAML or JSON file with question IDs, tags, titles, descriptions, etc.). The script can request additional details through a hit on compliance.txt, but it doesn’t wait for an answer. By doing so, we simply hope that vendors will update their websites and/or Trust Centers to provide these details, for the benefit of anyone looking for the same information we were.

For us, using automation to conduct part of our external service review doesn’t totally remove the burden of assessing our vendors, but it does free up time so our team can focus on other important tasks.

Where do we go from here?

Generative AI technologies are evolving quickly. Between the time we wrote this article and the time we published it, Google announced the release of Gemini 2.0 and Project Mariner, and Anthropic released Computer Use, which allows an AI agent to take control of one’s computer. The automation we developed runs in a GCP Cloud Run instance, but nothing would stop someone from running it as a Chrome extension augmented by an LLM, where the LLM takes over the browser and executes a given list of research tasks. One thing is certain: there is huge potential for reducing toil in daily operational work.

— EOF —

print(f"Total execution time: {time.time() - start:0.2f} seconds")

Total execution time: 59.06 seconds

Installation instructions

If you wish to try this notebook, the source code is available here: https://github.com/cerebraljam/llms-at-work

This notebook was developed using Python 3.11 and Visual Studio Code.
