Behind the Scenes on Large Language Model (LLM) Safety
A look at the many approaches to mitigate the risks of LLMs
I’m so sorry again for accidentally sending this early yesterday. I saw that I had forgotten to uncheck the “send an email” button a split second after I hit publish. Thanks for understanding. Here’s the context!
In February, I wrote a piece reflecting on how the trust and safety space is evolving in response to artificial intelligence. Much of the focus is on large language models, or LLMs: AI systems that can generate original content, with a wide range of applications and associated benefits.
However, when it comes to mitigating risks in this space, you must look beyond just what content is generated. You need to consider everything from the dataset used to train the model, to how it is fine-tuned, to the types of outputs it produces, and more. This is opening up many new opportunities for those with trust and safety experience to figure out the new playbooks for AI.
Overall, companies need to be more transparent about how they think about this work. For instance, I liked a piece Anthropic recently released about how they approach election risks.
Today, I am bringing you a piece by Will Carter, former head of AI policy at Google, now working on Responsible Generative AI at Accenture. Will offers a behind-the-scenes look at working on LLM safety, focusing on four main points (disclosure: I used Gemini to help generate the first draft of these):
LLMs are multipurpose tools that can be used in a wide range of applications, and it is important that they are fit for purpose and deployed responsibly.
There are many ways to customize LLMs and manage risks, including dataset filtering, fine-tuning, grounding, mitigating risks in products using classifiers, user empowerment, and ongoing monitoring (a rough sketch of the classifier approach follows this list).
Different products and applications require different approaches, and efforts to develop industry standards or guidelines must be flexible to allow companies to tailor mitigations to their specific products and use cases.
LLMs have tremendous potential to benefit users and society, but they require a thoughtful and balanced approach that maximizes those benefits, supports innovation, reinforces high-quality standards, and manages risk.
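To make the second point a bit more concrete, here is a rough sketch (my own, not from Will’s piece) of what a classifier-based mitigation can look like in a product: the model’s draft output passes through a safety classifier before it reaches the user, and blocked outputs are logged for ongoing monitoring. Everything here is hypothetical; the keyword check is a stand-in for a trained classifier, and names like `safety_classifier` and `BLOCKED_TOPICS` are illustrative, not a real API.

```python
from dataclasses import dataclass


@dataclass
class ClassifierResult:
    unsafe: bool
    reason: str | None = None


# Placeholder policy list; a real product would rely on a trained
# classifier and a much richer policy taxonomy.
BLOCKED_TOPICS = {"example-harmful-topic"}


def safety_classifier(text: str) -> ClassifierResult:
    """Stand-in for a trained safety classifier: flags text that
    mentions any blocked topic."""
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return ClassifierResult(unsafe=True, reason=f"matched policy topic: {topic}")
    return ClassifierResult(unsafe=False)


def generate_with_guardrail(prompt: str, generate) -> str:
    """Wrap any prompt -> str generation function with an output check."""
    draft = generate(prompt)
    verdict = safety_classifier(draft)
    if verdict.unsafe:
        # Surface the event to an ongoing-monitoring pipeline, then refuse.
        print(f"[monitoring] blocked output ({verdict.reason})")
        return "Sorry, I can't help with that request."
    return draft


if __name__ == "__main__":
    fake_model = lambda p: f"Here is some generated text about {p}."
    print(generate_with_guardrail("gardening", fake_model))  # passes through
    print(generate_with_guardrail("example-harmful-topic", fake_model))  # blocked
```

One appeal of this wrapper pattern is that it works regardless of which model generates the text, which is part of why classifier-based mitigations can be layered onto many different products and tailored to each one’s use case.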
In full disclosure, this piece was written and published with support from Google. I think it’s worthwhile for this community and a good explanation of how some are approaching this work. The support from Google is very helpful, as I am still paying for the newsletter and podcast out of my own pocket. I owe you all an update on how I think about sponsorships for this newsletter (to be honest, I’m still thinking it through), but one thing I do promise is transparency.
You can read and download the full piece here. You will notice an emphasis on flexibility and on frameworks that help companies think through solutions, rather than overly prescriptive requirements. We were repeatedly told this was necessary when making the election best practices guides at the Integrity Institute. This is a shift I would like to see in the conversation about how we approach not just AI but all safety work.