Llama Guard 3 and Meta's AI Responsibility Approach for Llama 3.1

web

Meta AI·ai.meta.com/blog/meta-llama-3-1-ai-responsibility/

Credibility Rating

4/5

High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Meta AI

Official Meta blog post documenting safety measures for the Llama 3.1 release; relevant for practitioners interested in content moderation classifiers and industry approaches to responsible open-source model deployment.

Metadata

Importance: 52/100blog postprimary source

Summary

Meta's blog post introduces Llama Guard 3, a safety classifier model designed to detect unsafe content in LLM inputs and outputs, released alongside Llama 3.1. It outlines Meta's responsible deployment approach including red-teaming, safety evaluations, and open-source safety tooling for the broader AI ecosystem.

Key Points

•Llama Guard 3 is a multilingual safety classifier built on Llama 3.1 to filter harmful inputs/outputs across multiple languages
•Meta conducted extensive red-teaming and adversarial testing before releasing Llama 3.1 to identify and mitigate safety risks
•The post describes Meta's layered safety approach including Prompt Guard and Code Shield tools alongside Llama Guard 3
•Meta frames open-source release as beneficial for safety by enabling community scrutiny and broader access to safety tooling
•Safety evaluations cover categories like violent speech, hate speech, privacy violations, and specialized cybersecurity risks

Cited by 4 pages

Page	Type	Quality
Frontier AI Labs (Overview)	--	85.0
Meta AI (FAIR)	Organization	51.0
Corporate AI Safety Responses	Approach	68.0
Open Source AI Safety	Approach	62.0

Cached Content Preview

HTTP 200Fetched Apr 9, 202613 KB

Expanding our open source large language models responsibly 
 
 
 
 
 
 
 
 
 
 
 
 

 Products 
 AI Research 
 The Latest 
 About 
 Get Llama 
 Try Meta AI 
 
 Open Source Expanding our open source large language models responsibly 

 July 23, 2024 • 7 minute read 
 Takeaways:

 Meta is committed to openly accessible AI. Read Mark Zuckerberg’s letter detailing why open source is good for developers, good for Meta, and good for the world.
 Open source has multiple benefits: It helps ensure that more people around the world can access the opportunities that AI provides, guards against concentrating power in the hands of a small few, and deploys technology more equitably. And we believe it will lead to more safe AI outcomes across society. That’s why we continue to advocate for making open access to AI the industry standard.
 We’re bringing open intelligence to all by introducing the Llama 3.1 collection of models, which expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model.
 As we improve the capabilities of our models, we’re also scaling our evaluations, red teaming, and mitigations, including for catastrophic risks.
 We’re bolstering our system-level safety approach with new security and safety tools, which include Llama Guard 3 (an input and output multilingual moderation tool), Prompt Guard (a tool to protect against prompt injections), and CyberSecEval 3 (evaluations that help AI model and product developers understand and reduce generative AI cybersecurity risk). We’re also continuing to work with a global set of partners to create industry-wide standards that benefit the open source community.
 We prioritize responsible AI development and want to empower others to do the same. As part of our responsible release efforts, we’re giving developers new tools and resources to implement the best practices we outline in our Responsible Use Guide .
 

 How Meta is scaling AI safety

 
 We’re closely following as governments around the world seek to define AI safety. Meta supports new safety institutes and works with established entities—including the National Institute of Standards and Technology (NIST) and ML Commons—to drive toward common definitions, threat models, and evaluations. Working with bodies such as Frontier Model Forum (FMF) and Partnership on AI (PAI), we seek to develop common definitions and best practices, while also engaging with civil society and academics to help inform our approach. For this release, we’ve continued to build on our efforts to evaluate and red team our models in areas of public safety and critical infrastructure, which includes cybersecurity, catastrophic risks, and child safety.

 It’s important to note that before releasing a model, we work to identify, evaluate, and mitigate potential risks through several measures:

 We conduct pre-deployment risk assessments, safety evaluations and fine-tuning, and extensive red teaming wi

... (truncated, 13 KB total)

Resource ID: a4f0e262dd30ec02 | Stable ID: sid_hkvCoLnizJ