
Tool-Use Restrictions

| Dimension | Assessment | Evidence |
| --- | --- | --- |
| Effectiveness | High (70-90% risk reduction for targeted threats) | AWS Well-Architected Framework rates lack of least privilege implementation as “High” risk; METR evaluations show properly sandboxed agents have substantially reduced attack surface |
| Implementation Maturity | Medium (fragmented across providers) | Over 13,000 MCP servers on GitHub in 2025 alone; UK AISI Inspect toolkit now used by governments, companies, and academics worldwide |
| Adoption Rate | Growing rapidly (97M+ monthly SDK downloads for MCP) | Model Context Protocol backed by Anthropic, OpenAI, Google, Microsoft; donated to Linux Foundation December 2025 |
| Vulnerability Surface | High (9.4 CVSS critical vulnerabilities found) | CVE-2025-49596 in MCP Inspector enables browser-based RCE; hundreds of “NeighborJack” vulnerabilities discovered across 7,000+ MCP servers |
| Bypass Difficulty | Medium-Low (35% of incidents from simple prompts) | Obsidian Security reports 35% of AI security incidents caused by prompt injection, some leading to $100K+ losses |
| Defense-in-Depth Value | Critical (no single control sufficient) | UK AISI emphasizes container isolation alone insufficient; requires OS primitives, hardware virtualization, and network segmentation combined |
| Urgency | Increasing (task horizons doubling every 7 months) | METR reports 50%-task-completion time horizon has doubled every 7 months for past 6 years; extrapolation suggests month-long autonomous projects by 2030 |

Tool-use restrictions are among the most direct and effective safety measures for agentic AI systems. Rather than trying to shape model behavior through training, restrictions simply remove access to dangerous capabilities. An AI agent without access to code execution cannot deploy malware; one without financial API access cannot make unauthorized purchases; one without email access cannot conduct phishing campaigns. These hard limits provide guarantees that behavioral training cannot.

The approach is especially important given the rapid expansion of AI agent capabilities. Systems like Claude Computer Use, OpenAI’s function calling, and various autonomous agents are gaining access to browsers, file systems, code execution, and external APIs. Each new tool represents both capability expansion and risk surface expansion. Tool restrictions create a deliberate friction that forces explicit decisions about what capabilities are necessary and appropriate for a given deployment context.

However, tool restrictions face significant practical challenges. Commercial pressure consistently pushes toward expanding tool access, as more capable agents are more valuable products. Users may bypass restrictions by deploying their own tools or using alternative providers. And sophisticated AI systems may find creative ways to achieve prohibited goals using only permitted tools, a form of composition attack. According to OWASP’s 2025 Top 10 for LLM Applications, prompt injection remains the number-one risk and primary attack vector. The EchoLeak exploit (CVE-2025-32711) against Microsoft Copilot in mid-2025 demonstrated how engineered prompts in email messages could trigger automatic data exfiltration without user interaction.

Research from METR shows that agentic AI capabilities have been exponentially increasing, with the “time horizon” for autonomous task completion doubling approximately every 7 months. Extrapolating this trend suggests that within five years, AI agents may independently complete tasks that currently take humans days or weeks. This capability trajectory makes robust tool restrictions increasingly critical, as more capable agents have correspondingly larger attack surfaces.
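
The arithmetic behind this extrapolation is easy to check. The sketch below is illustrative only: the two-hour baseline and the assumption of a constant seven-month doubling time are taken from the figures cited on this page, not from METR's own projection methodology.

```python
# Illustrative extrapolation of the METR task-horizon trend.
# Assumptions (not METR's methodology): a ~2-hour 50%-success horizon today
# and a constant doubling time of 7 months.

def extrapolated_horizon_hours(baseline_hours: float, months_elapsed: float,
                               doubling_months: float = 7.0) -> float:
    """Task horizon after `months_elapsed`, assuming a constant doubling time."""
    return baseline_hours * 2 ** (months_elapsed / doubling_months)

baseline = 2.0  # hours
for years in (1, 2, 3, 5):
    horizon = extrapolated_horizon_hours(baseline, years * 12)
    print(f"+{years} years: ~{horizon:,.0f} hours (~{horizon / 40:,.1f} work weeks)")
# At this rate the horizon reaches hundreds of hours within five years,
# i.e. tasks that would take a human several weeks to months.
```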

| Dimension | Rating | Assessment |
| --- | --- | --- |
| Safety Uplift | Medium | Directly limits harm potential |
| Capability Uplift | Tax | Reduces what AI can do for users |
| Net World Safety | Helpful | Important safeguard for agentic systems |
| Lab Incentive | Weak | Limits product value; mainly safety-motivated |
| Scalability | Partial | Effective but pressure to expand access |
| Deception Robustness | Partial | Hard limits help; but composition attacks possible |
| SI Readiness | Partial | Hard limits meaningful; but SI creative with available tools |

  • Current Investment: $10-30M/yr (part of agent safety engineering)
  • Recommendation: Increase (important as agents expand; labs face pressure to loosen)
  • Differential Progress: Safety-dominant (pure safety constraint; reduces capability)

Different approaches to restricting AI tool access vary significantly in their security guarantees, implementation complexity, and impact on system usability. The UK AI Safety Institute has emphasized that defense-in-depth is essential, as no single approach provides complete protection.

| Approach | Security Strength | Implementation Complexity | Usability Impact | Best Use Case |
| --- | --- | --- | --- | --- |
| Permission Allowlists | Medium | Low | Low | Well-defined task scopes |
| Capability Restrictions | Medium-High | Medium | Medium | Limiting dangerous capabilities |
| Human-in-the-Loop Confirmation | High | Medium | High | Irreversible or high-risk actions |
| Container Sandboxing | High | High | Low-Medium | Code execution, untrusted environments |
| Hardware Virtualization | Very High | Very High | Medium | Maximum isolation requirements |
| Network Egress Allowlists | Medium-High | Medium | Medium | Preventing data exfiltration |
| Time/Resource Quotas | Medium | Low | Low-Medium | Preventing resource abuse |
| Attribute-Based Access Control (ABAC) | High | High | Low | Dynamic, context-sensitive policies |

Source: Synthesized from AWS Well-Architected Generative AI Lens, Skywork AI Security, and IAPP Analysis

The AWS Well-Architected Framework for Generative AI recommends selecting among these approaches based on four criteria (a configuration sketch follows the list):

  1. Risk Level: Higher-risk operations require stronger isolation (hardware virtualization > containers > permissions)
  2. Frequency: Frequently used tools benefit from lower-friction approaches (allowlists, ABAC)
  3. Reversibility: Irreversible actions warrant human-in-the-loop confirmation regardless of other controls
  4. Trust Level: Untrusted code or inputs require sandboxing; trusted internal tools may need only permissions
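
As a rough illustration of how these criteria could be encoded in deployment tooling, the sketch below maps a tool's risk attributes to a set of controls. The class, field names, and thresholds are hypothetical, not part of the AWS guidance.

```python
from dataclasses import dataclass

@dataclass
class ToolProfile:
    name: str
    risk: str              # "low" | "medium" | "high"
    high_frequency: bool   # frequently used tools get lower-friction controls
    reversible: bool
    trusted_source: bool   # trusted internal tool vs. untrusted code/inputs

def select_controls(tool: ToolProfile) -> list[str]:
    """Map the four selection criteria to a (hypothetical) set of controls."""
    controls = ["permission_allowlist", "audit_logging"]    # baseline for every tool
    if tool.high_frequency and tool.risk == "low":
        controls.append("abac_policy")                      # low friction for routine use
    if not tool.trusted_source:
        controls.append("container_sandbox")                # untrusted code or inputs
    if tool.risk == "high":
        controls.append("hardware_virtualization")          # strongest isolation at highest risk
    if not tool.reversible:
        controls.append("human_in_the_loop_confirmation")   # irreversible actions always gated
    return controls

print(select_controls(ToolProfile("shell_exec", risk="high", high_frequency=False,
                                  reversible=False, trusted_source=False)))
```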

The following table summarizes a defense-in-depth architecture for AI agent tool access, incorporating multiple security layers as recommended by the UK AI Safety Institute sandboxing toolkit:

| Layer | Components | Security Function | Failure Mode |
| --- | --- | --- | --- |
| User Request | Input guardrails, jailbreak detection | First-line defense against malicious prompts | Prompt injection bypass |
| Policy & Permission | ABAC, scope limiters, quotas | Enforce least privilege principle | Policy misconfiguration |
| Sandboxed Execution | Containers, network allowlists, human confirmation | Isolate potentially dangerous operations | Sandbox escape vulnerabilities |
| Monitoring & Audit | Logging, anomaly detection, session tracking | Detect and respond to policy violations | Alert fatigue, log tampering |

| Tool Type | Risk | Restriction Approach |
| --- | --- | --- |
| Email | Phishing, spam, social engineering | Draft-only or approval required |
| Social Media | Misinformation, impersonation | Generally prohibited |
| Messaging | Unauthorized contact | Strict allowlists |
| Phone/Voice | Social engineering | Usually prohibited |

| Capability | Risk | Mitigation |
| --- | --- | --- |
| Shell commands | System compromise | Sandboxed, allowlisted commands |
| Script execution | Malware deployment | Isolated environment, no network |
| Package installation | Supply chain attacks | Pre-approved packages only |
| Container/VM creation | Resource abuse | Quota limits, approval required |

| Access Type | Risk | Control |
| --- | --- | --- |
| Local files | Data exfiltration | Scoped directories, read-only |
| Databases | Data modification | Read-only, query logging |
| APIs | Unauthorized actions | Scope-limited tokens |
| Web browsing | Information gathering | Filtered, logged, rate-limited |
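
One way to implement the “sandboxed, allowlisted commands” and “isolated environment, no network” mitigations above is to run agent-proposed commands in a disposable, locked-down container. The sketch below shells out to the Docker CLI; the command allowlist, image, and resource limits are illustrative assumptions rather than recommended values.

```python
import shlex
import subprocess

ALLOWED_BINARIES = {"python3", "ls", "cat", "grep"}  # illustrative allowlist

def run_sandboxed(command: str, timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run an agent-proposed shell command in a disposable, network-less container."""
    binary = shlex.split(command)[0]
    if binary not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not allowlisted: {binary}")
    docker_cmd = [
        "docker", "run", "--rm",
        "--network", "none",       # no egress: blocks data exfiltration and callbacks
        "--read-only",             # immutable root filesystem
        "--memory", "256m",        # resource quotas
        "--cpus", "0.5",
        "--pids-limit", "64",
        "--cap-drop", "ALL",       # drop all Linux capabilities
        "python:3.12-slim",        # illustrative base image
        "sh", "-c", command,
    ]
    return subprocess.run(docker_cmd, capture_output=True, text=True, timeout=timeout_s)
```

Hardware virtualization (lightweight VMs) follows the same pattern with a stronger isolation boundary, at the cost of more setup complexity.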

OWASP Top 10 for Agentic Applications (December 2025)


The OWASP GenAI Security Project released the first comprehensive framework specifically for agentic AI security in December 2025, developed by over 100 security researchers with input from NIST, Cisco, Microsoft AI Red Team, Oracle Cloud, and the Alan Turing Institute.

| ID | Risk | Description | Tool Restriction Relevance |
| --- | --- | --- | --- |
| ASI01 | Excessive Agency | Agent takes actions beyond intended scope | Direct; core rationale for tool restrictions |
| ASI02 | Supply Chain Attacks | Compromised dependencies or plugins | Allowlists and signed component requirements |
| ASI03 | Tool & Function Manipulation | Attackers hijack agent’s tools | Schema validation and approval workflows |
| ASI04 | Privilege & Access Control | Agents with overly broad permissions | Least privilege implementation critical |
| ASI05 | Data Leakage & Privacy | Unauthorized data exposure | Egress controls and data classification |
| ASI06 | Memory & Context Poisoning | Attackers corrupt agent’s persistent state | Session isolation and memory validation |
| ASI07 | Insecure Inter-Agent Communication | Spoofed messages between agents | Authentication and message signing |
| ASI08 | Cascading Failures | False signals propagate through pipelines | Circuit breakers and isolation boundaries |
| ASI09 | Human-Agent Trust Exploitation | Confident outputs mislead human operators | Explanation auditing and uncertainty flagging |
| ASI10 | Rogue Agents | Misalignment, concealment, self-directed action | Kill switches and behavioral monitoring |

Source: OWASP Top 10 for Agentic Applications, evaluated by Distinguished Expert Review Board including NIST’s Apostol Vassilev
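
The “schema validation” mitigation listed for ASI03 can be as simple as validating model-supplied tool arguments against a strict JSON Schema before the tool runs. Below is a minimal sketch using the jsonschema library; the tool and its fields are hypothetical examples.

```python
import jsonschema

# Hypothetical tool schema: tight types, value ranges, and no extra fields allowed.
SEND_INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string", "pattern": "^[A-Z0-9]{8}$"},
        "amount_cents": {"type": "integer", "minimum": 1, "maximum": 500_000},
    },
    "required": ["customer_id", "amount_cents"],
    "additionalProperties": False,   # reject injected or unknown fields outright
}

def validate_tool_call(arguments: dict) -> dict:
    """Raise jsonschema.ValidationError if the model-supplied arguments are out of policy."""
    jsonschema.validate(instance=arguments, schema=SEND_INVOICE_SCHEMA)
    return arguments
```

The strict constraints are what make this useful: an attacker who hijacks the tool call cannot smuggle extra fields or out-of-range values past the gate.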

According to OWASP’s 2025 Top 10 for LLM Applications and the Agentic AI Security Survey, the primary threats to AI tool use systems include:

| Rank | Threat | Description | Mitigation Effectiveness |
| --- | --- | --- | --- |
| 1 | Prompt Injection | Malicious inputs manipulate agent behavior | Medium (60-80% blocked with input guardrails) |
| 2 | Memory Poisoning | Attackers manipulate agent memory/context | Low-Medium (40-60% detected) |
| 3 | Tool Misuse | Agents tricked into abusing system access | Medium-High (70-85% with ABAC) |
| 4 | Privilege Escalation | Agents exploited to gain unauthorized access | High (85-95% with least privilege) |
| 5 | Excessive Agency | Agents taking actions beyond intended scope | Medium (60-75% with confirmation gates) |
| 6 | Data Exfiltration | Unauthorized transmission of sensitive data | Medium-High (70-90% with egress controls) |

Source: OWASP Top 10 for LLM Applications, Palo Alto Unit 42

The Palo Alto Unit 42 research emphasizes that in multi-agent environments, cascade failures present additional risks where one compromised agent can affect others, requiring isolation between agent instances.

A key challenge is that restricted tools may be combined to achieve prohibited outcomes:

| Prohibited Capability | Composition Approach | Mitigation |
| --- | --- | --- |
| Send emails | Write draft + notify human + persuade to send | Detect multi-step patterns |
| Execute code | Generate code + ask user to run it | Monitor for such patterns |
| Access finances | Manipulate documents user will act on | Context-aware monitoring |
| Acquire resources | Convince humans to provide them | Social engineering detection |

  1. Intent Detection: Monitor for goal patterns, not just individual actions
  2. Cumulative Tracking: Track sequences of actions across sessions (sketched below)
  3. Rate Limiting: Limit volume of potentially-harmful action combinations
  4. Human Review: Flag suspicious action patterns for review
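
A minimal sketch of cumulative tracking and intent detection (items 1-2 above), assuming a hypothetical in-memory action log and hand-written pattern rules; a production system would use persistent storage and learned detectors.

```python
from collections import defaultdict

# Hypothetical multi-step patterns that approximate a prohibited capability when combined.
SUSPICIOUS_SEQUENCES = [
    ("draft_email", "notify_user", "draft_email"),   # repeated draft-and-nudge loop
    ("generate_code", "request_user_execution"),     # "please run this for me" pattern
]

class SessionMonitor:
    """Tracks actions per session and flags suspicious multi-step combinations."""

    def __init__(self):
        self.actions: dict[str, list[str]] = defaultdict(list)

    def record(self, session_id: str, action: str) -> bool:
        """Record an action; return True if the session should be escalated for human review."""
        history = self.actions[session_id]
        history.append(action)
        return any(self._contains_subsequence(history, p) for p in SUSPICIOUS_SEQUENCES)

    @staticmethod
    def _contains_subsequence(history: list[str], pattern: tuple[str, ...]) -> bool:
        it = iter(history)                  # pattern steps must appear in order; gaps allowed
        return all(step in it for step in pattern)

monitor = SessionMonitor()
for act in ("generate_code", "summarize_docs", "request_user_execution"):
    flagged = monitor.record("session-42", act)
print(flagged)  # True: code generation followed by a request that the user execute it
```
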
Crux 1: Is the Capability Tax Acceptable?

| Position: Accept Tax | Position: Tax Too High |
| --- | --- |
| Safety more important than convenience | Users will route around restrictions |
| Can design tasks around limitations | Competitive disadvantage |
| Precautionary approach appropriate | Beneficial uses blocked |
| Restrictions can be selectively relaxed | Slows AI adoption |

Crux 2: Can Restrictions Scale to More Capable Systems?

| Position: Yes | Position: No |
| --- | --- |
| Hard limits are architecturally enforced | Composition attacks become more sophisticated |
| Capability boundaries are clear | Pressure to expand tool access |
| Can add restrictions as needed | Creative workarounds emerge |
| Fundamental to defense-in-depth | SI would find paths around any restriction |

Crux 3: Should Restrictions Be User-Configurable?

| More User Control | Less User Control |
| --- | --- |
| Users know their needs best | Users may accept inappropriate risks |
| Flexibility enables more use cases | Liability and safety concerns |
| Market provides appropriate pressure | Race to bottom on safety |
| Respects user autonomy | Inexpert users can be harmed |

The AWS Well-Architected Generative AI Lens and Obsidian Security recommend implementing least privilege as the foundational security control:

“Agents should have the minimum access necessary to accomplish their tasks. Organizations should explicitly limit agents to sandboxed or development environments—they should not touch production databases, access user data, or handle credentials unless absolutely required.”

Key implementation requirements, based on Skywork AI’s enterprise security guidelines and AWS best practices (a minimal enforcement sketch follows the table):

| Requirement | Description | Priority | Implementation Notes |
| --- | --- | --- | --- |
| Default deny | No tool access without explicit grant | Critical | Use scoped API keys with specific permissions |
| Explicit authorization | Each tool requires specific permission | Critical | Implement ABAC policies for top-risk actions |
| Audit logging | All tool uses logged | Critical | Immutable logs with tamper detection |
| Egress allowlists | Restrict external service calls | Critical | Prevent data exfiltration to arbitrary endpoints |
| Time limits | Permissions expire | High | Time-limited tokens; session quotas |
| Scope limits | Permissions scoped to specific resources | High | Read-only credentials where possible |
| Human approval | High-risk tools require confirmation | High | Mandatory for irreversible actions |
| Secret management | Credentials in dedicated vaults | High | Agents receive only time-limited tokens |
| Rollback capability | Can undo tool actions | Medium | Transaction-based execution where possible |
| Anomaly detection | Flag unusual usage patterns | Medium | Behavioral baselines per agent type |
| Unique identities | Each agent/tool has distinct identity | Medium | Enables attribution and revocation |
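
A minimal sketch of the default-deny, audit-logging, time-limit, and human-approval requirements above, as they might be enforced in an agent runtime. The grant structure and tool names are hypothetical.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("tool_audit")

# Explicit, time-limited grants; any tool not listed here is denied (default deny).
GRANTS = {
    "search_docs":  {"expires_at": time.time() + 3600, "requires_approval": False},
    "send_payment": {"expires_at": time.time() + 900,  "requires_approval": True},
}

def authorize(agent_id: str, tool: str, approved_by: str | None = None) -> bool:
    grant = GRANTS.get(tool)
    decision = "deny"
    if grant is None:
        reason = "no explicit grant (default deny)"
    elif time.time() > grant["expires_at"]:
        reason = "grant expired"
    elif grant["requires_approval"] and approved_by is None:
        reason = "human approval required"
    else:
        decision, reason = "allow", "grant valid"
    # Every decision is logged, including denials, for later audit and anomaly detection.
    audit_log.info("agent=%s tool=%s decision=%s reason=%s approved_by=%s",
                   agent_id, tool, decision, reason, approved_by)
    return decision == "allow"

print(authorize("agent-7", "delete_records"))                     # False: no grant (default deny)
print(authorize("agent-7", "send_payment"))                       # False: approval required
print(authorize("agent-7", "send_payment", approved_by="alice"))  # True, and the decision is logged
```
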
| Tier | Permitted Tools | Use Case |
| --- | --- | --- |
| Tier 0 | None | Pure text completion |
| Tier 1 | Read-only, information retrieval | Research assistance |
| Tier 2 | Draft/create content | Writing assistance |
| Tier 3 | Reversible actions | Basic automation |
| Tier 4 | Limited external actions | Supervised agents |
| Tier 5 | Broad capabilities | Highly trusted contexts |
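
The tier table above could be expressed as configuration alongside the default-deny gate sketched earlier. The tool names are illustrative, and the assumption that higher tiers inherit lower-tier grants is a design choice rather than part of the table.

```python
# Illustrative mapping from permission tiers to concrete tool grants.
# Tool names are hypothetical; a deployment would substitute its own registry.
PERMISSION_TIERS: dict[int, set[str]] = {
    0: set(),                                                      # pure text completion
    1: {"search_web_readonly", "read_kb_article"},                 # read-only retrieval
    2: {"create_draft_document", "create_draft_email"},            # draft/create content
    3: {"create_calendar_hold", "move_file_within_workspace"},     # reversible actions
    4: {"send_email_with_approval", "call_external_api_scoped"},   # limited external actions
    5: {"execute_sandboxed_code", "manage_cloud_resources"},       # highly trusted contexts
}

def allowed_tools(tier: int) -> set[str]:
    """Cumulative grants: a tier inherits everything permitted at lower tiers."""
    return set().union(*(PERMISSION_TIERS[t] for t in range(tier + 1)))

print(allowed_tools(2))  # tier 2 agent: read-only retrieval plus drafting tools
```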

Good fit if you believe:

  • Near-term agentic safety is important
  • Hard limits provide meaningful guarantees
  • Security engineering approach is valuable
  • Incremental restrictions help even if imperfect

Less relevant if you believe:

  • Capability expansion is inevitable
  • Restrictions will be bypassed
  • Focus should be on alignment
  • Tool restrictions slow beneficial AI

AI Lab Tool Restriction Policies (2024-2025)


Major AI labs have implemented varying approaches to tool restrictions, with significant differences in transparency and enforcement mechanisms. The AI Lab Watch initiative tracks these commitments across organizations.

| Organization | Framework | Tool Access Controls | Pre-deployment Evaluation | Third-Party Audits | Key Restrictions |
| --- | --- | --- | --- | --- | --- |
| Anthropic | Responsible Scaling Policy (ASL levels) | Explicit tool definitions, scope limits, MCP | Yes, shared with UK AISI and METR | Yes (Gryphon Scientific, METR) | Computer Use sandboxed; no direct internet without approval |
| OpenAI | Preparedness Framework | Function calling with schema validation | Yes, pre/post-mitigation evals | Initially committed, removed April 2025 | Code interpreter sandboxed; browsing restricted |
| Google DeepMind | Frontier Safety Framework | Capability-based restrictions | Yes, but less specific on tailored evals | Not publicly disclosed | Gemini tools require explicit enablement |
| Meta | Llama Usage Policies | Model-level restrictions (open weights) | Limited pre-release testing | Community-driven | Acceptable use policy; no runtime controls |
| Microsoft | Copilot Trust Framework | Role-based access, enterprise controls | Internal red-teaming | SOC 2 compliance | Sensitivity labels, DLP integration |

Sources: AI Lab Watch Commitments, EA Forum Safety Plan Analysis, company documentation

Anthropic’s Model Context Protocol (MCP): Released in 2024, MCP became an industry standard by 2025 for connecting AI agents to external tools. OpenAI adopted MCP in March 2025, followed by Google in April 2025. In December 2025, Anthropic donated MCP to the Agentic AI Foundation (AAIF) under the Linux Foundation. However, enterprise security teams have criticized MCP for weak authorization capabilities and high prompt injection risk.

| Metric | Value | Implication |
| --- | --- | --- |
| Monthly SDK downloads | 97M+ | Massive adoption creates large attack surface |
| MCP servers on GitHub | 13,000+ launched in 2025 | Fragmented security posture across ecosystem |
| Major adopters | Anthropic, OpenAI, Google, Microsoft | Industry convergence but varying security practices |
| Critical CVEs discovered | CVE-2025-49596 (CVSS 9.4), CVE-2025-6514 | Protocol-level vulnerabilities affect all users |
| Developer environments compromised via mcp-remote | 437,000+ | Supply chain attacks target developer tooling |
| Servers exposed via NeighborJack misconfiguration | Hundreds across 7,000+ servers analyzed | Network interface binding (0.0.0.0) common mistake |
| Authorization specification status | Under community revision | OAuth implementation conflicts with enterprise practices |

Sources: Oligo Security, Red Hat MCP Security Analysis, Zenity MCP Security

OpenAI Audit Commitment Removal: In December 2023, OpenAI’s Preparedness Framework stated evaluations would be “audited by qualified, independent third-parties.” By April 2025, this provision was removed without changelog documentation, raising concerns about declining safety commitments under competitive pressure.

| System | Tool Restriction Approach | Notes |
| --- | --- | --- |
| Claude | Explicit tool definitions, scope limits | Computer Use has specific restrictions |
| ChatGPT | Function calling with approval | Plugins have varying access |
| Copilot | Limited to code assistance | Narrow scope by design |
| Devin-style agents | Task-scoped, sandboxed | Emerging practices |

| Gap | Severity | Evidence |
| --- | --- | --- |
| Inconsistent policies across providers | High | OpenAI removed third-party audit commitment in April 2025; Meta’s open weights have no runtime controls |
| User override pressure | Medium | Users deploy own tools or switch to less restrictive providers to bypass controls |
| Composition attack detection | High | METR evaluations show agents increasingly capable of multi-step circumvention using permitted tools |
| Cross-tool pattern tracking | Medium | 6-month undetected access in OpenAI plugin supply chain attack demonstrates monitoring gaps |
| Third-party tool security | Critical | 43 agent framework components identified with embedded vulnerabilities (Barracuda 2025); 800+ npm packages compromised in Shai-Hulud campaign |
| MCP authorization gaps | High | Current OAuth specification conflicts with enterprise practices; community revision ongoing |

Real-world incidents demonstrate the consequences of inadequate tool restrictions:

| Incident | Date | Impact | Root Cause | Estimated Cost |
| --- | --- | --- | --- | --- |
| EchoLeak (CVE-2025-32711) | Mid-2025 | Microsoft Copilot data exfiltration via email prompts | Zero-click prompt injection | $100M estimated Q1 2025 impact |
| Amazon Q Extension Compromise | December 2025 | VS Code extension compromised; file deletion and AWS disruption | Supply chain attack on verified extension | Undisclosed |
| MCP Inspector RCE (CVE-2025-49596) | July 2025 | Browser-based RCE on developer machines | CVSS 9.4 critical vulnerability | Data theft, lateral movement risk |
| MCP-Remote Injection (CVE-2025-6514) | 2025 | 437,000+ developer environments compromised | Shell command injection | Credential harvesting |
| Shai-Hulud npm Campaign | Late 2025 | 800+ compromised packages; GitHub token/API key theft | Inadequate supply chain controls | Multi-million (estimated) |
| OpenAI Plugin Supply Chain | Q2 2025 | 47 enterprise deployments compromised; 6-month undetected access | Compromised agent credentials | Customer data, financial records exposure |

Sources: Adversa AI 2025 Security Incidents Report, Fortune, The Hacker News

| Metric | Value | Source |
| --- | --- | --- |
| Prompt injection as % of AI incidents | 35% of all real-world AI security incidents | Obsidian Security |
| GenAI involvement in incidents | 70% of all AI security incidents involved GenAI | Adversa AI |
| MCP servers analyzed with vulnerabilities | Hundreds of NeighborJack flaws across 7,000+ servers | Backslash Security via Hacker News |
| Agent framework components with embedded vulnerabilities | 43 different components identified | Barracuda Security Report (November 2025) |
| Q1 2025 estimated prompt injection losses | $100M+ across 160+ reported incidents | Adversa AI |
| Risk amplification from AI agents vs traditional systems | Up to 100x due to autonomous browsing, file access, credential submission | Security Journey |

Capability Benchmarks and Tool Use Metrics


The UK AI Safety Institute conducts regular evaluations of agentic AI capabilities, providing empirical data on the urgency of tool restrictions:

| Metric | 2024 Baseline | 2025 Current | Trend | Implication for Restrictions |
| --- | --- | --- | --- | --- |
| Autonomous task completion (50% success horizon) | ≈18 minutes | >2 hours | Exponential growth | Longer unsupervised operation requires stronger controls |
| METR task horizon doubling time | ≈7 months | — | Accelerating | Restrictions must evolve faster than capabilities |
| Multi-step task success rate (controlled settings) | 45-60% | 70-85% | Improving | Higher reliability increases both utility and risk |
| Open-ended web assistance success | 15-25% | 30-45% | Improving slowly | Real-world deployment remains challenging |

Sources: METR, UK AISI May 2025 Update, Evidently AI Benchmarks

| Organization | Focus | Key Publications |
| --- | --- | --- |
| UK AI Safety Institute | Evaluation standards, sandboxing | Inspect Sandboxing Toolkit, Advanced AI Evaluations |
| METR | Model evaluation, threat research | Task horizon analysis, GPT-5.1 evaluation, MALT dataset |
| OWASP | Security standards | Top 10 for LLM Applications 2025 |
| NIST | Risk management frameworks | AI RMF 2.0 guidelines |
| Future of Life Institute | AI safety policy | 2025 AI Safety Index |

| Source | Type | URL |
| --- | --- | --- |
| OpenAI Agent Builder Safety | Official guidance | platform.openai.com/docs/guides/agent-builder-safety |
| Claude Jailbreak Mitigation | Official guidance | docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails |
| AWS Well-Architected Generative AI Lens | Best practices | docs.aws.amazon.com/…/gensec05-bp01 |
| AI Lab Watch Commitments | Tracking database | ailabwatch.org/resources/commitments |

Key limitations of tool-use restrictions:

  1. Limits usefulness: Constrains beneficial applications; 30-50% reduction in task completion rates for heavily restricted agents
  2. Pressure to expand access: Commercial incentives oppose restrictions; industry voluntary commitment compliance remains largely unverified
  3. Composition attacks: Creative workarounds using permitted tools; METR and UK AISI evaluations show agents increasingly capable of multi-step circumvention
  4. Verification challenges: “Defining a precise, least-privilege security policy for each task is an open challenge in the security research community” (Systems Security Foundations)
  5. Open-source model proliferation: Tool restrictions cannot be enforced on open-weight models after release

Tool restrictions affect the AI Transition Model through multiple pathways:

| Parameter | Impact |
| --- | --- |
| Misuse Potential | Directly limits harmful capabilities |
| Misalignment Potential | Constrains damage from misaligned behavior |
| Human Oversight Quality | Creates explicit checkpoints for human control |

Tool restrictions are among the most tractable near-term interventions for AI safety. They provide hard guarantees that don’t depend on model alignment, making them especially valuable during the current period of uncertainty about model motivations and capabilities.