Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

blog

Anthropic·anthropic.com/news/3-5-models-and-computer-use

Credibility Rating

4/5

High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Marks a significant capability milestone for frontier AI: agentic computer control raises new safety and oversight challenges as AI systems can now autonomously interact with real software environments, relevant to discussions of AI action-taking and human oversight.

Metadata

Importance: 62/100blog postnews

Summary

Anthropic announces a major capability expansion: Claude 3.5 Sonnet gains 'computer use' ability (controlling mouse, keyboard, and screen), an upgraded Claude 3.5 Sonnet with improved reasoning and coding, and the fast/affordable Claude 3.5 Haiku. Computer use represents a significant step toward agentic AI that can autonomously operate computers to complete tasks.

Key Points

•Computer use (beta) allows Claude to interact with computers like a human—moving cursors, clicking, typing—enabling autonomous task completion across software.
•Upgraded Claude 3.5 Sonnet shows significant improvements on coding benchmarks (SWE-bench 49%) and reasoning tasks over the prior version.
•Claude 3.5 Haiku delivers Claude 3 Opus-level performance at faster speeds and lower cost, democratizing access to capable models.
•Computer use introduces new safety considerations as Claude can now take real-world actions with persistent consequences on actual systems.
•Early access partners (Replit, Asana, Canva, etc.) are already building agentic workflows using computer use capabilities.

Cited by 2 pages

Page	Type	Quality
Long-Horizon Autonomous Tasks	Capability	65.0
Anthropic	Organization	74.0

Cached Content Preview

HTTP 200Fetched Apr 9, 20268 KB

Announcements Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

 Oct 22, 2024 Update (12/03/2024): We have revised the pricing for Claude 3.5 Haiku. The model is now priced at $0.80 MTok input / $4 MTok output. 

 Today, we’re announcing an upgraded Claude 3.5 Sonnet , and a new model, Claude 3.5 Haiku . The upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with particularly significant gains in coding—an area where it already led the field. Claude 3.5 Haiku matches the performance of Claude 3 Opus, our prior largest model, on many evaluations at a similar speed to the previous generation of Haiku.

 We’re also introducing a groundbreaking new capability in public beta: computer use . Available today on the API , developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental —at times cumbersome and error-prone. We&#x27;re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.

 Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete. For example, Replit is using Claude 3.5 Sonnet&#x27;s capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they’re being built for their Replit Agent product.

 The upgraded Claude 3.5 Sonnet is now available for all users. Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku will be released later this month.

 Claude 3.5 Sonnet: Industry-leading software engineering skills

 

 The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench , an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.

 Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it an ideal choice to power multi-step software development processes. Cognition uses the new Claude 3.5 Sonnet f

... (truncated, 8 KB total)

Resource ID: 9e4ef9c155b6d9f3 | Stable ID: sid_BQNS1NuaOy