In Conversation with Anthropic Co-Founder Tom Brown
salesforceventures.com/perspectives/in-conversation-with-...
A fireside chat with Anthropic co-founder Tom Brown covering Constitutional AI, RLHF, LLM stacking, hallucination reduction, and AI safety philosophy, offering practitioner insights into how Anthropic approaches building safe and helpful AI systems.
Metadata
Importance: 42/100 · interview · commentary
Summary
Tom Brown, co-founder of Anthropic, discusses the company's approach to AI safety including Constitutional AI as an alternative to pure RLHF, techniques for stacking LLMs to improve output quality and safety, and strategies for reducing hallucinations. He also covers domain-specific model building, promising AI use cases like code generation and RAG, and Anthropic's ongoing work to make AI helpful, harmless, and honest.
Key Points
- Constitutional AI allows a model to evaluate another model's outputs against a written constitution of values, scaling RLHF without requiring constant human feedback.
- LLM stacking (e.g., Claude Instant + Claude 2.1) enables tiered moderation: a fast small model handles routine checks, escalating edge cases to a larger model.
- Fine-tuning large models for domain-specific tasks generally outperforms building narrow small models from scratch.
- Claude 2.1 reduced hallucination error rates by 2x compared to Claude 2, with a dedicated internal team tracking and minimizing hallucination metrics.
- Key promising AI use cases include code generation, retrieval-augmented generation (RAG) for QA, and customer service automation.
Cached Content Preview
HTTP 200 · Fetched Apr 10, 2026 · 7 KB
Salesforce Ventures recently hosted a dinner and networking event for a group of portfolio companies and Fortune 500 executives in San Francisco. The evening was highlighted by a fireside chat between Tom Brown, co-founder of Salesforce Ventures’ portfolio company Anthropic , and Salesforce President and Chief Product Officer David Schmaier.
The duo discussed the origins of Anthropic, the imperative of AI safety, techniques for generating better LLM outputs, generative AI use cases, improving AI accuracy, the prospect of artificial general intelligence (AGI), and how to ensure AI will be used as a force for good in the world.
Their conversation featured a ton of great insights for founders, builders, and AI enthusiasts alike. Here were a few of our top takeaways*…
*Quotes have been edited for clarity and concision
On Anthropic’s approach to creating ‘harmless’ AI…
“Many models are trained using reinforcement learning from human feedback (RLHF). The idea behind RLHF is you reward and punish the model for doing well or not doing well on the paths you care about. We had people upvoting or downvoting how well the model does a task. That’s how you can make the model become a harmless assistant.”
“I think we noticed that as the models were getting smarter, they started to do most tasks well. We developed constitutional AI to turn a model into the entity that upvotes or downvotes another model. A person can write up a constitution of what it means to be helpful, harmless, and honest, and then a model will read the interactions between the human and the assistant and consider if the assistant is acting in accordance with the constitution. This is a way to take a simple document and turn it into a model personality.”
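The loop Brown describes — one model checking another's replies against a written constitution — can be sketched in a few lines. This is a toy illustration only: the judge here is a trivial heuristic stub standing in for a second LLM call, and the principle texts and function names are our own, not Anthropic's actual constitution or API.

```python
# Toy sketch of a Constitutional-AI-style check: a "judge" scores an
# assistant reply against written principles. In a real system the judge
# would be another LLM; here it is a keyword stub so the sketch runs
# standalone. All names and principles are illustrative assumptions.

CONSTITUTION = [
    "Be helpful: address the user's request directly.",
    "Be harmless: refuse requests for dangerous instructions.",
    "Be honest: do not state unsupported claims as fact.",
]

def judge(reply: str, principle: str) -> bool:
    """Stand-in for an LLM judge: True if the reply appears to comply
    with the principle (trivial heuristic, not a real classifier)."""
    if "harmless" in principle:
        return "how to build a weapon" not in reply.lower()
    # Helpful/honest stubbed as "reply is non-empty".
    return len(reply.strip()) > 0

def evaluate(reply: str, constitution=CONSTITUTION) -> dict:
    """Score a reply against every principle and collect violations,
    which a revision step (not shown) could use to rewrite the reply."""
    violations = [p for p in constitution if not judge(reply, p)]
    return {"compliant": not violations, "violations": violations}
```

The key idea is that the constitution is just a document: changing the model's behavior means editing the principle strings, not re-collecting human upvotes and downvotes.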
On the impact of ‘stacking’ LLMs to generate higher-quality outputs…
“We have Claude 2.1, which is a large model, and then we have Claude Instant, which is a smaller model. Depending on the task, sometimes you’ll want a smaller model because it’s faster and cheaper. For example, Midjourney is one of our customers. Whenever you put a prompt into Midjourney to generate an image, it’ll pass it through Claude Instant, and Claude Instant checks whether it violates Midjourney’s terms of service. And if it thinks it might, it’ll give a little message to the user saying ‘this might be against our terms. Do you want to appeal it?’ And if you hit yes, it goes to Claude Instant’s boss, which is Claude 2.1, who thinks about it a little bit longer, and maybe says, ‘Sorry, Claude Instant was totally wrong. You’re fine actually.’”
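The escalation pattern in the Midjourney example can be sketched as a short control flow. Both "models" below are stubbed with heuristics so the sketch is self-contained; the banned-term list, function names, and return labels are hypothetical, not Anthropic's or Midjourney's actual moderation logic.

```python
# Sketch of tiered ("stacked") LLM moderation: a fast, cheap model screens
# every prompt; flagged cases can be appealed to a larger, slower model.
# Model calls are simulated with heuristics purely for illustration.

BANNED = {"gore", "celebrity deepfake"}  # hypothetical policy terms

def fast_check(prompt: str) -> bool:
    """Small-model stand-in (Claude Instant's role): conservative,
    cheap, and allowed to over-flag. True means 'looks violating'."""
    return any(term in prompt.lower() for term in BANNED)

def careful_check(prompt: str) -> bool:
    """Large-model stand-in (Claude 2.1's role): slower, more nuanced.
    Here it overturns flags on clearly fictional framing."""
    return fast_check(prompt) and "fictional" not in prompt.lower()

def moderate(prompt: str, appeal: bool = False) -> str:
    if not fast_check(prompt):
        return "allowed"            # routine case: small model only
    if appeal and not careful_check(prompt):
        return "allowed-on-appeal"  # big model overturns the flag
    return "flagged"
```

The design point is cost: the large model only runs on the small fraction of traffic that is both flagged and appealed, so routine prompts pay only the small model's latency and price.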
On building domain-specific models…
“There are two different ways I think about building a domain-specific model. One is to take a large model and fine-tune it to make it better at a specific task. The other is to build a narrow model that’s good at one specific thing. Claude Instant is faster and cheaper than Claude 2.1, but less performant
... (truncated, 7 KB total)
Resource ID: 9a8fc36b307a9fa2