Anthropic says its latest AI bot can beat Gemini and ChatGPT

The Claude 3 AI models are more capable than their predecessors and open to answering ‘harmless’ questions Claude 2.1 would’ve refused.

Anthropic, the AI company started by several former OpenAI employees, says the new Claude 3 family of AI models performs as well as or better than leading models from Google and OpenAI. Unlike earlier versions, Claude 3 is also multimodal, able to understand text and photo inputs.

Anthropic says Claude 3 will answer more questions, understand longer instructions, and be more accurate. The models can also take in more context, meaning they can process more information at once. The family has three models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus being the largest and “most intelligent model.” Anthropic says Opus and Sonnet are now available on claude.ai and through its API, and Haiku will be released soon. All three models can be deployed for chatbots, auto-completion, and data extraction tasks.

Previous versions of Claude refused to answer some prompts that were harmless, which the company writes “suggests a lack of contextual understanding.” The new models are less likely to refuse prompts that toe the line of their safety guardrails, similar to what Meta is rumored to be planning for Llama 3.

Bar chart showing a significantly lower rate of “refused on harmless prompt” responses by Claude 3 AI models (near or below 10 percent), compared to Claude 2.1 (around 25 percent).

Incorrect refusals on Claude 3 versus Claude 2.1. Image: Anthropic

Anthropic claims Claude 3 models can give near-instant results even while parsing dense material like a research paper. A blog post says Haiku, the smallest version of Claude 3, is “the fastest and most cost-effective model on the market,” able to read a dense research paper complete with charts and graphs “in less than three seconds.”

Anthropic says Opus outperformed most models in several benchmarking tests. It showed better graduate-level reasoning than OpenAI’s GPT-4, scoring 50.4 percent on that test versus GPT-4’s 35.7 percent. According to Anthropic, Opus also did better at answering math questions, coding, and reasoning.

A list of benchmark scores comparing AI models from Anthropic, OpenAI, and Google, showing Claude 3 (Opus) as the highest scoring model on all of the tests listed.

Claude 3 models compared to GPT-4, GPT-3.5, and Gemini 1.0 Ultra / Pro. Image: Anthropic

The new models also improve significantly on the previous Claude 2.1 model. Sonnet, the middle-ground model, is twice as fast as Claude 2 and Claude 2.1. “It excels in tasks demanding rapid responses, like knowledge retrieval or sales automation,” Anthropic said.

Anthropic trained the Claude 3 models on a mix of nonpublic internal and third-party datasets and publicly available data as of August 2023. The company says in a paper introducing the three models that these were trained using hardware from Amazon’s AWS and Google Cloud. Both companies invested in Anthropic, with Amazon putting $4 billion into the company. Claude 3 will be available on AWS’s model library Bedrock and in Google’s Vertex AI.

This post was originally published on The Verge
