In a presentation to investors last spring, Anthropic said it intends to build artificial intelligence to power virtual assistants that can conduct research, respond to emails, and handle other back-office tasks on their own. The company referred to this as a “next-generation self-learning AI algorithm” — one that it believes could, if all goes according to plan, one day automate large parts of the economy.
It’s taken some time, but that AI is starting to arrive.
Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop application. Through the new “Computer Use” API, now in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a computer.
“We trained Claude to see what is happening on the screen and then use the available software tools to perform tasks,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer assigns Claude to use a computer program and grants it the necessary access, Claude looks at screenshots of what is visible to the user, then calculates how many pixels vertically or horizontally it needs to move the cursor in order to click in the right place.”
Developers can experiment with Computer Use via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform. The new 3.5 Sonnet without Computer Use is rolling out to the Claude apps, and brings various performance improvements over the outgoing 3.5 Sonnet model.
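To give a sense of the shape of the API, here is a minimal sketch in Python using Anthropic’s SDK. The tool type and beta flag reflect the identifiers Anthropic documented at launch; the display dimensions and prompt are illustrative assumptions, and beta names can change, so check the current docs before relying on this.

```python
# Minimal sketch of a Computer Use request via Anthropic's Python SDK.
# The tool type ("computer_20241022") and beta flag
# ("computer-use-2024-10-22") are the identifiers documented at launch;
# verify against current documentation, since beta names can change.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,   # resolution of the screen you will screenshot
        "display_height_px": 768,
    }],
    messages=[{
        "role": "user",
        "content": "Use data from my computer and online to fill out this form.",
    }],
    betas=["computer-use-2024-10-22"],
)

# The model replies with tool_use blocks describing actions
# (screenshot, mouse_move, left_click, type, ...) for your code to execute.
for block in response.content:
    print(block)
```

Note that the model never touches the machine directly: it only requests actions, and the developer’s own code decides whether and how to carry them out.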
Application automation
A tool that can automate tasks on a computer is not a new idea. Countless companies offer such tools, from decades-old RPA vendors to more recent startups like Relay, Induced AI, and Automat.
The field is becoming more crowded still as companies race to develop so-called “AI agents.” AI agent remains an ill-defined term, but it generally refers to artificial intelligence that can automate software.
Some analysts say AI agents could give companies an easier path to monetizing the billions of dollars they are pouring into AI. Companies seem to agree: according to a recent Capgemini survey, 10% of organizations already use AI agents and 82% will integrate them within the next three years.
Salesforce made splashy announcements about its AI agent technology this summer, while Microsoft described new tools for building AI agents yesterday. OpenAI, which is planning its own brand of AI agents, sees the technology as a step toward superintelligent AI.
Anthropic, for its part, calls its take on the AI agent concept an “action-execution layer” that allows the new 3.5 Sonnet to perform desktop-level commands. Thanks to its ability to browse the web (not a first for AI models, but a first for Anthropic), 3.5 Sonnet can use any website and any application.
“Humans remain in control by providing specific prompts that direct Claude’s actions, like ‘use data from my computer and online to fill out this form,’” an Anthropic spokesperson told TechCrunch. “People enable and limit access as needed. Claude breaks down the user’s prompts into computer commands (e.g., move cursor, click, type) to accomplish that specific task.”
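That command loop runs on the developer’s side: the model returns tool-use blocks naming an action, and the host program performs it and sends back a fresh screenshot. Below is a hedged sketch of such a dispatcher; pyautogui stands in as an illustrative automation layer, and while the action names mirror those in Anthropic’s reference examples, the exact schema should be treated as an assumption.

```python
# Sketch of a developer-side dispatcher for Claude's computer commands.
# pyautogui is an illustrative stand-in for whatever automation layer you
# actually use; action names follow Anthropic's launch-era examples.
import base64
import io

import pyautogui


def execute_action(action: dict) -> dict | None:
    """Carry out one of Claude's computer commands on the local machine."""
    kind = action["action"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        pyautogui.moveTo(x, y)
    elif kind == "left_click":
        pyautogui.click()
    elif kind == "type":
        pyautogui.write(action["text"])
    elif kind == "screenshot":
        # Return the screen as base64 PNG so Claude can "see" the result.
        buf = io.BytesIO()
        pyautogui.screenshot().save(buf, format="PNG")
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(buf.getvalue()).decode(),
            },
        }
    return None
```

Because the developer owns this layer, access can be enabled and limited as needed, which is the control Anthropic’s spokesperson describes above.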
Software development platform Replit used an early version of the new 3.5 Sonnet model to create an “independent verifier” that can evaluate applications as they are built. Meanwhile, Canva says it’s exploring ways the new model might be able to support the design and editing process.
But how is this different from other AI agents out there? It’s a reasonable question. Rabbit, a consumer gadgets startup, is creating a web agent that can do things like buy movie tickets online; Adept, recently acquired by Amazon, trains models to navigate websites and software; Twin Labs uses off-the-shelf models, including OpenAI’s GPT-4o, to automate desktop processes.
Anthropic claims that the new 3.5 Sonnet is simply a stronger, more robust model that performs better at coding tasks than even OpenAI’s flagship o1, per the SWE-bench Verified benchmark. Despite not being explicitly trained to do so, the upgraded 3.5 Sonnet self-corrects and retries tasks when it encounters obstacles, and can work toward goals that require dozens or hundreds of steps.

But don’t fire your secretary just yet.
In an evaluation designed to test an AI agent’s ability to help with airline booking tasks, such as modifying a flight reservation, the new 3.5 Sonnet managed to complete less than half of the tasks successfully. In a separate test involving tasks like initiating a return, 3.5 Sonnet failed roughly a third of the time.
Anthropic admits that the upgraded 3.5 Sonnet struggles with basic actions like scrolling and zooming, and that it can miss “short-lived” actions and notifications because of the way it captures screenshots and stitches them together.
“Claude’s computer use remains slow and often error-prone,” Anthropic wrote in its post. “We encourage developers to begin exploration with low-risk tasks.”
Risky business
But is the new 3.5 Sonnet capable enough to be dangerous? Perhaps.
A recent study found that models without the ability to use desktop applications, such as OpenAI’s GPT-4o, were willing to engage in harmful “multi-step agent behavior,” such as ordering a fake passport from someone on the dark web, when “attacked” using jailbreaking techniques. Jailbreaks yielded high success rates at performing harmful tasks even for models protected by filters and safeguards, according to the researchers.
One can imagine how a model with desktop access could wreak far more havoc, for example by exploiting app vulnerabilities to compromise personal information (or storing chats in plain text). Beyond the software levers at its disposal, the model’s online and app connections could open up avenues for malicious jailbreakers.
Anthropic doesn’t deny that there’s a risk in launching the new 3.5 Sonnet. But the company says the benefits of observing how the model is used in the wild ultimately outweigh these risks.
“We believe it is far better to provide access to computers for today’s relatively limited and more secure models,” the company wrote. “This means we can start monitoring and learning from any potential issues that arise at this lower level, increasing computer usage and mitigating safety impacts gradually and simultaneously.”

Anthropic also says it has taken steps to deter misuse, such as not training the new 3.5 Sonnet on users’ screenshots and prompts, and preventing the model from accessing the web during training. The company says it has developed classifiers and other tools to “nudge” 3.5 Sonnet away from actions perceived as high-risk, such as posting on social media, creating accounts, and interacting with government websites.
As the US general election approaches, Anthropic says it is focused on mitigating election-related abuse of its models. The US AI Safety Institute and the UK Safety Institute, two separate but allied government agencies dedicated to assessing the risks of AI models, tested the new 3.5 Sonnet before it was deployed.
Anthropic told TechCrunch that it has the ability to restrict access to additional websites and features “if necessary,” to protect against spam, fraud, and misinformation, for example. As a safety precaution, the company retains any screenshots taken by Computer Use for at least 30 days — a retention period that may concern some developers.
We asked Anthropic under what circumstances, if any, it would turn over screenshots to a third party (e.g., law enforcement) if requested to do so, and will update this post if we receive a response.
“There are no foolproof methods, and we will continuously evaluate and iterate on our safety measures to balance Claude’s capabilities with responsible use,” Anthropic said. “Those using the computer-use version of Claude should take the relevant precautions to minimize these kinds of risks, including isolating Claude from particularly sensitive data on their computers.”
Hopefully that will be enough to prevent the worst from happening.
Cheaper model
Today’s headliner may have been the upgraded 3.5 Sonnet model, but Anthropic also said an updated version of Haiku, the cheapest and most efficient model in its Claude series, is on the way.
Claude 3.5 Haiku, scheduled for release in the coming weeks, will match the performance of Claude 3 Opus, previously Anthropic’s high-end model, on certain benchmarks, at the same cost and roughly the same speed as Claude 3 Haiku.
“With faster speeds, improved instruction following, and more precise tool use, Claude 3.5 Haiku is well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from huge amounts of data like purchase history, pricing, or inventory data,” Anthropic wrote in a blog post.
3.5 Haiku will initially be available as a text-only model and later as part of a multimodal offering that can analyze both text and images.

So, once 3.5 Haiku is available, will there be much reason to use 3 Opus? What about the 3.5 Opus, the successor to the 3 Opus, which Anthropic teased in June?
“All models in the Claude 3 model family have their individual uses for customers,” an Anthropic spokesperson said. “Claude 3.5 Opus is on our roadmap and we will be sure to share more as soon as we can.”