ChatGPT can now read some Mac desktop applications

OpenAI’s ChatGPT is starting to work with other applications on your computer.

The ChatGPT desktop app for macOS can now read code in a handful of developer-focused coding applications, such as VS Code, Xcode, TextEdit, Terminal, and iTerm2, the startup announced Thursday.

This means that developers will no longer have to copy and paste their code into ChatGPT, which has become a common way to use the chatbot. Now, when the feature is enabled, OpenAI will automatically send the section of code you’re working on to its chatbot as context, along with your prompt.

However, unlike popular AI coding tools like Cursor or GitHub Copilot, ChatGPT is currently unable to write code directly into developer applications for you.

The feature, called “Working with Applications,” is far from being an AI agent, but OpenAI says that making ChatGPT understand other applications is a “fundamental building block” toward building agent systems. One of the biggest challenges facing AI agents today is getting them to understand the rest of your computer screen, rather than just prompts or their own responses.

OpenAI says it’s focusing this feature on programming applications to get started; this is likely because AI coding assistants have become one of the most popular use cases for LLMs. The feature is available to Plus and Teams users today, and will roll out to Enterprise and Edu users in the next few weeks. OpenAI says ChatGPT will be able to work with other types of applications going forward, specifically text-based applications that can be used for writing tasks.

You can now select a few programming applications for ChatGPT to work with (Image: OpenAI)

In a demo with TechCrunch, an OpenAI employee opened the ChatGPT app alongside an Xcode environment containing a simple solar system modeling project, though it was missing Earth. The employee selected the Xcode tab within ChatGPT, which tells the AI chatbot to look at the application, and asked the chatbot to “add the missing planets.” The chatbot completed the task, writing a line of code to represent Earth that matched the format of the rest of the project. However, the employee still had to paste ChatGPT’s answer back into the development environment.

To read from different applications, OpenAI mostly relies on the macOS Accessibility API to read text and relay it to ChatGPT, according to OpenAI desktop product lead Alexander Embiricos. The macOS screen reader, which helps power Apple’s VoiceOver feature, has been around for nearly two decades. It is generally considered very reliable for most popular applications, but not all.

For some applications, such as Microsoft’s VS Code, the feature requires users to install a special extension to query the content. As the name suggests, Apple’s screen reader can only read text, so it can’t help ChatGPT understand visual elements such as images, the orientation of objects, or videos.

Working with Applications sends the last 200 lines of code to ChatGPT along with each prompt for certain applications. For others, all the code in the main window is used as input for the chatbot. You can highlight sections of code or text to help ChatGPT focus on the right part of the project, but ChatGPT will also include the text surrounding it. All of this seems like it would use a lot of input tokens.
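
To make the behavior described above concrete, here is a minimal Python sketch of how an app might assemble that context. It is an illustration only, not OpenAI's implementation: the function name `build_context`, its parameters, and the 20-line padding around a highlighted selection are all assumptions. Only the 200-line figure comes from OpenAI's description.

```python
from typing import Optional, Tuple

MAX_TAIL_LINES = 200     # figure reported in the article
SURROUNDING_LINES = 20   # hypothetical padding; the real amount is not public


def build_context(
    buffer_text: str,
    selection: Optional[Tuple[int, int]] = None,
    tail_only: bool = True,
) -> str:
    """Return the text that would be sent alongside a prompt (illustrative).

    selection is a hypothetical (start, end) pair of 0-based line indices,
    with end exclusive, standing in for a highlighted region.
    """
    lines = buffer_text.splitlines()

    if selection is not None:
        # Highlighted text is sent together with some of its surroundings.
        start, end = selection
        lo = max(0, start - SURROUNDING_LINES)
        hi = min(len(lines), end + SURROUNDING_LINES)
        return "\n".join(lines[lo:hi])

    if tail_only:
        # Some apps: only the last 200 lines are sent.
        return "\n".join(lines[-MAX_TAIL_LINES:])

    # Other apps: everything in the main window is sent.
    return "\n".join(lines)
```

With a 500-line file and no selection, this sketch returns only the final 200 lines, which shows why every prompt can carry a sizable amount of input even when the question itself is short.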

ChatGPT working with Xcode (Image: OpenAI)

It’s unclear how OpenAI plans to roll out this feature to other apps that aren’t compatible with Apple’s screen reader. Anthropic, an OpenAI competitor, has released an AI system that analyzes user desktop screenshots to understand and use other applications. To be honest, the Anthropic approach leaves a lot to be desired in its current state: it is slow and makes a lot of mistakes. However, it is a more general version of an AI agent that does not rely on APIs, and can do more than just read text in another window.

“This is not meant to be an agent, but rather a way to work with programming tools to get started, and there will be more tools soon,” Embiricos said in a press conference with TechCrunch. “On the agent side, I think that’s a really fundamental building block. This idea is that ChatGPT understands all the content you have or can work with so it can help with it.”

This move toward agents is particularly notable given recent reports that OpenAI is close to launching a general-purpose AI agent, codenamed “Operator,” according to Bloomberg. The tool is expected to arrive in early 2025, and will rival other early attempts at general-purpose AI agents, such as Anthropic’s computer use feature or Google’s reported Jarvis agent.

OpenAI is debuting these features on macOS, shortly before Apple launches its ChatGPT integration in December. It’s not clear when Working with Applications will come to Windows, the operating system made by Microsoft, OpenAI’s biggest backer.
