A well-known test of artificial general intelligence (AGI) is closer to being solved. But the test's creators say this points to flaws in the test's design rather than an actual research breakthrough.
In 2019, François Chollet, a leading figure in the AI world, introduced ARC-AGI, short for "Abstraction and Reasoning Corpus for Artificial General Intelligence." The benchmark is designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on. Chollet claims that ARC-AGI remains the only AI test to measure progress toward general intelligence (though others have been proposed).
Until this year, the best-performing AIs had solved just under a third of the tasks in ARC-AGI. Chollet blamed the industry's focus on large language models (LLMs), which he believes are incapable of actual "reasoning."
"LLMs struggle with generalization, due to being entirely reliant on memorization," he said in a series of posts on X in February. "They break down on anything that wasn't in their training data."
In Chollet's view, LLMs are statistical machines. Trained on lots of examples, they learn the patterns in those examples to make predictions, such as how "whom" in an email typically precedes "it may concern."
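To make that intuition concrete, here is a minimal, hypothetical sketch of pattern-based next-word prediction using simple bigram counts. The toy corpus is invented for illustration, and real LLMs use learned neural representations rather than raw counts, but the pattern-completion idea is the same:

```python
from collections import Counter, defaultdict

# Toy corpus of email openers; purely illustrative.
corpus = [
    "to whom it may concern",
    "to whom it may concern",
    "to whom this letter finds",
]

# Count bigrams: how often each word follows another.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

# Predict the most frequent continuation of "whom".
word = "whom"
prediction, count = follows[word].most_common(1)[0]
print(f"after '{word}', predict '{prediction}'")  # -> 'it'
```

A model like this can only reproduce continuations it has counted before, which is exactly the memorization-versus-generalization distinction Chollet draws.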
Chollet emphasizes that although LLMs may be able to memorize "reasoning patterns," they are unlikely to be able to generate "new reasoning" based on novel situations. "If you need to be trained on many examples of a pattern, even an implicit one, in order to learn a reusable representation of it, you're memorizing," he argued in another post.
To push research beyond LLMs, Chollet and Mike Knoop, co-founder of Zapier, in June launched the ARC Prize, a $1 million competition to build open source AI capable of beating ARC-AGI. Out of 17,789 entries, the top submission scored 55.5%, about 20 percentage points higher than the 2023 top score, though below the 85% "human-level" threshold required to win.
That doesn't mean we're 20% closer to AGI, though, Knoop says.
Today we’re announcing the winners of the 2024 ARC Prize. We’re also publishing an extensive technical report on what we learned from the competition (link in the next tweet).
The state of the art rose from 33% to 55.5%, the largest single-year increase we have seen since 2020.
– François Chollet (@fchollet) December 6, 2024
In a blog post, Knoop said that many of the submissions to ARC-AGI were able to "brute force" their way to a solution, suggesting that "a large fraction" of ARC-AGI's tasks "don't carry a lot of useful signal toward general intelligence."
ARC-AGI consists of puzzle-like problems where the AI, given a grid of differently colored squares, has to generate the correct "answer" grid. The problems are designed to force the AI to adapt to novel problems it has never seen before. But it is not clear that they are achieving this.
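The public ARC dataset represents each task as paired input/output grids of integer color codes, and a brute-force approach simply enumerates candidate grid transformations until one reproduces every training pair. The sketch below is a deliberately simplified illustration of that loop, with a made-up task and a tiny hand-picked set of transformations; it is not any actual prize entry:

```python
import numpy as np

# A made-up task in the ARC style: grids are lists of
# lists of color codes 0-9, with train and test pairs.
task = {
    "train": [
        {"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]},
        {"input": [[5, 6], [7, 8]], "output": [[7, 5], [8, 6]]},
    ],
    "test": [{"input": [[0, 9], [9, 0]]}],
}

# A tiny, hand-picked space of candidate transformations.
candidates = {
    "identity": lambda g: g,
    "flip_lr": np.fliplr,
    "flip_ud": np.flipud,
    "rotate_90": lambda g: np.rot90(g, k=-1),  # clockwise
}

# Brute force: keep any candidate consistent with all training pairs.
def solves_training(fn, task):
    return all(
        np.array_equal(fn(np.array(pair["input"])), np.array(pair["output"]))
        for pair in task["train"]
    )

for name, fn in candidates.items():
    if solves_training(fn, task):
        answer = fn(np.array(task["test"][0]["input"]))
        print(name, answer.tolist())  # rotate_90 -> [[9, 0], [0, 9]]
```

The concern Knoop raises follows directly: if enough tasks yield to blind search over a large transformation space, a high score reflects compute spent on enumeration rather than the skill acquisition the benchmark is meant to measure.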
"(ARC-AGI) has been unchanged since 2019 and is not perfect," Knoop admitted in his post.
Chollet and Knoop have also faced criticism for overselling ARC-AGI as a benchmark toward AGI, at a time when the very definition of AGI is hotly contested. One OpenAI staff member recently claimed that AGI has "already" been achieved if one defines it as AI that is "better than most humans at most tasks."
Knoop and Chollet say they plan to release a second-generation ARC-AGI benchmark to address these issues, alongside a new competition, in 2025. "We will continue to direct the efforts of the research community toward what we see as the most important unsolved problems in AI, and accelerate the timeline to AGI," Chollet wrote in an X post.
Fixes likely will not come easily. If the shortcomings of the first ARC-AGI test are any indication, defining intelligence for AI will be as difficult, and as contentious, as it has been for humans.