World models, also known as world simulators, are being touted by some as the next big thing in AI.
AI pioneer Fei-Fei Li's World Labs has raised $230 million to build "large world models," and DeepMind has hired one of the creators of OpenAI's video generator, Sora, to work on "world simulators." (Sora was released on Monday; here are some early impressions.)
But what the heck are these things, anyway?
World models take inspiration from the mental models of the world that humans develop naturally. Our brains take abstract representations from our senses and form them into a more concrete understanding of the world around us, producing what we called "models" long before AI adopted the phrase. The predictions our brains make based on these models influence how we perceive the world.
In a paper, AI researchers David Ha and Jürgen Schmidhuber give the example of a baseball batter. Batters have only milliseconds to decide how to swing their bat, less time than it takes for visual signals to reach the brain. The reason they can hit a 100 mph fastball, Ha and Schmidhuber say, is that they can instinctively predict where the ball will go.
"For professional players, this all happens subconsciously," the research duo wrote. "Their muscles reflexively swing the bat at the right time and place, in line with their internal models' predictions. They can quickly act on their predictions of the future without the need to consciously roll out possible future scenarios to form a plan."
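The anticipatory prediction Ha and Schmidhuber describe can be illustrated with a toy calculation: estimate where the ball will cross the plate from only its first few observed positions, long before the full flight is seen. This is a hypothetical sketch with invented numbers and a constant-velocity assumption, not anything from their paper.

```python
# Toy illustration of acting on a predicted future: extrapolate a
# fastball's plate crossing from two early position samples.
# Constant-velocity assumption; all numbers are invented.

def predict_crossing(observations, plate_x):
    """observations: (time, x, y) samples early in the ball's flight.
    Returns (t_cross, y_cross): when and at what height it reaches plate_x."""
    (t0, x0, y0), (t1, x1, y1) = observations[0], observations[-1]
    vx = (x1 - x0) / (t1 - t0)           # horizontal velocity estimate
    vy = (y1 - y0) / (t1 - t0)           # vertical velocity estimate
    t_cross = t0 + (plate_x - x0) / vx   # time until the ball reaches the plate
    y_cross = y0 + vy * (t_cross - t0)   # predicted height at the plate
    return t_cross, y_cross

# A roughly 100 mph (~44 m/s) pitch sampled over its first 0.05 s:
obs = [(0.00, 18.4, 1.80), (0.05, 16.2, 1.75)]
t, y = predict_crossing(obs, plate_x=0.0)
```

The batter's brain, on this account, is doing something analogous: committing to a swing based on the model's prediction rather than waiting for the full sensory evidence to arrive.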
It's these subconscious reasoning aspects of world models that some believe are prerequisites for human-level intelligence.
Modeling the world
While the concept has been around for decades, world models have recently gained traction in part because of their promising applications in the field of generative video.
Most, if not all, AI-generated videos veer into uncanny valley territory. Watch them long enough and something strange will happen, like limbs twisting and merging into each other.
While a generative model trained on years of video might accurately predict that a basketball will bounce, it has no idea why — just as language models don't really understand the concepts behind words and phrases. But a world model with even a basic grasp of why the ball bounces the way it does will be better at showing it do exactly that.
To enable this kind of insight, world models are trained on a range of data, including images, audio, video, and text, with the aim of creating internal representations of how the world works and the ability to reason about the consequences of actions.
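The internal-representation idea can be sketched as three pieces: an encoder that compresses observations into a latent state, a dynamics model that imagines the consequence of an action in that latent space, and a decoder that turns imagined states back into observations. The decomposition below follows Ha and Schmidhuber's setup in spirit only; the linear maps are untrained random placeholders, not a real model.

```python
# Structural sketch of a world model: encode observations into a latent
# state, predict the next state given an action, decode back to an
# observation. Random weights stand in for trained networks.
import numpy as np

rng = np.random.default_rng(0)

class ToyWorldModel:
    def __init__(self, obs_dim=64, latent_dim=8, action_dim=2):
        self.enc = rng.normal(size=(latent_dim, obs_dim)) * 0.1              # observation -> latent state
        self.dyn = rng.normal(size=(latent_dim, latent_dim + action_dim)) * 0.1  # (state, action) -> next state
        self.dec = rng.normal(size=(obs_dim, latent_dim)) * 0.1              # latent state -> predicted observation

    def encode(self, obs):
        return np.tanh(self.enc @ obs)

    def predict(self, state, action):
        # The core of a world model: imagine the consequence of an
        # action without executing it in the real world.
        return np.tanh(self.dyn @ np.concatenate([state, action]))

    def decode(self, state):
        return self.dec @ state

model = ToyWorldModel()
state = model.encode(rng.normal(size=64))
imagined = model.predict(state, action=np.array([1.0, 0.0]))
frame = model.decode(imagined)
```

Training would fit the encoder, dynamics, and decoder so that imagined rollouts match real video, audio, and text; the structure above only shows where each kind of data flows.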
"The viewer expects that the world they're watching behaves similarly to their reality," said Alex Mashrabov, Snap's former head of AI and the CEO of Higgsfield, which builds generative models for video. "If a feather drops with the weight of an anvil, or a bowling ball shoots hundreds of feet into the air, it's jarring and takes the viewer out of the moment. With a strong world model, instead of a creator defining how each object is expected to move — which is tedious, cumbersome, and a poor use of time — the model will understand this."
But better video generation is just the tip of the iceberg for world models. Researchers including Yann LeCun, Meta's chief AI scientist, say the models could one day be used for sophisticated forecasting and planning in both digital and physical domains.
In a talk earlier this year, LeCun described how a world model could help achieve a desired goal through reasoning. A model with a basic representation of a "world" (say, a video of a dirty room), given a goal (a clean room), could come up with a sequence of actions to achieve it (deploy vacuums to sweep, clean the dishes, empty the trash) — not because that's a pattern it has observed, but because it knows at a deeper level how to go from dirty to clean.
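LeCun's dirty-room example amounts to planning by search inside a learned simulator: roll candidate action sequences forward through the world model and keep the one whose imagined end state is closest to the goal. Here is a deliberately tiny, hypothetical version where the "room" is reduced to three mess counters and the "world model" is a hand-written transition function — every name and number is invented for illustration.

```python
# Toy planner: pick the action sequence whose *imagined* outcome,
# rolled forward through a tiny world model, best matches the goal
# (a clean room, i.e. zero total messiness).
from itertools import permutations

ACTIONS = {
    "vacuum":      {"floor_dirt": -5},
    "do_dishes":   {"dirty_dishes": -4},
    "empty_trash": {"trash": -3},
}

def world_model(state, action):
    """Predict the next state after an action (no real room involved)."""
    nxt = dict(state)
    for key, delta in ACTIONS[action].items():
        nxt[key] = max(0, nxt[key] + delta)
    return nxt

def messiness(state):
    return sum(state.values())

def plan(state, horizon=3):
    """Search action sequences in imagination; return the best found."""
    best_seq, best_score = None, float("inf")
    for seq in permutations(ACTIONS, horizon):
        s = state
        for action in seq:
            s = world_model(s, action)   # imagine, don't act
        if messiness(s) < best_score:
            best_seq, best_score = seq, messiness(s)
    return list(best_seq)

dirty_room = {"floor_dirt": 5, "dirty_dishes": 4, "trash": 3}
chores = plan(dirty_room)  # some ordering of all three chores
```

Real systems would replace the exhaustive search with something smarter and the hand-written transitions with a learned model, but the loop — imagine, score against the goal, act — is the same idea.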
"We need machines that understand the world; (machines) that can remember things, that have intuition, that have common sense — things that can reason and plan at the same level as humans," LeCun said. "Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this."
While LeCun estimates that we're at least a decade away from the world models he envisions, today's world models show promise as rudimentary physics simulators.

OpenAI notes in a blog post that Sora, which it considers to be a world model, can simulate actions like a painter leaving brush strokes on a canvas. Models like Sora (and Sora itself) can also effectively simulate video games. Sora can render a Minecraft-like UI and game world, for example.
Future world models may be able to generate 3D worlds on demand for gaming, virtual photography, and more, World Labs co-founder Justin Johnson said on an episode of the a16z podcast.
“We already have the ability to create virtual, interactive worlds, but it costs hundreds and hundreds of millions of dollars and tons of development time,” Johnson said. “(The world models) will not only allow you to get a photo or a clip, but they will allow you to simulate a fully lifelike, interactive 3D world.”
High hurdles
While the concept is tempting, several technical challenges stand in the way.
Training and running world models requires enormous compute, even compared with the amount currently used for generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if its use becomes commonplace.
World models, like all AI models, also hallucinate and internalize biases from their training data. A world model trained largely on videos of sunny weather in European cities might struggle to understand or depict Korean cities in snowy conditions, for example, or simply get it wrong.
A general lack of training data threatens to exacerbate these problems, Mashrabov says.
"We've seen models be really limited in their generations of people of a certain gender or race," he said. "The training data for a world model must be broad enough to cover a diverse range of scenarios, but also highly detailed, so the AI can deeply understand the nuances of those scenarios."
Data and engineering issues also prevent current models from accurately capturing the behavior of the world's inhabitants (such as humans and animals), Cristobal Valenzuela, CEO of AI startup Runway, wrote in a recent post. "Models will need to generate consistent maps of the environment," he said, "and the ability to navigate and interact in those environments."

If the major hurdles are overcome, however, Mashrabov believes world models could "more strongly" connect AI to the real world, leading to breakthroughs not only in virtual world generation, but also in robotics and AI decision-making.
They could also yield more capable robots.
Robots today are limited in what they can do because they have no awareness of the world around them (or of their own bodies). World models could give them that awareness, at least to a point, Mashrabov said.
"With an advanced world model, an AI could develop a personal understanding of whatever scenario it's placed in, and start to reason out possible solutions," he said.
TechCrunch has an AI-focused newsletter! Sign up here to get it in your inbox every Wednesday.