DeepMind’s Genie 2 can generate interactive worlds that resemble video games


DeepMind, Google’s artificial intelligence research organization, has unveiled a model that can generate an “endless” variety of playable 3D worlds.

The model, called Genie 2, is the successor to the Genie model DeepMind released earlier this year. It can generate an interactive scene in real time from a single image and a text description (for example: “Cute human-like robot in the forest”). In this way, it resembles models under development by Fei-Fei Li’s World Labs and Israeli startup Decart.

DeepMind claims that Genie 2 can generate a “wide range of rich 3D worlds,” including worlds in which users can take actions like jumping and swimming using a mouse or keyboard. The model is trained on videos, and is able to simulate object interactions, animations, lighting, physics, reflections, and NPC behavior.

Image credits: DeepMind

Many Genie 2 simulations look like AAA video games – which may be because the model’s training data contains playthroughs of popular titles. But DeepMind, like many AI labs, won’t reveal many details about its data acquisition methods, for competitive reasons or otherwise.

One wonders what the implications are for intellectual property. DeepMind – being a subsidiary of Google – has unrestricted access to YouTube, and Google has previously noted that its terms of service give it permission to use YouTube videos for model training. But is Genie 2 essentially creating unauthorized copies of the video games it has “watched”? That will be for the courts to decide.

DeepMind says Genie 2 can create consistent worlds from different viewpoints, such as first-person and isometric views, for up to a minute, though most simulations last 10 to 20 seconds.

“Genie 2 intelligently responds to actions taken by pressing keys on a keyboard, correctly identifying and moving the character,” DeepMind wrote in a blog post. “For example, [our model can] figure out that arrow keys should move the robot, not trees or clouds.”

DeepMind Genie 2
Image credits: DeepMind

Most models like Genie 2 – world models, if you will – can simulate 3D games and environments, but they suffer from artifacts, consistency problems, and hallucinations. For example, Decart’s Minecraft simulator, Oasis, has a low resolution and quickly “forgets” the layout of levels.

However, Genie 2 can remember parts of the simulated scene that have gone out of view and render them accurately when they become visible again, DeepMind says. (World Labs’ models can do this too.)

Granted, games made with Genie 2 wouldn’t be much fun in their current form, since the model effectively erases your progress every minute or so. For this reason, DeepMind has positioned the model as more of a research and creative tool – one for prototyping “interactive experiments” and evaluating AI agents.

“Thanks to Genie 2’s out-of-distribution generalization capabilities, concept art and drawings can be turned into fully interactive environments,” DeepMind wrote. “By using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate evaluation tasks that agents have not seen during training.”

DeepMind Genie 2
Image credits: DeepMind

DeepMind says that although Genie 2 is still in its early stages, the lab believes the technology will be a key component in the development of future AI agents.

Google has poured increasing resources into world model research, which promises to be the next big thing in generative AI. In October, DeepMind hired Tim Brooks, who had co-led development of OpenAI’s Sora video generator, to work on video generation technologies and world simulators. Two years ago, the lab poached Tim Rocktäschel, known for his “open-endedness” experiments with video games like NetHack, from Meta.

