Amazon announces Nova, a new family of multimodal AI models

At the re:Invent conference on Tuesday, Amazon Web Services (AWS), Amazon’s cloud computing division, announced a new family of multimodal generative AI models it calls Nova.

There are four text generation models in total: Micro, Lite, Pro, and Premier. Micro, Lite, and Pro are available Tuesday for AWS customers, while Premier will arrive in early 2025, Amazon CEO Andy Jassy said on stage.

In addition, there is an image-generation model, Nova Canvas, and a video-generation model, Nova Reel. Both also launched on AWS this morning.

“We’ve continued to work on our own frontier models, and those frontier models have made a tremendous amount of progress over the last four to five months,” Jassy said. “We thought if we were finding value in them, you would probably find value in them.”

Micro, Lite, Pro, and Premier

Nova's text-generating models are optimized for 15 languages (but primarily English) and vary widely in size and capability.

Micro can only ingest and output text, but it offers the lowest latency of the group, processing prompts and generating responses the fastest.

Lite can process image, video, and text inputs reasonably quickly. Pro offers a balanced combination of accuracy, speed, and cost for a range of tasks. Premier is the most capable, designed for complex workloads.

Pro and Premier, like Lite, can analyze text, images, and video. All three are well suited to tasks such as understanding documents and charts and summarizing meetings. However, AWS positions Premier as a "teacher" model for distilling fine-tuned custom models, rather than as a model to be used on its own.

Micro has a context window of 128,000 tokens, meaning it can handle up to roughly 100,000 words. Lite and Pro have 300,000-token context windows, which works out to about 225,000 words, 15,000 lines of computer code, or 30 minutes of footage.

AWS says that in early 2025, the context windows of some Nova models will expand to support more than 2 million tokens.
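The stated word counts imply a ratio of roughly 0.78 English words per token, in line with common rules of thumb. A minimal sketch of that arithmetic (the ratio is an assumption; real tokenizers vary by language and content):

```python
def approx_words(tokens: int, words_per_token: float = 0.78) -> int:
    """Rough English word estimate from a token count (the ratio is an assumption)."""
    return round(tokens * words_per_token)

print(approx_words(128_000))  # 99840, about the ~100,000 words quoted for Micro
print(approx_words(300_000))  # 234000, near the ~225,000 quoted for Lite and Pro
```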

Jassy claims the Nova models are among the fastest in their class, and among the least expensive to run. They are available on Amazon Bedrock, AWS's AI development platform, where they can be fine-tuned on text, images, and video, and distilled for greater speed and efficiency.
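For a sense of what that looks like in practice, a hedged sketch of calling a Nova text model through Bedrock's Converse API follows. The model ID `amazon.nova-micro-v1:0` and the region are assumptions; check the Bedrock model catalog in your account for the identifiers actually available to you.

```python
# Hedged sketch: calling a Nova text model via Amazon Bedrock's Converse API.
# The model ID is an assumption; verify it against the Bedrock model catalog.
def build_converse_request(prompt: str, model_id: str = "amazon.nova-micro-v1:0") -> dict:
    """Assemble keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.7},
    }

# With AWS credentials configured, the request would be sent like so:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**build_converse_request("Summarize this memo."))
#   print(response["output"]["message"]["content"][0]["text"])
```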

“We’ve optimized these models to work with proprietary systems and APIs, so you can perform multiple coordinated automated steps — agent behavior — more easily using these models,” Jassy added. “So I think these are very compelling.”

Canvas and Reel

Canvas and Reel are AWS’s strongest offerings to date for generative media.

Canvas lets users generate and edit images with prompts (for example, to remove backgrounds) and provides controls over the color schemes and layouts of the resulting images. Reel, the more ambitious of the two models, creates videos up to six seconds long from prompts or, optionally, reference images. With Reel, users can adjust camera motion to generate videos with pans, 360-degree rotations, and zooms.

Reel is currently limited to six-second videos (which take about three minutes to create), but a version that can create two-minute videos is “coming soon,” according to AWS.
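Bedrock exposes image models such as Canvas through its InvokeModel API with a JSON request body. The sketch below assumes the Titan-style text-to-image schema and the model ID `amazon.nova-canvas-v1:0`; both should be verified against the Bedrock documentation.

```python
import json

# Hedged sketch: a text-to-image request body for Nova Canvas on Bedrock.
# The field names (taskType, textToImageParams, imageGenerationConfig) follow
# the schema Bedrock uses for its image models; treat them as assumptions.
def build_canvas_body(prompt: str, width: int = 1024, height: int = 1024) -> str:
    return json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
            "cfgScale": 8.0,
        },
    })

# The body would then be sent via:
#   client.invoke_model(modelId="amazon.nova-canvas-v1:0", body=build_canvas_body("..."))
```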

Here is a sample:

Image credits: AWS

And another:

AWS Nova Reel
Image credits: AWS

And here are images from Canvas:

AWS Nova Canvas
Canvas can create images in a range of styles, AWS says, and can expand existing images or insert objects into scenes. Image credits: AWS

Jassy stressed that both Canvas and Reel have "built-in" controls for responsible use, including watermarking and content moderation. "[We are trying to] limit the generation of harmful content," he said.

AWS expanded on those safety measures in a blog post, saying that Nova "extends its safety measures to combat the spread of misinformation, child sexual abuse material, and chemical, biological, radiological, or nuclear risks." However, it is not clear what this means in practice, or what form these measures take.

AWS also remains vague about exactly what data it uses to train its generative models. The company has previously told TechCrunch only that it is a mix of proprietary and licensed data.

Few vendors willingly disclose such information. They view training data as a competitive advantage and therefore keep it, and the information related to it, a closely guarded secret. Training-data details are also a potential source of intellectual-property lawsuits, another disincentive to revealing much.

In lieu of transparency, AWS offers an indemnification policy that covers customers if one of its models regurgitates (i.e., spits out a near-exact copy of) a potentially copyrighted work.

So, what's next for Nova? Jassy says AWS is working on a speech-to-speech model (one that takes speech as input and outputs a transformed version of it) for the first quarter of 2025, and an "any-to-any" model for around mid-2025.

AWS re:Invent 2024: Nova
Image credits: Frederic Lardinois/TechCrunch

Amazon says the speech-to-speech model will also be able to interpret verbal and nonverbal cues, such as pitch and rhythm, and deliver natural, "human-like" voices. As for the "any-to-any" model, it could theoretically power applications from translators to content editors to AI assistants.

That is assuming it doesn't suffer any setbacks, of course.

"You'll be able to input text, speech, images, or video and output text, speech, images, or video," Jassy said of the any-to-any model. "This is the future of how frontier models are built and consumed."
