Shelf 03

Image, video & audio

Generative tools for visuals, sound, and motion. These are usually the 'whoa' moment for new users, and a good way to see what models can and can't do.

Midjourney

Image

Aesthetics-first image generator with a distinct house style. Now with a polished web app.

Best for

Mood, concept art, marketing imagery.

Image, Stylized

OpenAI

DALL·E

Image

Image generation inside ChatGPT. Good at following detailed prompts and rendering text in images.

Best for

Quick illustrations from inside a chat.

Image, Text-in-image

Stability AI

Stable Diffusion

Image

Open-weight image model you can run locally or fine-tune. Powers a huge ecosystem of tools.

Best for

Custom pipelines and offline use.

Image, Open, Self-host

Black Forest Labs

Flux

Image

Open-weight image model known for prompt adherence and clean compositions.

Best for

When you want exactly what you asked for.

Image, Open, Prompt-adherent

ElevenLabs

Audio

Best-in-class voice synthesis, dubbing, and voice cloning. Natural prosody in dozens of languages.

Best for

Podcasts, narration, agents that talk.

Voice, Cloning, Dubbing

Suno

Audio

Generate full songs — vocals, lyrics, instrumentation — from a text prompt.

Best for

Demos, jingles, and creative play.

Music, Vocals

Runway

Video

Pro video tools: generative video, editing, motion brush, lip sync. Strong workflow for creators.

Best for

Filmmakers and motion designers.

Video, Editing

OpenAI

Sora

Video

Text-to-video model producing coherent, multi-shot scenes from a single prompt.

Best for

Concept films and ideation reels.

Video, Text-to-video

Google

Veo

Video

Google's flagship video model. Strong on physical realism, camera control, and clip length.

Best for

Longer, more controlled video shots.

Video, Cinematic

OpenAI

Whisper

Audio

Open-weight speech-to-text. A widely used default for high-quality audio transcription.

Best for

Transcription pipelines.

STT, Open