Image, video & audio
Generative tools for visuals, sound, and motion. These are usually the 'whoa' moment for new users — and a great way to understand what models can and can't do.
Midjourney
Image generator known for its strong aesthetics and distinct house style. Now has a polished web app.
DALL·E
Image generation inside ChatGPT. Good at following detailed prompts and rendering text in images.
Stable Diffusion
Open-weight image model you can run locally or fine-tune. Powers a huge ecosystem of tools.
Flux
Image model family from Black Forest Labs, with open-weight variants, known for prompt adherence and clean compositions.
ElevenLabs
Voice synthesis, dubbing, and voice cloning, widely regarded as a leader in the field. Natural prosody in dozens of languages.
Suno
Generate full songs — vocals, lyrics, instrumentation — from a text prompt.
Runway
Pro video tools: generative video, editing, motion brush, lip sync. Strong workflow for creators.
Sora
OpenAI's text-to-video model, which can produce coherent multi-shot scenes from a single prompt.
Veo
Google's flagship video model. Strong on physical realism, camera control, and clip length.
Whisper
Open-weight speech-to-text model from OpenAI. A common default for high-quality audio transcription.