The AI Compendium
a Build Labs guide
Shelf 03

Image, video & audio

Generative tools for visuals, sound, and motion. These are usually the 'whoa' moment for new users — and a great way to understand what models can and can't do.

Midjourney

Midjourney

Image

Image generator prized for its aesthetics and distinct house style. Now has a polished web app.

Best for
Mood, concept art, marketing imagery.
Image, Stylized
OpenAI

DALL·E

Image

Image generation inside ChatGPT. Good at following detailed prompts and rendering text in images.

Best for
Quick illustrations from inside a chat.
Image, Text-in-image
Stability AI

Stable Diffusion

Image

Open-weight image model you can run locally or fine-tune. Powers a huge ecosystem of tools.

Best for
Custom pipelines and offline use.
Image, Open, Self-host
Black Forest Labs

Flux

Image

Image model family with open-weight variants, known for prompt adherence and clean compositions.

Best for
When you want exactly what you asked for.
Image, Open, Prompt-adherent
ElevenLabs

ElevenLabs

Audio

High-quality voice synthesis, dubbing, and voice cloning, with natural prosody in dozens of languages.

Best for
Podcasts, narration, agents that talk.
Voice, Cloning, Dubbing
Suno

Suno

Audio

Generate full songs — vocals, lyrics, instrumentation — from a text prompt.

Best for
Demos, jingles, and creative play.
Music, Vocals
Runway

Runway

Video

Pro video tools: generative video, editing, motion brush, lip sync. Strong workflow for creators.

Best for
Filmmakers and motion designers.
Video, Editing
OpenAI

Sora

Video

Text-to-video model producing coherent, multi-shot scenes from a single prompt.

Best for
Concept films and ideation reels.
Video, Text-to-video
Google

Veo

Video

Google's flagship video model. Strong on realistic physics, camera control, and clip length.

Best for
Longer, more controlled video shots.
Video, Cinematic
OpenAI

Whisper

Audio

Open-weight speech-to-text model. A widely used default for high-quality audio transcription.

Best for
Transcription pipelines.
STT, Open