Voice to Text for DALL-E
DALL-E creates better images when you give it detailed descriptions. But typing out visual concepts is frustrating. Your mental image is rich with colors, lighting, textures, and composition, and translating all of that into typed words takes forever. Blurt lets you describe what you see in your mind naturally. Hold a button, speak the scene, release. Your complete visual description appears as text, ready to generate. $10/month or $99/year, with a free tier to try it out.
The Typing Problem
Visual concepts are easier to describe aloud than type
You can picture the image perfectly in your head: a sunset over mountains with dramatic clouds, specific lighting, and a particular color palette. But typing it out word by word feels clunky. Speaking lets you describe the scene as naturally as you'd tell a friend what you're imagining.
Detailed prompts get better results but take forever to type
DALL-E rewards specificity. The difference between 'a cat' and 'a fluffy orange tabby cat sitting on a velvet cushion with soft window light' is enormous. But typing all that detail is tedious. Voice lets you pour out every visual detail in seconds instead of minutes.
Iterative prompting loses momentum when you type
Your first image isn't quite right. You need to adjust the lighting, change the angle, add a detail. Each iteration requires typing another long description. By the third try, you've lost your creative flow. Speaking keeps the experimentation fast and fluid.
Art style descriptions are verbose and specific
You want something in the style of impressionism with visible brushstrokes and soft edges. Or cyberpunk with neon lighting and rain-slicked streets. Or watercolor with bleeding edges and muted tones. These style specifications add up quickly. Voice handles long stylistic descriptions effortlessly.
Composition details require spatial thinking
You're mentally positioning elements: subject in the lower third, horizon line high, light source from the upper left, background elements softly blurred. Describing spatial relationships is natural in speech but awkward to type. Your brain thinks visually; let your voice translate that directly.
How It Works
Blurt sits in your menu bar and works with DALL-E in any browser. Hold your hotkey, describe your image, release. Your visual description appears as text.
Focus on the DALL-E prompt field
Click into the text box where you'd normally type your image description.
Hold your hotkey and describe the scene
Press and hold your chosen key. Speak your complete visual description: subject, setting, lighting, style, colors, composition. Include every detail you're imagining.
Release and generate
Your spoken description appears as text in the prompt field. Hit generate and see your vision come to life.
Real Scenarios
Detailed scene descriptions with multiple elements
You want a complex image: a cozy coffee shop interior with morning light streaming through large windows, exposed brick walls, vintage furniture, steam rising from cups, a person reading in the corner. Speaking lets you describe all these elements in one natural flow, capturing the complete scene faster than you could ever type it.
Specific art style and technique requests
You need an image in a particular style: oil painting with thick impasto brushstrokes in the manner of Van Gogh, with swirling skies and vivid yellows and blues. Or maybe photorealistic with shallow depth of field and golden hour lighting. Voice captures these technical art specifications naturally.
Iterative refinement of generated images
The first result is close but not quite right. You need more dramatic shadows, a warmer color palette, the subject facing a different direction. Speaking your adjustments takes seconds. You can try five variations in the time it would take to type one.
Mood and atmosphere descriptions
You want an image that feels melancholic, with muted colors and soft focus. Or energetic and vibrant with high contrast and dynamic angles. Describing emotional qualities and atmospheric elements flows naturally in speech. Your voice carries the feeling you're trying to create.
Complex character or product descriptions
You need a character with specific features: tall, silver hair swept back, wearing a long dark coat with brass buttons, carrying an ornate walking cane, standing in a foggy Victorian street. Every detail matters. Speaking lets you describe the complete character without losing any specifics.
Reference-based descriptions
You want something that combines elements: the color palette of a Wes Anderson film, the composition of a Renaissance painting, the subject matter of science fiction. Describing these layered references is natural in speech but awkward to type.
Batch prompt creation for multiple variations
You need several versions of similar images with slight variations. Speaking lets you quickly describe each variation: same scene but daytime, same scene but rainy, same scene but from a higher angle. Your creative session stays in flow.
Different ways to write DALL-E prompts. Here's how Blurt compares.
| | Blurt | Manual Typing |
|---|---|---|
| Speed for detailed prompts | Speak 150+ words per minute | Type 40-60 words per minute |
| Visual descriptions | Natural flow, describe as you visualize | Translate mental images to typed words |
| Iterative refinement | Speak adjustments in seconds | Retype or edit previous prompts |
| Creative flow | Stay in visual thinking mode | Switch between visual and typing modes |
| Art style vocabulary | Speak complex terminology naturally | Look up and type technical terms |
| Price | $10/month or $99/year | Free (but much slower) |
Start Typing Faster Today
Free to try — no credit card required
Download Blurt