Voice to Text for DALL-E

DALL-E creates better images when you give it detailed descriptions. But typing out visual concepts is frustrating. Your mental image is rich with colors, lighting, textures, and composition, and translating that into typed words takes forever. Blurt lets you describe what you see in your mind naturally. Hold a button, speak the scene, release. Your complete visual description appears, ready to generate. Blurt costs $10/month or $99/year, with a free tier to try it out.

First 1,000 words free | Works in any browser | macOS only
Download Blurt Free

The Typing Problem

Visual concepts are easier to describe aloud than type

You can picture the image perfectly in your head. A sunset over mountains with dramatic clouds, specific lighting, particular color palette. But typing it out word by word feels clunky. Speaking lets you describe the scene as naturally as you'd tell a friend what you're imagining.

Detailed prompts get better results but take forever to type

DALL-E rewards specificity. The difference between 'a cat' and 'a fluffy orange tabby cat sitting on a velvet cushion with soft window light' is enormous. But typing all that detail is tedious. Voice lets you pour out every visual detail in seconds instead of minutes.

Iterative prompting loses momentum when you type

Your first image isn't quite right. You need to adjust the lighting, change the angle, add a detail. Each iteration requires typing another long description. By the third try, you've lost your creative flow. Speaking keeps the experimentation fast and fluid.

Art style descriptions are verbose and specific

You want something in the style of impressionism with visible brushstrokes and soft edges. Or cyberpunk with neon lighting and rain-slicked streets. Or watercolor with bleeding edges and muted tones. These style specifications add up quickly. Voice handles long stylistic descriptions effortlessly.

Composition details require spatial thinking

You're mentally positioning elements: subject in the lower third, horizon line high, light source from the upper left, background elements softly blurred. Describing spatial relationships is natural in speech but awkward to type. Your brain thinks visually - let your voice translate that directly.

How It Works

Blurt sits in your menu bar and works with DALL-E in any browser. Hold your hotkey, describe your image, release. Your visual description appears as text.

1. Focus on the DALL-E prompt field

Click into the text box where you'd normally type your image description.

2. Hold your hotkey and describe the scene

Press and hold your chosen key. Speak your complete visual description: subject, setting, lighting, style, colors, composition. Include every detail you're imagining.

3. Release and generate

Your spoken description appears as text in the prompt field. Hit generate and see your vision come to life.

Real Scenarios

Specific art style and technique requests

You need an image in a particular style: oil painting with thick impasto brushstrokes in the manner of Van Gogh, with swirling skies and vivid yellows and blues. Or maybe photorealistic with shallow depth of field and golden hour lighting. Voice captures these technical art specifications naturally.

Iterative refinement of generated images

The first result is close but not quite right. You need more dramatic shadows, a warmer color palette, the subject facing a different direction. Speaking your adjustments takes seconds. You can try five variations in the time it would take to type one.

Mood and atmosphere descriptions

You want an image that feels melancholic, with muted colors and soft focus. Or energetic and vibrant with high contrast and dynamic angles. Describing emotional qualities and atmospheric elements flows naturally in speech. Your voice carries the feeling you're trying to create.

Complex character or product descriptions

You need a character with specific features: tall, silver hair swept back, wearing a long dark coat with brass buttons, carrying an ornate walking cane, standing in a foggy Victorian street. Every detail matters. Speaking lets you describe the complete character without losing any specifics.

Reference-based descriptions

You want something that combines elements: the color palette of a Wes Anderson film, the composition of a Renaissance painting, the subject matter of science fiction. Describing these layered references is natural in speech, awkward in typing.

Batch prompt creation for multiple variations

You need several versions of similar images with slight variations. Speaking lets you quickly describe each variation: same scene but daytime, same scene but rainy, same scene but from a higher angle. Your creative session stays in flow.

Different ways to write DALL-E prompts. Here's how Blurt compares.

Feature | Blurt | Manual Typing
Speed for detailed prompts | Speak 150+ words per minute | Type 40-60 words per minute
Visual descriptions | Natural flow, describe as you visualize | Translate mental images to typed words
Iterative refinement | Speak adjustments in seconds | Retype or edit previous prompts
Creative flow | Stay in visual thinking mode | Switch between visual and typing modes
Art style vocabulary | Speak complex terminology naturally | Look up and type technical terms
Price | $10/month or $99/year | Free (but much slower)

Frequently Asked Questions

Does Blurt work with DALL-E in ChatGPT and the API playground?
Yes. Blurt inserts text at your cursor position in any application. Whether you're using DALL-E through ChatGPT, the OpenAI API playground, or any third-party interface, Blurt works the same way. It puts your spoken words where your cursor is.
How does it handle art terminology and style names?
Very well. Terms like 'chiaroscuro,' 'impressionism,' 'bokeh,' 'rule of thirds,' and artist names transcribe accurately. Blurt handles the vocabulary of visual art and photography effectively out of the box.
Can I describe aspect ratios and technical specifications?
Yes. You can speak phrases like 'sixteen by nine aspect ratio' or 'square format' and Blurt transcribes them correctly. Technical specifications, resolution requests, and format descriptions all work naturally.
What about punctuation in complex descriptions?
Blurt adds punctuation automatically. Commas between elements, periods between sentences - they appear where they should based on your speech patterns. No need to say 'comma' or 'period' while describing your image.
Is there a free tier to try it out?
Yes. The first 1,000 words are free, with no expiration. That's enough for dozens of detailed DALL-E prompts to see if voice input fits your creative workflow. No credit card required to start.
Does Blurt work on Windows or Linux?
Blurt is macOS only. We focused on creating the best possible Mac experience with native menu bar integration and system-level keyboard shortcuts. Windows and Linux versions are not currently available.

Start Typing Faster Today

Free to try. No credit card required.

Download Blurt