Voice to Text for Midjourney
Great Midjourney images come from detailed prompts, but typing out visual descriptions is awkward. You're trying to describe lighting, composition, style, mood, and subject all at once, and by the time you've typed it, you've lost half the image in your head. Blurt lets you speak your prompts naturally. Hold a button, describe what you see in your mind, and release. Your complete prompt appears in Discord, ready to send. Blurt costs $10/month or $99/year, with a free tier to try it out.
The Typing Problem
Visual concepts are easier to speak than to type
You have a vivid image in your mind. A weathered lighthouse at sunset with dramatic storm clouds and golden hour lighting. Typing that out piece by piece breaks the mental image. Each word you hunt for on the keyboard fragments the vision. Speaking preserves the complete picture. You describe the whole scene as you see it, and Blurt captures every detail.
Prompt iteration becomes tedious fast
Midjourney is an iterative process. You generate, evaluate, then modify your prompt. Maybe the lighting needs to be warmer, the composition wider, the style more painterly. Each iteration means retyping a similar prompt with small changes. Voice lets you riff on variations quickly: same base description, different endings. Ten variations in two minutes instead of ten minutes.
Complex style descriptions are exhausting to type
You want that specific aesthetic. Cinematic lighting, 35mm film grain, shallow depth of field, bokeh in the background, color grading like a Wes Anderson film, aspect ratio 16:9, high detail, photorealistic rendering. That's 30 seconds of speaking or 3 minutes of typing. Your creative momentum dies somewhere around 'shallow depth of field.'
Discord's input field isn't built for long prompts
You're crafting a 100-word prompt in Discord's chat field. It's cramped. You can barely see what you've written. Typos pile up because the context is so limited. With voice, you speak the entire prompt without worrying about the tiny text box. The words flow, then appear. No squinting at a single line of text.
Reference descriptions require precision that typing slows down
You're trying to describe a specific aesthetic from a reference image. The way the shadows fall, the color palette, the texture of the materials, the composition of elements. Translating visual observations to written prompts is already hard. Doing it one keystroke at a time makes it nearly impossible to maintain the connection between what you see and what you're describing.
How It Works
Blurt works wherever you type Midjourney prompts: Discord, the Midjourney web interface, or any other text field. Hold your hotkey, describe your image, and release.
Focus on the prompt input
Click into Discord's message box or the Midjourney web prompt field. In Discord, type /imagine first.
Hold your hotkey and describe
Press your chosen key. Speak your complete visual description naturally. Include subject, style, lighting, mood, and any parameters you want.
Release and generate
Your spoken words appear as text. Hit enter to send. Midjourney starts generating while you're already thinking about the next iteration.
Real Scenarios
Detailed scene descriptions that would take forever to type
You want an epic fantasy landscape. 'Ancient elven city carved into a mountain cliff at golden hour, waterfalls cascading between crystalline towers, mist rising from the valley below, birds circling in the warm light, style of Alan Lee and John Howe, matte painting, cinematic composition, 16:9 aspect ratio.' Spoken in 20 seconds. That same prompt typed? Two minutes minimum, and you'd probably forget half the details.
Rapid-fire prompt variations for style exploration
You have a base concept and want to explore different styles. 'Portrait of a cyberpunk hacker, neon lighting, anime style.' Now vary it. 'Same concept, but oil painting style.' 'Now as a 1950s pulp magazine cover.' 'Now hyper-realistic photography.' Voice lets you iterate at the speed of ideas. Each variation takes seconds, not minutes.
Complex technical parameters without memorization
You want specific settings: aspect ratio, stylize value, chaos level, version number. 'Futuristic motorcycle design, sleek aerodynamic, chrome and matte black, studio lighting, --ar 16:9 --s 750 --c 20 --v 6.' Speaking the parameters feels natural. Typing them means constant reference checking and typo correction.
Describing reference images in real-time
You're looking at a photo that captures the mood you want. Eyes on the reference, you speak: 'Warm afternoon light coming through dusty windows, soft shadows, faded wallpaper, vintage furniture, melancholic atmosphere, muted color palette with hints of gold and amber.' Your description stays connected to what you're seeing because you never look away to type.
Character design iterations
You're developing a character across multiple generations. 'Fantasy warrior, female, ornate armor with nature motifs, flowing red hair, determined expression, forest background.' Next: 'Same character, now in winter setting, different pose, armor covered in frost.' Your description of the character stays consistent because you can describe variations quickly without losing the thread.
Batch prompting for concept exploration
You need 20 variations for a client presentation. Different angles, different lighting, different moods of the same core concept. With voice, you speak each variation in succession. Your creative flow stays unbroken. The Discord queue fills up while you're still in the zone.
Late-night creative sessions without wrist strain
It's midnight and you're deep in a creative groove. Your hands are tired from hours of work, but the ideas keep coming. Voice input means your creativity isn't limited by physical fatigue. Speak your prompts, rest your hands, keep generating.
Different ways to write Midjourney prompts. Here's how Blurt compares.
| | Blurt | Manual Typing |
|---|---|---|
| Speed for detailed descriptions | Speak 150+ words per minute | Type 40-60 words per minute |
| Visual-to-text translation | Describe what you see without looking away | Eyes constantly shifting to keyboard |
| Prompt iteration speed | Rapid variations, seconds apart | Slow retyping with edits |
| Complex style descriptions | Natural flow of aesthetic terms | Tedious sequential typing |
| Creative momentum | Stays in visual thinking mode | Constantly pulled into typing mode |
| Price | $10/month or $99/year | Free (but slower) |
Start Typing Faster Today
Free to try — no credit card required
Download Blurt