February 2025
AutogenAI ideation with voice UI
Bid writers often face tight deadlines and significant pressure to craft compelling, thematic responses that align with client requirements. Traditional brainstorming methods, while valuable, can be time-consuming, isolating, and mentally taxing, often resulting in writer’s block or creative stagnation.
To address this, I explored how a voice-first interaction model could accelerate idea generation, reduce friction, and foster creativity through natural dialogue. The goal was to empower bid writers to quickly generate, capture, and organise ideas verbally, streamlining the creative phase of proposal development.
This initiative also aligned with broader goals around accessibility. Voice interaction opens up more inclusive workflows, particularly for users with visual impairments or limited mobility. One of our enterprise clients in Australia was especially interested in exploring voice UI to assist their dyslexic employees. Internally, we also wanted to test how well our current infrastructure could support a combination of voice and graphical user interfaces.
Working alongside an AI researcher and prompt engineer, we set out to build a technical proof of concept focused on voice-led ideation.
We first mapped out potential areas of the app that could benefit from voice interaction. After evaluating multiple use cases, from compliance analysis to document editing, we concluded that ideation during the first-draft stage of bid writing was the ideal entry point. This stage naturally lends itself to a back-and-forth dialogue and mirrors the dynamic of working with a helpful colleague.
By focusing on ideation, we were able to contain the scope while building something that felt intuitive and creatively supportive.
Before committing to a full voice-led proof of concept, I conducted a series of experiments using off-the-shelf LLMs (primarily ChatGPT and Google Gemini) to explore how effectively they could support ideation in natural language.
These early sessions helped answer several key design questions:
How do current models respond to vague or incomplete prompts?
Can they maintain a creative thread across multiple turns?
What kinds of prompts elicit more original thinking versus repetitive or shallow responses?
How well do they handle iterative expansion, synthesis, or reframing of ideas?
By interacting with these systems across a variety of bid-related themes, I was able to identify:
Strengths: the ability to generate broad thematic ideas quickly, rephrase on request, and shift tone with clear prompting.
Limitations: a tendency to over-summarise, surface-level responses when context is missing, and difficulty managing ambiguity without structured cues.
This exploration gave me early insight into how ideation could be scaffolded, and where intervention would be needed, particularly around tone, prompt design, and keeping the conversation anchored to the user’s specific bid.
These learnings directly informed the conversational flow and system prompts I later designed for the voice assistant, ensuring that we combined the creativity of generative AI with the intentionality of guided interaction.
One of the things I wanted to ensure was that the AI would not act as a dominant generator of ideas, but as a collaborative thinking partner: guiding, prompting, and organising without taking over.
Key capabilities I identified were:
• Asking open-ended questions to stimulate fresh thinking
• Offering constructive prompts to deepen or reframe ideas
• Capturing ideas quickly and accurately as they were spoken
• Enabling playback of idea history for reflection
• Automatically organising related concepts into clusters
• Supporting hands-free editing and selection through voice
This approach positioned AI as a creative amplifier augmenting human ideation, not replacing it.
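To make the clustering capability above concrete, here is a minimal sketch of one plausible approach: embed each captured idea as a vector and group ideas whose embeddings sit close together. The library choice, model name, and distance threshold are illustrative assumptions, not a description of the production system.

```python
# A hedged sketch of idea clustering: sentence embeddings plus
# agglomerative clustering. Values here are illustrative, not production.
from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering


def cluster_ideas(ideas: list[str]) -> dict[int, list[str]]:
    """Group spoken ideas whose embeddings sit close together."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    embeddings = model.encode(ideas, normalize_embeddings=True)
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0  # threshold tuned per corpus
    ).fit_predict(embeddings)
    clusters: dict[int, list[str]] = defaultdict(list)
    for idea, label in zip(ideas, labels):
        clusters[label].append(idea)
    return dict(clusters)
```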
While the system was powered by a large language model (LLM), we knew that an effective user experience couldn’t rely solely on open-ended AI responses. LLMs can wander, overtalk, or respond with inconsistent structure. To address this, we took a hybrid approach, combining the generative flexibility of the model with a designed conversational flow that grounded the interaction in a clear, purposeful arc.
I mapped out a flow that mirrored a natural brainstorming session:
1. Framing the core problem
2. Adding relevant context
3. Generating and exploring ideas
4. Expanding promising directions
5. Structuring responses for inclusion in a bid
Each step was supported by carefully written system prompts and user-facing copy. These prompts were authored to feel natural in voice, while subtly nudging the LLM to behave in specific ways: asking follow-up questions, summarising progress, offering options, or encouraging reflection.
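As an illustration of how this staged flow could be encoded, here is a simplified sketch assuming an OpenAI-style chat-completion message format. The stage names, prompt wording, and openers are placeholders rather than the production prompts.

```python
# A minimal sketch of the staged flow: each stage pairs a behaviour-shaping
# system prompt with a spoken opener. Wording is illustrative only.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    system_prompt: str  # nudges the LLM's behaviour for this stage
    opener: str         # spoken line that frames the stage for the user


FLOW = [
    Stage("frame",
          "Ask one open question to pin down the core problem. Be brief.",
          "Let's start with the core problem. What is the bid asking for?"),
    Stage("context",
          "Elicit missing context (client, constraints, win themes), one question at a time.",
          "What do we know about the client and their priorities?"),
    Stage("generate",
          "Offer two or three distinct idea directions, then hand the turn back.",
          "Here are a few directions we could explore."),
    Stage("expand",
          "Deepen the idea the user picked with open-ended follow-up questions.",
          "Which of these feels most promising?"),
    Stage("structure",
          "Summarise the session as bullet points suitable for a bid draft.",
          "Shall I pull this together into a structured summary?"),
]


def messages_for(stage: Stage, history: list[dict]) -> list[dict]:
    """Prepend the stage-specific system prompt to the running transcript."""
    return [{"role": "system", "content": stage.system_prompt}, *history]
```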
This conversational scaffolding served several functions:
• Shaped the role of the AI from passive responder to active facilitator
• Maintained conversational coherence over multiple turns
• Balanced flexibility and focus, allowing users to explore ideas freely while staying on track
• Ensured tone and pacing were aligned with the emotional and cognitive needs of the user
By designing the system persona and flow in this way, we created a dialogue that felt more like collaborating with a thoughtful colleague than interacting with a generic chatbot. The assistant encouraged creativity, offered structure when needed, and gave users just enough control to feel supported but never interrupted.
This design-led approach was essential to ensuring that the LLM’s output aligned with the experience we envisioned, and that every prompt the user heard had intention behind it.
I designed a lightweight prototype in Figma that focused on three key modes:
• Listening (active voice input)
• Responding (AI speaking back)
• Reflecting (reviewing transcript, editing ideas)
The design allowed for transcript capture, voice playback, and quick options to revisit, expand, or park ideas. We tested this interface with a client to observe how users moved between listening and reviewing and iterated based on early feedback around pacing and transcript clarity.
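For reference, the three modes can be expressed as a small state machine. The mode names mirror the Figma design; the transition triggers below are my shorthand for the prototype’s behaviour, not an exact specification.

```python
# A minimal sketch of the prototype's three-mode interaction model.
from enum import Enum, auto


class Mode(Enum):
    LISTENING = auto()   # active voice input from the user
    RESPONDING = auto()  # AI speaking back
    REFLECTING = auto()  # reviewing the transcript, editing ideas


# Allowed transitions, keyed by (current mode, trigger). Triggers are assumed.
TRANSITIONS = {
    (Mode.LISTENING, "user_finished_speaking"): Mode.RESPONDING,
    (Mode.RESPONDING, "ai_finished_speaking"): Mode.LISTENING,
    (Mode.LISTENING, "review_requested"): Mode.REFLECTING,
    (Mode.RESPONDING, "review_requested"): Mode.REFLECTING,
    (Mode.REFLECTING, "resume_ideation"): Mode.LISTENING,
}


def next_mode(current: Mode, trigger: str) -> Mode:
    """Return the next mode, staying put on unrecognised triggers."""
    return TRANSITIONS.get((current, trigger), current)
```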
Designing the Agent Persona
One of the most critical components was defining the personality and behaviour of the voice assistant. Because users would be speaking out loud and thinking in real time, the assistant needed to strike a careful balance: supportive but not chatty, structured but not rigid, helpful but not overbearing.
I designed the agent to behave like a thoughtful brainstorming partner, closer to a coach than a collaborator. Its personality was guided by these core traits:
• Calm & grounded: to create a sense of psychological safety for speaking ideas aloud
• Encouraging, but not overly positive: to validate user contributions without sounding fake or robotic
• Curious: to prompt deeper thinking through open-ended follow-ups
• Concise: to keep dialogue efficient and avoid overwhelming the user
• Reflective: able to summarise previous ideas and guide user reflection
I wrote a set of bespoke system prompts and utterances designed to model this persona. These included:
• Acknowledgement turns:
“That’s an interesting direction. Want to explore it further or hear some variations?”
• Encouraging turns:
“You’re onto something—rough ideas are often the start of great responses.”
• Turn-taking support:
“Would you like to expand this idea, or move on to something new?”
• Soft pivots:
“If you’re not sure where to go next, we can try reframing the problem.”
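At runtime, utterances like these can be organised as a simple bank the agent draws from. The sketch below reuses the examples above; the category keys and selection logic are illustrative assumptions.

```python
# A sketch of a persona utterance bank. Categories mirror the list above;
# storing several variants per turn type helps avoid robotic repetition.
import random

UTTERANCES = {
    "acknowledge": [
        "That's an interesting direction. Want to explore it further or hear some variations?",
    ],
    "encourage": [
        "You're onto something. Rough ideas are often the start of great responses.",
    ],
    "turn_taking": [
        "Would you like to expand this idea, or move on to something new?",
    ],
    "soft_pivot": [
        "If you're not sure where to go next, we can try reframing the problem.",
    ],
}


def utterance(turn_type: str) -> str:
    """Pick a phrasing for the given turn type."""
    return random.choice(UTTERANCES[turn_type])
```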
Voice output was implemented using ElevenLabs, and we tuned the tone to sound warm, articulate, and confident without slipping into artificial enthusiasm. We explored subtle variations in pacing and inflection to match different phases of the ideation journey: slightly slower for summaries, more upbeat for idea-generation prompts.
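As a rough sketch, per-phase delivery could be implemented by swapping ElevenLabs voice settings between requests. The endpoint, the xi-api-key header, and the stability and similarity_boost settings are part of the public ElevenLabs API; the specific values, model choice, and phase presets here are placeholders, not our production tuning.

```python
# A hedged sketch of calling the ElevenLabs text-to-speech REST API with
# per-phase voice settings. Setting values below are illustrative only.
import requests

ELEVEN_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Assumed presets: steadier delivery for summaries, livelier delivery for
# idea-generation prompts.
PHASE_SETTINGS = {
    "summary":    {"stability": 0.7, "similarity_boost": 0.8},
    "generation": {"stability": 0.4, "similarity_boost": 0.8},
}


def speak(text: str, phase: str, voice_id: str, api_key: str) -> bytes:
    """Return synthesised audio for the given utterance and ideation phase."""
    resp = requests.post(
        ELEVEN_URL.format(voice_id=voice_id),
        headers={"xi-api-key": api_key},
        json={
            "text": text,
            "model_id": "eleven_multilingual_v2",
            "voice_settings": PHASE_SETTINGS[phase],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # audio bytes, ready for playback
```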
The engineering team built an internal beta of the voice ideation agent, integrating the conversational flow, speech input/output, and AI response system. We captured feedback via a combination of transcripts, call recordings, and structured surveys, with a focus on perceived usability, speed, and quality of output.
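Conceptually, each beta turn ran a speech-to-text, LLM, text-to-speech loop. The stubbed sketch below shows the shape of that loop; transcribe(), complete(), and synthesise() are placeholders standing in for the real integrations, which are not shown.

```python
# A simplified, stubbed sketch of the beta's turn loop: speech in,
# LLM response, speech out.
def transcribe(audio: bytes) -> str:
    return "placeholder transcript"       # real speech-to-text call goes here


def complete(history: list[dict]) -> str:
    return "placeholder assistant reply"  # real LLM call goes here


def synthesise(text: str) -> bytes:
    return b""                            # real text-to-speech call goes here


def ideation_turn(audio_in: bytes, history: list[dict]) -> bytes:
    """One voice turn: transcribe the user, respond, and voice the reply."""
    history.append({"role": "user", "content": transcribe(audio_in)})
    reply = complete(history)
    history.append({"role": "assistant", "content": reply})
    return synthesise(reply)
```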
I led the design, user testing, and iteration cycle across multiple prototypes.
To validate the impact of voice interaction on ideation, we designed a controlled experiment with three core hypotheses and tested it with 15 of our existing clients, giving them beta access to the functionality:
H1: Voice input will significantly enhance the speed and ease of ideation compared to traditional text-based input.
Outcome:
Participants generated a higher number of distinct ideas in less time when using voice. On average, users reached their first viable idea faster than in comparable text sessions. Many noted that speaking aloud allowed ideas to flow more freely and reduced self-censorship.
“I got more ideas out in a shorter time because I wasn’t overthinking how to phrase them.”
H2: Users will perceive voice input as a more intuitive and less cognitively demanding way to interact with the AI.
Outcome:
Most users reported that voice felt more natural and “human” than typing. Participants described the experience as “like talking through ideas with a colleague,” and reported lower cognitive load when speaking compared to typing into a structured UI.
70% of users described the voice mode as “easy” or “very easy” to use, compared to 45% for the standard typed input.
Users particularly appreciated the system’s ability to keep the flow going through prompts, even when they momentarily lost focus or stumbled mid-idea.
H3: The quality of ideas generated through voice input will be comparable to, or better than, those produced via text.
Outcome:
Reviewers rated ideas generated through voice slightly higher in originality and breadth. Voice sessions often included more exploratory, lateral ideas early in the process, suggesting that verbal brainstorming encouraged a more divergent thinking style.
“The ideas were rougher, but more interesting. I wouldn’t have typed half of those things.” — Participant feedback
Additional Learnings
Users who were initially sceptical about voice input became enthusiastic after experiencing the structured flow and supportive prompts, especially when paired with summarisation and playback features.
Reservations about speaking aloud were common, particularly in open-plan or shared office environments. Some participants expressed self-consciousness, noting that voice interaction still feels unnatural in professional settings.
“I liked the tool, but I’d feel weird talking to my screen in the middle of the office.”
“It felt like a private conversation, which was good—but I’d only use it with headphones or when working from home.”
Despite these reservations, users recognised the value of speaking to generate ideas, especially for tasks involving messy, early thinking. Some even noted that verbalising thoughts helped unlock different ideas than typing did.
There was strong appreciation for voice playback and transcript capture, especially among users who tend to lose their train of thought mid-ideation.
Several participants requested multi-modal controls in future versions (e.g., the ability to speak to generate ideas, then refine or tag them using text), which has since informed next-phase design decisions.