Voice Bot
AI Workflow
A real-time voice assistant that captures microphone audio, transcribes the speech, generates a response with an LLM, and synthesizes that response back to audio. Four manifests, zero glue code.
Prerequisites
Environment variables
OPENAI_KEY
Install
npx radzor@latest recipe add voice-bot
AI Prompt
“Run `npx radzor@latest add audio-capture speech-to-text llm-completion text-to-speech` to install 4 Radzor components. Then read components/radzor/audio-capture/radzor.manifest.json, components/radzor/speech-to-text/radzor.manifest.json, components/radzor/llm-completion/radzor.manifest.json, components/radzor/text-to-speech/radzor.manifest.json and each component's llm/integration.md. Wire them together into a real-time voice assistant that captures the microphone, transcribes speech, generates a response with an LLM, and synthesizes it back to audio. Four manifests, zero glue code. Use the manifest's inputs (check envVar for required environment variables), outputs (check fields for object shapes), composability (check mapField for field extraction), and actions. Don't invent custom interfaces.”
Paste this into Claude Code, Cursor, Windsurf, or any AI coding agent.
Pipeline
AudioCapture: Streams PCM audio from the microphone
SpeechToText: Transcribes speech to text
LLMCompletion: Generates the assistant reply
TextToSpeech: Synthesizes the reply to speech
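The four stages above form a linear chain in which each stage consumes the previous stage's output. The sketch below shows that data flow for a single conversational turn. The interfaces are hypothetical: their method names mirror the scaffolded code on this page, but the authoritative signatures are the ones in each component's radzor.manifest.json. The stub stages are invented purely so the flow can be exercised without a microphone or an API key.

```typescript
// Hypothetical stage interfaces. Method names mirror the scaffolded code;
// the real shapes are defined by each component's radzor.manifest.json.
interface Recorder { stopRecording(): Promise<Uint8Array> }
interface Transcriber { transcribe(audio: Uint8Array): Promise<{ text: string }> }
interface Completer { complete(prompt: string): Promise<{ content: string }> }
interface Synthesizer { synthesize(text: string): Promise<Uint8Array> }

interface Pipeline { mic: Recorder; stt: Transcriber; llm: Completer; tts: Synthesizer }

// One conversational turn: recorded audio -> transcript -> reply text -> reply audio.
async function runTurn(p: Pipeline): Promise<{ heard: string; reply: string; audio: Uint8Array }> {
  const recorded = await p.mic.stopRecording()
  const { text } = await p.stt.transcribe(recorded)
  const { content } = await p.llm.complete(text)
  const audio = await p.tts.synthesize(content)
  return { heard: text, reply: content, audio }
}

// Stub stages (assumptions for illustration only, not Radzor components).
const stubPipeline: Pipeline = {
  mic: { stopRecording: async () => new Uint8Array([1, 2, 3]) },
  stt: { transcribe: async (audio) => ({ text: `heard ${audio.length} bytes` }) },
  llm: { complete: async (prompt) => ({ content: `You said: ${prompt}` }) },
  // Fake "audio": one zero byte per character of the reply.
  tts: { synthesize: async (text) => new Uint8Array(text.length) },
}
```

Calling runTurn(stubPipeline) resolves with the transcript, the reply, and the fake audio buffer, which makes it easy to check the wiring before swapping in the real components.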
Scaffolded Code
// npx radzor@latest add audio-capture speech-to-text llm-completion text-to-speech
import { AudioCapture } from "./components/radzor/audio-capture"
import { SpeechToText } from "./components/radzor/speech-to-text"
import { LLMCompletion } from "./components/radzor/llm-completion"
import { TextToSpeech } from "./components/radzor/text-to-speech"
const mic = new AudioCapture({ sampleRate: 16000, channels: 1, codec: "opus" })
const stt = new SpeechToText({ provider: "openai", apiKey: process.env.OPENAI_KEY!, language: "en" })
const llm = new LLMCompletion({ provider: "openai", apiKey: process.env.OPENAI_KEY!, model: "gpt-4o", systemPrompt: "You are a helpful voice assistant." })
const tts = new TextToSpeech({ provider: "openai", apiKey: process.env.OPENAI_KEY!, voice: "alloy" })
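// When the speaker stops talking, run one full turn of the pipeline.
mic.on("onSpeechEnd", async () => {
  const audioBlob = await mic.stopRecording()
  const { text } = await stt.transcribe(audioBlob)
  const { content } = await llm.complete(text)
  const audioBuffer = await tts.synthesize(content)
  // playAudio is a placeholder for your platform's playback routine; this recipe does not define it.
  playAudio(audioBuffer)
  // Re-arm the microphone for the next utterance.
  await mic.startRecording()
})
// Start listening for the first utterance.
await mic.startRecording()
Components used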
LLM tip
Pass all 4 radzor.manifest.json files to your agent at once. It will read the outputs of each step and match them against the inputs of the next — wiring the full pipeline without any extra instructions.
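To make the idea of matching each step's outputs against the next step's inputs concrete, here is a minimal sketch of that check. The StageSketch shape and the type labels ("audio", "text") are assumptions made for illustration. They are a deliberately simplified stand-in, not the actual radzor.manifest.json schema, so consult the real manifests for the true field names.

```typescript
// Hypothetical, simplified stage description. The real radzor.manifest.json
// schema may differ; this only captures "what goes in" and "what comes out".
interface StageSketch {
  name: string
  input: string | null // type consumed; null for the first stage
  output: string       // type produced
}

// Returns one message per adjacent pair whose output and input types differ.
function findMismatches(stages: StageSketch[]): string[] {
  const problems: string[] = []
  for (let i = 0; i + 1 < stages.length; i++) {
    const from = stages[i]
    const to = stages[i + 1]
    if (from.output !== to.input) {
      problems.push(`${from.name} produces ${from.output}, but ${to.name} expects ${to.input}`)
    }
  }
  return problems
}

// The voice bot chain, described with the assumed type labels.
const voiceBot: StageSketch[] = [
  { name: "audio-capture", input: null, output: "audio" },
  { name: "speech-to-text", input: "audio", output: "text" },
  { name: "llm-completion", input: "text", output: "text" },
  { name: "text-to-speech", input: "text", output: "audio" },
]
```

With this description, findMismatches(voiceBot) returns an empty list, while skipping the speech-to-text stage would report that audio-capture produces audio where llm-completion expects text.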