Intermediate · ai · audio · realtime

Voice Bot

AI Workflow

A real-time voice assistant that captures microphone audio, transcribes the speech, generates a reply with an LLM, and synthesizes that reply back to audio. Four manifests, zero glue code.

Prerequisites

Environment variables

OPENAI_KEY

Browser environment only — requires HTTPS and microphone permission.
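The browser requirements above can be checked before any component is constructed. The sketch below is one way to do that, not part of Radzor: `findVoiceBotBlockers` and `BrowserLike` are hypothetical names, and the check relies only on the standard `isSecureContext` flag and `navigator.mediaDevices.getUserMedia`, which browsers expose only on secure origins. It reports problems and does not itself prompt for microphone permission.

```typescript
// Minimal structural view of the browser globals this check reads, so the
// function can also be exercised outside a browser (for example in a unit test).
interface BrowserLike {
  isSecureContext: boolean
  navigator: { mediaDevices?: { getUserMedia?: unknown } }
}

// Hypothetical helper: returns a list of human-readable problems.
// An empty list means the environment looks usable for microphone capture.
function findVoiceBotBlockers(env: BrowserLike): string[] {
  const problems: string[] = []
  if (!env.isSecureContext) {
    problems.push("Page is not served from a secure origin (HTTPS or localhost)")
  }
  if (typeof env.navigator.mediaDevices?.getUserMedia !== "function") {
    problems.push("navigator.mediaDevices.getUserMedia is unavailable")
  }
  return problems
}

// In a browser: const problems = findVoiceBotBlockers(window)
```

Running the check first lets the page show a clear message instead of failing inside the capture component.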

Install

$ npx radzor@latest recipe add voice-bot

AI Prompt

“Run `npx radzor@latest add audio-capture speech-to-text llm-completion text-to-speech` to install 4 Radzor components. Then read components/radzor/audio-capture/radzor.manifest.json, components/radzor/speech-to-text/radzor.manifest.json, components/radzor/llm-completion/radzor.manifest.json, components/radzor/text-to-speech/radzor.manifest.json and each component's llm/integration.md. Wire them together into a real-time voice assistant that captures the microphone, transcribes speech, generates a response with an LLM, and synthesizes it back to audio. Four manifests, zero glue code. Use the manifest's inputs (check envVar for required environment variables), outputs (check fields for object shapes), composability (check mapField for field extraction), and actions. Don't invent custom interfaces.”

Paste this into Claude Code, Cursor, Windsurf, or any AI coding agent.

Pipeline

AudioCapture: Streams PCM audio from the microphone

↓ audio blob

SpeechToText: Transcribes speech to text

↓ transcript

LLMCompletion: Generates the assistant reply

↓ message

TextToSpeech: Synthesizes the reply to speech

Scaffolded Code

voice-bot-recipe.ts

// npx radzor@latest add audio-capture speech-to-text llm-completion text-to-speech
import { AudioCapture } from "./components/radzor/audio-capture"
import { SpeechToText } from "./components/radzor/speech-to-text"
import { LLMCompletion } from "./components/radzor/llm-completion"
import { TextToSpeech } from "./components/radzor/text-to-speech"

// One OpenAI key (OPENAI_KEY) drives transcription, completion, and synthesis.
const mic = new AudioCapture({ sampleRate: 16000, channels: 1, codec: "opus" })
const stt = new SpeechToText({ provider: "openai", apiKey: process.env.OPENAI_KEY!, language: "en" })
const llm = new LLMCompletion({ provider: "openai", apiKey: process.env.OPENAI_KEY!, model: "gpt-4o", systemPrompt: "You are a helpful voice assistant." })
const tts = new TextToSpeech({ provider: "openai", apiKey: process.env.OPENAI_KEY!, voice: "alloy" })

// Each time the speaker pauses, run one turn: capture, transcribe, reply, speak.
mic.on("onSpeechEnd", async () => {
  try {
    const audioBlob = await mic.stopRecording()
    const { text } = await stt.transcribe(audioBlob)
    const { content } = await llm.complete(text)
    const audioBuffer = await tts.synthesize(content)
    // playAudio is not defined in this file; supply your own playback helper.
    playAudio(audioBuffer)
  } catch (err) {
    console.error("Voice turn failed:", err)
  } finally {
    // Resume listening even if a step above failed.
    await mic.startRecording()
  }
})

// Start listening for the first utterance.
await mic.startRecording()
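The scaffold calls playAudio without defining it, so a playback helper has to come from somewhere. The following is a minimal sketch using the standard Web Audio API calls `decodeAudioData`, `createBufferSource`, `connect`, and `start`. It assumes that `tts.synthesize` resolves to an ArrayBuffer of encoded audio, which this page does not confirm; `makePlayAudio`, `AudioContextLike`, and `SourceNodeLike` are hypothetical names introduced for illustration.

```typescript
// Structural stand-ins for the parts of the Web Audio API used below, so the
// helper can be type-checked and exercised outside a browser as well.
interface SourceNodeLike {
  buffer: unknown
  connect(destination: unknown): unknown
  start(): void
}

interface AudioContextLike {
  destination: unknown
  decodeAudioData(data: ArrayBuffer): Promise<unknown>
  createBufferSource(): SourceNodeLike
}

// Builds a playAudio(audio) function bound to one audio context.
// Assumption: `audio` holds encoded audio (for example MP3) in an ArrayBuffer.
// The returned promise resolves once playback has started, not when it ends.
function makePlayAudio(ctx: AudioContextLike): (audio: ArrayBuffer) => Promise<void> {
  return async (audio) => {
    const decoded = await ctx.decodeAudioData(audio) // encoded bytes to a decoded buffer
    const source = ctx.createBufferSource()
    source.buffer = decoded
    source.connect(ctx.destination) // route the source to the speakers
    source.start()
  }
}

// In a browser: const playAudio = makePlayAudio(new AudioContext())
```

Browsers may keep an AudioContext suspended until the user interacts with the page, so creating the context from a click handler is the safer choice.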

Components used

AudioCapture: Streams PCM audio from the microphone

SpeechToText: Transcribes speech to text

LLMCompletion: Generates the assistant reply

TextToSpeech: Synthesizes the reply to speech

LLM tip

Pass all 4 radzor.manifest.json files to your agent at once. It will read the outputs of each step and match them against the inputs of the next — wiring the full pipeline without any extra instructions.
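The matching step described in this tip can be pictured as a check over adjacent pipeline stages. The sketch below is purely illustrative: `StageSketch` is a hypothetical, simplified shape and not the real radzor.manifest.json schema, and the labels ("audio blob", "transcript", "message") are the ones shown in the Pipeline section.

```typescript
// Hypothetical, simplified description of one stage.
// This is NOT the real radzor.manifest.json schema.
interface StageSketch {
  name: string
  consumes: string // label of the value the stage expects
  produces: string // label of the value the stage emits
}

// Compares each stage's output label with the next stage's input label and
// returns one message per mismatch; an empty array means the chain lines up.
function findMismatches(stages: StageSketch[]): string[] {
  const mismatches: string[] = []
  for (let i = 0; i + 1 < stages.length; i++) {
    const from = stages[i]
    const to = stages[i + 1]
    if (from.produces !== to.consumes) {
      mismatches.push(`${from.name} produces "${from.produces}" but ${to.name} expects "${to.consumes}"`)
    }
  }
  return mismatches
}

const voiceBot: StageSketch[] = [
  { name: "AudioCapture", consumes: "microphone", produces: "audio blob" },
  { name: "SpeechToText", consumes: "audio blob", produces: "transcript" },
  { name: "LLMCompletion", consumes: "transcript", produces: "message" },
  { name: "TextToSpeech", consumes: "message", produces: "audio" },
]

console.log(findMismatches(voiceBot))
```

An agent reading the real manifests does the same kind of comparison, using the actual output fields and input definitions rather than these labels.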

audio-capture/manifest.json
speech-to-text/manifest.json
llm-completion/manifest.json
text-to-speech/manifest.json
