We gave our phone system a brain: building an AI voice receptionist with LiveKit + Twilio

Retail phones are a lose-lose. During a rush, staff choose between the customer in front of them and the one on the line. After hours, calls go to a voicemail nobody checks. So we built an AI receptionist that answers every call across three store locations — and it's been on the phones ever since. This is the build log. The case study version lives here.

The architecture

Three pieces:

Twilio owns the phone numbers and SIP trunking — calls hit Twilio first
LiveKit bridges telephony into a realtime session and runs the agent framework
A realtime speech model does the actual conversation — speech in, speech out, no text-in-the-middle pipeline

The agent knows each location's hours, address, parking quirks, and the twenty questions that make up most call volume. Anything outside its lane — a complaint, a negotiation, anything with heat — it transfers to a human with context.

The routing trick that simplified everything: a catch-all dispatch rule. Every inbound number lands on the same agent, which adapts by store. One agent to maintain, three stores covered, new location = new number pointed at the same place.
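Here is a minimal sketch of what "adapts by store" can look like once every number lands on the same agent. It assumes the agent can read the number the caller dialed; how that value reaches your code depends on your telephony setup. The phone numbers, store names, and the `build_instructions` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Store:
    name: str
    address: str
    hours: str


# Hypothetical numbers and store details, for illustration only.
STORES_BY_NUMBER = {
    "+15555550101": Store("Downtown", "1 Example St", "9am to 8pm daily"),
    "+15555550102": Store("Northside", "2 Example Ave", "10am to 6pm daily"),
    "+15555550103": Store("Harbor", "3 Example Rd", "10am to 9pm daily"),
}


def normalize(number: str) -> str:
    """Strip spaces, dashes, and parentheses so formatted numbers still match."""
    return "+" + "".join(ch for ch in number if ch.isdigit())


def store_for_call(dialed_number: str) -> Optional[Store]:
    """Look up the store that owns the dialed number, if any."""
    return STORES_BY_NUMBER.get(normalize(dialed_number))


def build_instructions(dialed_number: str) -> str:
    """Return store-specific instructions, or a safe fallback for unknown numbers."""
    store = store_for_call(dialed_number)
    if store is None:
        return ("You answer phones for our stores. "
                "Ask which location the caller is trying to reach.")
    return (f"You answer phones for the {store.name} store. "
            f"Address: {store.address}. Hours: {store.hours}.")
```

Adding a location then means adding one entry to the table; the agent code itself does not change.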

Lesson 1: latency is the product

Nothing else matters if the turn-taking feels wrong. Humans notice pauses above roughly 500 ms; at a full second, callers start saying "hello?", and once that happens twice, trust is gone no matter how smart the answers are.

What moved the needle:

Realtime speech-to-speech instead of STT → LLM → TTS. The classic pipeline stacks three latencies and loses prosody. Speech-native models cut both.
On-device voice activity detection and end-of-turn detection, tuned for retail calls — background music, register noise, two people talking near the phone. Stock thresholds interrupted people mid-sentence; retail callers pause mid-thought ("do you have it in… uh, a size 11").
Ruthless prompt budget. Every instruction token adds to time-to-first-word. The persona prompt earns its length or gets cut.
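To see why stock thresholds cut people off, here is a deliberately simple silence-timer endpointer. It is not the turn detector used in production (real ones combine VAD with a model of whether the sentence sounds finished); it only shows the trade-off. The frame size, pause lengths, and thresholds are illustrative assumptions.

```python
from typing import Optional, Sequence


def end_of_turn_frame(frames: Sequence[bool], frame_ms: int,
                      min_silence_ms: int) -> Optional[int]:
    """Return the index of the frame where the turn is declared over.

    `frames` holds one VAD decision per audio frame (True = speech).
    The turn ends once trailing silence reaches `min_silence_ms`.
    """
    silence_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= min_silence_ms:
                return i
    return None


# "do you have it in... uh, a size 11" at 20 ms per frame (made-up timings):
# 1.0 s of speech, a 0.5 s mid-thought pause, 0.6 s of speech, then silence.
FRAME_MS = 20
call = [True] * 50 + [False] * 25 + [True] * 30 + [False] * 60
resume_index = 75  # first frame after the mid-thought pause

eager = end_of_turn_frame(call, FRAME_MS, min_silence_ms=300)
patient = end_of_turn_frame(call, FRAME_MS, min_silence_ms=700)
```

With these made-up timings the 300 ms threshold fires at frame 64, before the caller resumes at frame 75, while the 700 ms threshold waits until frame 139, after the caller has finished. The cost of patience is added latency on every turn, which is why the threshold has to be tuned rather than simply raised.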

Lesson 2: the boring failure modes are the real ones

The model was never the problem. The problems were:

Managed-number plumbing. Numbers provisioned through certain messaging services behave differently than raw voice numbers when you attach SIP trunks. That mismatch ate a weekend. Check how the number was provisioned before wiring anything.
Knowing when to shut up. Early versions answered everything enthusiastically, including things they shouldn't ("can you hold five pairs for my cousin?"). The fix wasn't more intelligence — it was a tighter lane and a graceful, fast transfer.
The persona test. Read your agent's greeting out loud. If it sounds like a phone tree wearing a costume, rewrite it. Ours is short, warm, and store-specific — callers regularly don't clock it as AI until it tells them.
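The "tighter lane" idea can be sketched as an allowlist: anything not explicitly in scope becomes a transfer, and the transfer carries a note so the human does not start cold. This assumes some upstream step has already labeled the caller's intent; the intent names, the `Decision` type, and the note format are hypothetical.

```python
from dataclasses import dataclass

# Topics the agent may handle itself, taken from the duties named in this post.
IN_LANE = {"hours", "directions", "parking", "stock_check", "hold_policy"}


@dataclass(frozen=True)
class Decision:
    action: str             # "answer" or "transfer"
    handoff_note: str = ""  # context passed to the human on transfer


def decide(intent: str, caller_summary: str) -> Decision:
    """Answer only in-lane intents; transfer everything else with context."""
    if intent in IN_LANE:
        return Decision("answer")
    note = f"Transferred by AI receptionist. Intent: {intent}. Caller said: {caller_summary}"
    return Decision("transfer", note)
```

The design choice is default-deny: a new kind of request gets a human until someone deliberately adds it to the lane.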

Lesson 3: scope it like an employee, not a feature

The unlock was writing the agent's job description before its prompt: what would we tell a new hire answering phones on day one? Hours, directions, stock checks, hold policy, when to grab a manager. That document became the system prompt almost verbatim — and gave us the eval checklist for free.
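One way to get the prompt and the eval checklist from the same document is to keep the job description as structured data and render both from it. This is a sketch under that assumption; the `Duty` fields, the example wording, and the test questions are hypothetical, not the production prompt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Duty:
    topic: str
    instruction: str    # what we would tell a new hire
    eval_question: str  # a caller line used to check the behavior
    expect: str         # "answer" or "transfer"


# Hypothetical day-one job description.
JOB = [
    Duty("hours", "Give today's hours for the caller's store.",
         "What time do you close tonight?", "answer"),
    Duty("directions", "Give the address and mention parking quirks.",
         "Where do I park?", "answer"),
    Duty("stock", "Check stock before promising an item.",
         "Do you have it in a size 11?", "answer"),
    Duty("manager", "Hand complaints and negotiations to a manager.",
         "I want a refund and I'm not happy.", "transfer"),
]


def system_prompt(job: List[Duty]) -> str:
    """Render the job description as prompt text, one duty per line."""
    lines = ["You answer phones for a retail store. Your duties:"]
    lines += [f"- {duty.instruction}" for duty in job]
    return "\n".join(lines)


def eval_checklist(job: List[Duty]) -> List[Tuple[str, str]]:
    """Render the same duties as (caller line, expected behavior) pairs."""
    return [(duty.eval_question, duty.expect) for duty in job]
```

Because both outputs come from one list, a duty cannot be added to the prompt without also getting an eval case.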

What it costs to run

In order-of-magnitude terms: realtime model minutes plus telephony land at cents per call. A receptionist-shaped human answering the same volume across three locations would be a five-figure line item. The ROI conversation is short. The 7-automations post has the missed-call math.
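The arithmetic behind "cents per call" is simple enough to sketch. The rates below are placeholders, not real pricing and not our actual figures; plug in your providers' current per-minute rates and your own call volume.

```python
def cost_per_call_cents(minutes: float,
                        model_cents_per_min: float,
                        telephony_cents_per_min: float) -> float:
    """Per-call cost: call length times the combined per-minute rate."""
    return minutes * (model_cents_per_min + telephony_cents_per_min)


def monthly_cost_dollars(calls_per_month: int,
                         avg_minutes: float,
                         model_cents_per_min: float,
                         telephony_cents_per_min: float) -> float:
    """Monthly spend in dollars for a given call volume."""
    per_call = cost_per_call_cents(avg_minutes, model_cents_per_min,
                                   telephony_cents_per_min)
    return calls_per_month * per_call / 100


# Placeholder inputs: a 2-minute call at 6 cents/min (model) + 1 cent/min (telephony).
example_call = cost_per_call_cents(2.0, 6.0, 1.0)
example_month = monthly_cost_dollars(1000, 2.0, 6.0, 1.0)
```

With those placeholder inputs a call costs 14 cents, and 1,000 such calls cost 140 dollars a month.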

What's next

A public demo line you can call from this site is on the roadmap. Until then, the write-up is the tour — and the phones at all three stores are the production deployment.

Key takeaways

Twilio (telephony) + LiveKit (realtime agent) + speech-to-speech model = an AI receptionist in production
Latency is the product: realtime models, tuned turn detection, and short prompts beat raw intelligence
The failures are boring — number provisioning, scope creep, robotic personas — plan for them
Write the agent's job description like a new hire's; it becomes the prompt and the eval set

Missing calls at your business? We'll scope a voice agent for your operation and tell you honestly whether you actually need one.