SimplyGuest: Real-Time Voice AI Agent for Parking Search & Support
Production conversational voice AI that helps drivers find parking listings, understand how SimplyGuest works, and resolve common issues, grounded in live marketplace tools instead of guesswork
Overview
Use Case
Inbound voice customer support for SimplyGuest.com: parking search, listing details, availability checks, token flow questions, and issue resolution with human escalation
Scale
Production deployment with real-time audio streaming and tool-grounded answers from SimplyGuest's live listing system
Timeline
6 weeks from MVP to production, with ongoing iteration enabled by prompt updates and observability
The Challenge
SimplyGuest serves drivers who need parking quickly—often while commuting. The support experience had to be fast, safe, and accurate, without turning into an expensive call-center bottleneck.
Driving-Safe UX Requirements
Callers can't listen to long explanations or navigate menus—responses must be short, clear, and actionable
Live Inventory Can't Be "Guessed"
Parking availability, pricing, and listing details change frequently. A voice agent that hallucinates even occasionally is worse than no agent
Strict Policy & Safety Constraints
The agent must never share parking owner contact details directly, must be transparent that listings aren't verified, and must communicate the token and replacement-token policy consistently
Production Observability
Voice AI needs traceability: what the user said, what the model answered, which tools were called, and why a call failed—without relying on "it sounded fine" demos
The Solution
We built a real-time, production voice AI agent on four core pillars: low-latency telephony streaming, tool-grounded marketplace answers, policy-safe support behavior, and end-to-end call observability.
Real-Time Voice Conversation (Telephony → LLM → Telephony)
A bidirectional streaming pipeline enables natural interruption handling and quick responses:
- Inbound call audio streams via WebSocket from the telephony provider
- The model responds with native audio output (not just text-to-speech stitched later)
- Model audio is resampled to telephony-compatible PCM and played back in real time
- Interruption events clear buffered audio to avoid "talking over" the caller
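The resampling step above can be sketched in a few lines. This is a minimal illustration, not the production code: it assumes the model emits mono 16-bit PCM at 24 kHz (an assumption, the source does not state the model's output rate), and the function name `resample_pcm16` is hypothetical. It uses plain linear interpolation; a production pipeline would normally apply a low-pass filter before downsampling to limit aliasing.

```python
import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit PCM (native byte order) by linear interpolation."""
    src = array.array("h")
    src.frombytes(pcm)
    if not src or src_rate == dst_rate:
        return pcm

    n_out = len(src) * dst_rate // src_rate
    out = array.array("h", bytes(2 * n_out))  # zero-filled output buffer
    for i in range(n_out):
        # Position of output sample i on the source timeline.
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        nxt = src[j + 1] if j + 1 < len(src) else src[j]
        out[i] = int(round(src[j] + (nxt - src[j]) * frac))
    return out.tobytes()


# Example: six samples at 24 kHz become four samples at 16 kHz.
chunk_24k = array.array("h", [0, 300, 600, 900, 1200, 1500]).tobytes()
chunk_16k = resample_pcm16(chunk_24k, 24000, 16000)
```

Resampling chunk by chunk like this ignores continuity across chunk boundaries, which is usually acceptable for a sketch but worth handling in a real streaming path.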
Tool-Grounded Answers via MCP (Model Context Protocol)
Instead of relying on memory, the agent queries live SimplyGuest tools for anything that depends on current data:
- Natural language parking search by city/area/landmark
- Listing detail retrieval for "tell me more about this one"
- Availability checks for specific listings
- Review lookups when asked
This ensures responses reflect current marketplace state and reduces hallucination risk.
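To make the grounding idea concrete, here is a hedged sketch of how a tool request and its result might be handled. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method; the tool name `search_parking` and its `query` argument are hypothetical, as are the helper names. The key design point is the fallback: when a tool fails, the helper returns `None` so the agent can say it could not check, rather than answering from memory.

```python
import itertools
from typing import Optional

_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def grounded_text(response: dict) -> Optional[str]:
    """Return the tool's text content, or None if the call failed.

    None signals "no grounded answer available"; the caller should
    apologize or escalate instead of guessing.
    """
    if "error" in response:  # protocol-level JSON-RPC error
        return None
    result = response.get("result", {})
    if result.get("isError"):  # tool-level failure reported in the result
        return None
    parts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    return "\n".join(parts) or None


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call("search_parking", {"query": "covered parking near the airport"})
```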
Policy-Safe Customer Support Behavior
The voice agent was designed with strict conversational guardrails:
- Short answers (1–2 sentences) optimized for driving callers
- One question at a time to avoid cognitive overload
- Never reveal parking owner phone numbers directly
- Clear transparency: SimplyGuest is a listing platform; users should visit and verify
- Correct handling of token and replacement-token policy
- Human escalation path for urgent issues
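The guardrails above are described as agent behavior. One possible backstop for the phone-number rule, which is an assumption here and not something the source describes, is to scrub anything that looks like a phone number from tool results before the model ever sees them. The pattern and the function name below are hypothetical, and the pattern deliberately over-matches (it can also catch long digit runs such as timestamps), which is the safer direction for this policy.

```python
import re

# Optional "+" or "(", then at least nine digits/separators ending in a digit.
PHONE_RE = re.compile(r"(?<!\w)[+(]?\d[\d\s().-]{7,}\d(?!\w)")


def redact_phone_numbers(text: str, placeholder: str = "[number withheld]") -> str:
    """Replace phone-number-like spans so they cannot be read back to a caller."""
    return PHONE_RE.sub(placeholder, text)


# Example: a listing description that embeds an owner's number.
safe = redact_phone_numbers("Call the owner at +91 98765 43210 for the gate code")
```

Short numbers such as prices or listing counts fall below the length threshold and pass through unchanged.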
Full-Call Observability and QA Workflow
Every call produces structured artifacts for debugging, QA, and iteration:
- transcript.txt: human-readable call timeline
- events.jsonl: structured events (setup, interruptions, tool calls/results, errors)
- caller_in.wav + agent_out.wav: separate audio tracks
- stereo.wav (post-processed): caller on the left channel, agent on the right, for easy review
- recording_meta.json: sample rates, durations, and recording notes
This enables fast iteration on prompts, tool schemas, and audio behavior without "black box" risk.
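As an illustration of the stereo post-processing step, the sketch below merges two mono tracks into one stereo file using only the standard library. It assumes both tracks are 16-bit mono WAV files at the same sample rate; the function name `merge_to_stereo` is hypothetical, and the shorter track is padded with trailing silence so both channels have equal length.

```python
import array
import wave


def merge_to_stereo(caller_path: str, agent_path: str, out_path: str) -> None:
    """Write a stereo WAV with the caller on the left and the agent on the right."""
    with wave.open(caller_path, "rb") as c, wave.open(agent_path, "rb") as a:
        if c.getnchannels() != 1 or a.getnchannels() != 1:
            raise ValueError("both inputs must be mono")
        if c.getsampwidth() != 2 or a.getsampwidth() != 2:
            raise ValueError("both inputs must be 16-bit PCM")
        if c.getframerate() != a.getframerate():
            raise ValueError("sample rates differ; resample first")
        rate = c.getframerate()
        left, right = array.array("h"), array.array("h")
        left.frombytes(c.readframes(c.getnframes()))
        right.frombytes(a.readframes(a.getnframes()))

    # Pad the shorter track with silence so the channels line up.
    n = max(len(left), len(right))
    left.extend([0] * (n - len(left)))
    right.extend([0] * (n - len(right)))

    # Interleave samples: L0, R0, L1, R1, ...
    stereo = array.array("h", bytes(4 * n))
    stereo[0::2] = left
    stereo[1::2] = right

    with wave.open(out_path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(stereo.tobytes())
```

A reviewer can then solo either channel in any audio editor to hear one side of the call in isolation.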
Technical Architecture
AI & Voice Stack
- Google Gemini Live (native audio) for low-latency conversational speech output
- Prompted for short, safe, policy-compliant CSR behavior
- Streaming transcription capture for both caller and agent turns
Telephony & Real-Time Streaming
- Plivo bidirectional <Stream> WebSocket integration
- 16 kHz PCM (audio/x-l16) streaming for telephony compatibility
- Real-time audio resampling from model output to telephony sample rate
- Buffer management + clearAudio on interruption events
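The buffer-management bullet can be sketched as a small playback queue that produces the JSON messages sent back over the stream WebSocket. The `playAudio` and `clearAudio` message shapes below follow our reading of Plivo's audio streaming documentation and should be verified against the current docs; the `PlaybackQueue` class itself is hypothetical. On an interruption, the queue drops locally buffered audio and emits a `clearAudio` message so the provider discards whatever it has already buffered.

```python
import base64
import json
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Holds agent audio chunks that have not yet been sent to the caller."""

    def __init__(self, stream_id: str, sample_rate: int = 16000) -> None:
        self.stream_id = stream_id
        self.sample_rate = sample_rate
        self._pending: deque = deque()

    def enqueue(self, pcm: bytes) -> None:
        """Queue one chunk of telephony-rate 16-bit PCM."""
        self._pending.append(pcm)

    def next_message(self) -> Optional[str]:
        """Return the next playAudio message, or None if nothing is queued."""
        if not self._pending:
            return None
        chunk = self._pending.popleft()
        return json.dumps({
            "event": "playAudio",  # message shape assumed from Plivo docs
            "media": {
                "contentType": "audio/x-l16",
                "sampleRate": self.sample_rate,
                "payload": base64.b64encode(chunk).decode("ascii"),
            },
        })

    def interrupt(self) -> str:
        """Drop unsent audio and return a clearAudio message for the provider."""
        self._pending.clear()
        return json.dumps({"event": "clearAudio", "streamId": self.stream_id})
```

Clearing both sides matters: dropping only the local queue would still leave already-sent audio playing over the caller.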
Tool Integration Layer
- SimplyGuest tools exposed via MCP Streamable HTTP
- Dynamic tool declaration fetching (function schemas) at session start
- Tool-call logging with latency measurement and error capture
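Tool-call logging with latency and error capture can be as simple as a wrapper around the function that actually invokes the tool. The sketch below is illustrative only: the event field names (`type`, `tool`, `ok`, `latency_ms`, `error`, `ts`) and the `invoke` callable are assumptions, not the project's actual schema. It writes one JSON object per line, matching the events.jsonl format.

```python
import json
import time
from typing import Any, Callable, IO


def call_tool_logged(
    name: str,
    arguments: dict,
    invoke: Callable[[str, dict], Any],
    log: IO[str],
) -> Any:
    """Invoke a tool and append one JSON line describing the outcome."""
    started = time.perf_counter()
    event = {"type": "tool_call", "tool": name, "arguments": arguments}
    try:
        result = invoke(name, arguments)
        event["ok"] = True
        return result
    except Exception as exc:
        event["ok"] = False
        event["error"] = f"{type(exc).__name__}: {exc}"
        raise  # let the caller decide how the agent should respond
    finally:
        # Runs on success and failure, so every call gets a latency record.
        event["latency_ms"] = round((time.perf_counter() - started) * 1000, 1)
        event["ts"] = time.time()
        log.write(json.dumps(event) + "\n")
```

Because the log line is written in `finally`, failed calls are recorded with the same fields as successful ones, which keeps success-rate and latency queries simple.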
Observability & Recording
- Per-call directory layout with UTC date partitioning
- Structured event logs for auditing tool behavior and failures
- Call recording with timeline alignment (including silence insertion for agent track alignment)
- Stereo WAV generation for fast human review
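Two of the items above lend themselves to a short sketch: UTC date partitioning and silence insertion. The `root/YYYY/MM/DD/call_id` layout and both function names are hypothetical. Partitioning on UTC means a call lands in the same directory regardless of the server's local time zone. Silence insertion pads the agent track with zero bytes so that each agent chunk starts at its real offset on the call timeline, which keeps it aligned with the continuously recorded caller track.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def call_dir(root: str, call_id: str, started_at: datetime) -> PurePosixPath:
    """Directory for one call, partitioned by the UTC date the call started.

    `started_at` should be timezone-aware; naive datetimes are
    interpreted as local time by astimezone().
    """
    utc = started_at.astimezone(timezone.utc)
    return PurePosixPath(root) / utc.strftime("%Y/%m/%d") / call_id


def silence_gap(
    track_len_bytes: int,
    chunk_offset_s: float,
    sample_rate: int,
    sample_width: int = 2,
) -> bytes:
    """Zero bytes to append so the next chunk starts at chunk_offset_s."""
    target = int(chunk_offset_s * sample_rate) * sample_width
    return bytes(max(0, target - track_len_bytes))


# Example: the agent track holds 0.5 s of 16 kHz audio and the next
# agent chunk was produced 1.0 s into the call, so 0.5 s of silence is needed.
pad = silence_gap(track_len_bytes=16000, chunk_offset_s=1.0, sample_rate=16000)
```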
Results
Customer Experience
- 72% of callers found a relevant parking listing without needing human escalation
- 45 seconds median time from call start to first relevant parking recommendation
- 4.3/5 average satisfaction rating from post-call surveys (n=500+ calls)
Operational Efficiency
- 68% reduction in repetitive support queries handled by humans (parking search, availability checks, token policy explanations)
- 52% reduction in average handle time for escalated calls (full call transcripts + tool traces eliminate re-investigation)
- 96.8% tool-call success rate with 180 ms median latency (marketplace data + review lookups)
Technical Performance
- 240 ms median end-to-end audio response latency (caller speech → agent audio with tool calls)
- 99.7% uptime across production (including tool provider availability)
- 99.2% of calls with complete transcripts, events, and stereo recordings for QA review
Key Takeaways
Tool Grounding Wins in Marketplaces: Any system that answers from "memory" will drift from live inventory. Making tools the default path for availability/details dramatically reduces incorrect answers.
Voice UX Needs Hard Constraints: Short, single-question turns aren't just "nice"—they're essential for driving callers and for reducing conversational failure modes.
Observability Is the Difference Between a Demo and Production: Transcripts, tool traces, and stereo recordings make it possible to debug and improve safely, without guessing what happened on a call.
Ready to deploy a production voice agent for your marketplace?
If you want a voice AI system that's grounded in real data, designed for production reliability, and instrumented for continuous improvement—we can help you ship it.