# A2UI Mortgage Assistant — Voice-First AI with Dynamic UI Generation
TL;DR: A voice-first mortgage assistant where the AI doesn't just talk — it builds the UI in real time. Speak your requirements; a LangGraph agent interprets them and assembles an interactive interface (product comparison cards, gauges, maps, timelines) on the fly via a custom A2UI protocol.
Stack: Python • FastAPI • LangGraph • Amazon Nova Sonic • Next.js 16 • React 19 • TypeScript • Tailwind CSS • AWS Bedrock • WebSocket
## ✨ Features
- 🎙️ Voice-First Interaction - Amazon Nova Sonic handles both STT and TTS; the user never touches a form
- 🧩 Agent-Generated UI (A2UI) - A LangGraph agent emits a component tree at runtime; the client renders it dynamically with no hardcoded screens
- 🗺️ Property Insights - Address lookup with an embedded Leaflet map and green mortgage reward when a property is identified
- 📊 Live Product Comparison - Gauge, ProductCard, DataCard, and ComparisonBadge components assembled and patched by the agent
- 🤝 Trouble Detection - Counts unhelpful turns and surfaces a "Speak to a Colleague" button after two consecutive dead-ends
- 🛡️ Guardrail Enforcement - Nova Lite checks every response for off-topic content and substitutes a safe fallback
- 🧪 Goal-Based Test Suite - WebSocket integration tests validate full conversation scenarios end-to-end
## 🧠 Architecture

```mermaid
graph TB
    User[🎤 User Voice / Text] --> Client[Next.js Client]
    Client -->|PCM16 audio chunks| WS[WebSocket :8000]
    WS --> STT[Nova Sonic STT\nnova_sonic_stt.mjs]
    STT -->|transcript| Agent[LangGraph Agent\ngraph.py]
    Agent --> Interpret[interpret_intent\nNova Lite → MortgageIntent]
    Interpret --> Router{intent_router}
    Router -->|missing fields| RenderForm[render_missing_inputs\nCategory grid / detail form]
    Router -->|all fields present| Tools[call_mortgage_tools\nLTV + amortisation + products]
    Tools --> RenderProducts[render_products_a2ui\nGauge + ProductCards + DataCard]
    RenderProducts --> Summary[render_summary_a2ui → confirm_application]
    Agent -->|UI action| HandleAction[handle_ui_action\nupdate_term / select_product / reset]
    HandleAction --> Recalc[recalculate_and_patch]
    Agent -->|server.a2ui.patch| Client
    Agent --> TTS[Nova Sonic TTS\nnova_sonic_tts.mjs]
    TTS -->|base64 PCM16 chunks| Client
    subgraph "A2UI Renderer (client)"
        Client --> A2R[A2Renderer.tsx\nbuilds component tree from root]
        A2R --> Components[Column / Row / Text / Gauge\nProductCard / Map / Button / DataCard]
    end
    subgraph "AWS Bedrock"
        STT
        TTS
        Interpret
    end
```
## 🎯 What Makes This Special
### Agent-Driven UI — No Hardcoded Screens
Most voice assistants return text. This one returns a component tree. The LangGraph agent emits `server.a2ui.patch` messages containing a flat list of typed components with parent–child references. `A2Renderer.tsx` builds the tree at runtime and renders it. Add a new component type on the server; the client renders it automatically. Remove a screen; the agent simply stops emitting it.
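To make the flat-list-with-parent-references idea concrete, here is a minimal sketch of a patch payload and the client-side tree resolution. The field names (`root`, `components`, `children`) and component shapes are assumptions for illustration, not the exact A2UI wire format.

```python
# Illustrative server.a2ui.patch payload: a flat component list linked by ids.
patch = {
    "type": "server.a2ui.patch",
    "root": "col-1",
    "components": [
        {"id": "col-1", "type": "Column", "children": ["h-1", "card-1"]},
        {"id": "h-1", "type": "Text", "variant": "h2", "text": "Matching products"},
        {"id": "card-1", "type": "ProductCard", "props": {"name": "2-Year Fixed", "rate": 4.19}},
    ],
}

def build_tree(patch):
    """Resolve the flat component list into a nested tree, as a renderer would."""
    by_id = {c["id"]: dict(c) for c in patch["components"]}
    for c in by_id.values():
        # Replace child-id references with the child dicts themselves.
        c["children"] = [by_id[cid] for cid in c.get("children", [])]
    return by_id[patch["root"]]

tree = build_tree(patch)
```

Keeping the wire format flat means a patch can reference and update any existing node by id without resending the whole tree.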
### Single Outbox, Coordinated Delivery
The agent queues all output — UI patches, voice lines, transcripts — into a single `state.outbox`. `process_outbox()` always flushes non-voice events first, then batches all speech into a single TTS call. This prevents audio from playing before the UI has updated.
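The delivery discipline can be sketched in a few lines. Here `send_event` and `speak` stand in for the real WebSocket and TTS plumbing, and the event shapes are assumed:

```python
def process_outbox(outbox, send_event, speak):
    """Flush non-voice events first, then batch all speech into one TTS call."""
    voice_lines = []
    for event in outbox:
        if event["kind"] == "voice":
            voice_lines.append(event["text"])  # defer speech
        else:
            send_event(event)                  # a2ui patches, transcripts, etc.
    if voice_lines:
        speak(" ".join(voice_lines))           # single TTS request for the turn
    outbox.clear()
```

Batching speech also avoids the choppy audio you get from one TTS round-trip per sentence.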
### Trouble-Aware Conversation
The agent tracks `trouble_count`: how many turns the user gave no useful information or used struggle keywords. At ≥ 2, `show_support` flips to true and the frontend surfaces a human escalation button — all without any client-side logic changes.
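A hedged sketch of that rule as described; the real field names and keyword list live in the agent state and may differ:

```python
# Assumed keyword list; the project's actual struggle keywords are not shown.
STRUGGLE_KEYWORDS = {"confused", "stuck", "help", "don't understand"}

def update_trouble(state, transcript, extracted_fields):
    """Count consecutive unhelpful turns; flip show_support at two or more."""
    unhelpful = (not extracted_fields
                 or any(k in transcript.lower() for k in STRUGGLE_KEYWORDS))
    if unhelpful:
        state["trouble_count"] = state.get("trouble_count", 0) + 1
    else:
        state["trouble_count"] = 0  # progress resets the counter
    state["show_support"] = state["trouble_count"] >= 2
    return state
```

Resetting on any productive turn keeps the escalation button from appearing after isolated misunderstandings.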
## 🚀 Technical Highlights
### LangGraph State Machine

- Nodes: `ingest_input`, `interpret_intent`, `render_missing_inputs`, `call_mortgage_tools`, `render_products_a2ui`, `handle_ui_action`, `recalculate_and_patch`, `render_summary_a2ui`, `confirm_application`, `clear_pending_action`
- State: a single `AgentState` object as the source of truth (`intent`, `ltv`, `products`, `selection`, `ui`, `outbox`, `trouble_count`)
- Routing: `start_router` dispatches on message type (transcript vs UI action); `intent_router` branches on field completeness
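The two routers can be sketched as plain functions (the real project wires them up as LangGraph conditional edges; the required-field set here is an assumption):

```python
# Assumed minimal field set; the real MortgageIntent likely carries more fields.
REQUIRED_FIELDS = {"property_value", "deposit", "term_years"}

def start_router(state):
    """Dispatch on incoming message type: UI action vs voice transcript."""
    if state.get("message_type") == "ui_action":
        return "handle_ui_action"
    return "interpret_intent"

def intent_router(state):
    """Branch on field completeness: collect missing inputs or run the tools."""
    missing = REQUIRED_FIELDS - set(state.get("intent", {}))
    return "render_missing_inputs" if missing else "call_mortgage_tools"
```

Each router returns the name of the next node, which is exactly the contract LangGraph conditional edges expect.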
### Amazon Nova Sonic Integration

- STT: a Node.js subprocess (`nova_sonic_stt.mjs`) receives PCM16 via stdin and emits `TRANSCRIPT:` lines on stdout
- TTS: a separate Node.js subprocess (`nova_sonic_tts.mjs`) emits `AUDIO_CHUNK:<base64>` lines, forwarded to the client as WebSocket binary frames
- Fallback: keyword-based intent extraction when AWS is unavailable
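A sketch of the Python side of that stdout protocol, using the `TRANSCRIPT:` prefix from above; the subprocess driver is illustrative and omits error handling:

```python
import subprocess

TRANSCRIPT_PREFIX = "TRANSCRIPT:"

def parse_stt_line(line: bytes):
    """Return transcript text if the line is a TRANSCRIPT: event, else None."""
    text = line.decode().strip()
    if text.startswith(TRANSCRIPT_PREFIX):
        return text[len(TRANSCRIPT_PREFIX):]
    return None

def run_stt(pcm16_chunks, script="nova_sonic_stt.mjs"):
    """Feed raw PCM16 bytes to the Node subprocess and collect transcripts."""
    proc = subprocess.Popen(["node", script],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for chunk in pcm16_chunks:
        proc.stdin.write(chunk)
    proc.stdin.close()
    transcripts = [t for t in map(parse_stt_line, proc.stdout) if t is not None]
    proc.wait()
    return transcripts
```

Line-oriented prefixes (`TRANSCRIPT:`, `AUDIO_CHUNK:`) keep the cross-runtime protocol trivially parseable on both sides.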
### Client Audio Pipeline

- Recording: an `AudioWorkletNode` (`PCM16Processor`) captures at 16 kHz; chunks are base64-encoded and streamed to the server
- VAD: recording auto-stops after 1500 ms of silence below 0.015 RMS
- Playback: `AudioStreamer` queues base64 PCM16 chunks on a 24 kHz `AudioContext`; the mic auto-restarts after TTS playback ends
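The VAD rule translates directly into code. This Python sketch mirrors the client-side logic using the thresholds stated above (1500 ms, 0.015 RMS); the 20 ms frame size is an assumption:

```python
import math

SILENCE_RMS = 0.015   # amplitude threshold from the client config
SILENCE_MS = 1500     # stop after this much trailing silence

def rms(frame):
    """Root-mean-square amplitude of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def should_stop(frames, frame_ms=20):
    """True once the trailing run of silent frames exceeds SILENCE_MS.

    frames: list of sample frames, newest last.
    """
    silent_ms = 0
    for frame in reversed(frames):
        if rms(frame) >= SILENCE_RMS:
            break  # speech found; stop counting
        silent_ms += frame_ms
    return silent_ms >= SILENCE_MS
```

Scanning from the newest frame backwards means a single loud frame resets the effective silence window, just as speech should.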
### A2UI Component Library
Supported types: `Column`, `Row`, `Text` (h1/h2/h3/body), `Gauge`, `ProductCard`, `Button`, `Map` (Leaflet iframe), `Timeline`, `DataCard`, `BenefitCard`, `ComparisonBadge`, `Image`. Unknown types render as a visible error box — no silent failures.
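The no-silent-failures rule amounts to a dispatch with an explicit fallback. The registry and output strings below are purely illustrative (the real renderer is `A2Renderer.tsx` producing React elements):

```python
# Illustrative renderer registry keyed by component type.
RENDERERS = {
    "Text": lambda c: f"<p>{c['text']}</p>",
    "Gauge": lambda c: f"<gauge value='{c['value']}'></gauge>",
}

def render(component):
    """Dispatch on component type; unknown types become a visible error box."""
    renderer = RENDERERS.get(component["type"])
    if renderer is None:
        # Surface the problem instead of dropping the node silently.
        return f"<div class='error'>Unknown component: {component['type']}</div>"
    return renderer(component)
```

Failing loudly matters here because the component vocabulary is set by the server: a typo in an emitted type shows up on screen immediately rather than as a mysteriously missing card.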
## 📊 Key Metrics
- Latency: STT → agent response → TTS in a single WebSocket round-trip
- Test coverage: goal-based integration tests for full conversation scenarios (e.g. `GBT-FTB-01`, the first-time buyer flow)
- UI components: 13 dynamic component types, zero hardcoded screens
- Models used: Amazon Nova Sonic (voice), Amazon Nova Lite (intent/guardrails), Claude Sonnet/Haiku (A2UI design)
This project demonstrates event-driven agent architecture, voice AI integration at the protocol level, and a novel approach to AI-generated interfaces where the model controls both conversation flow and UI state.