2026-04-13·Ryan Bolden·Part of: The 11 Things That Will Break Your AI in Production

The uncomfortable math on AI voice costs

Voice AI pricing looks simple. The vendor charges per second or per minute. You estimate your call volume, multiply, and get a number. The number looks reasonable. You build your pricing model around it. You launch.

Then real calls start coming in. And the math breaks.

Here is what the math actually looks like in production. Our voice agent handles calls for a psychiatric practice. Average call duration: 1.9 minutes. At current ElevenLabs per-second rates, a single call costs roughly $0.03 to $0.04 for the voice synthesis alone. That does not include the LLM inference cost for understanding the patient, the tool-calling cost for checking the calendar and booking the appointment, or the infrastructure cost for routing, recording, and logging the call.

At 1,700 calls per month — which is what our system handles — the voice cost alone is $50-$70 per month. Add LLM inference, and you are at $100-$150. Add infrastructure, and you are at $200-$300. That sounds manageable for a platform charging $799 per month.
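The spreadsheet version of that estimate is a single multiplication. A minimal sketch in Python using only the figures quoted above; the variable and function names are illustrative, and the all-in range is taken as stated rather than derived from its parts.

```python
# Baseline "spreadsheet" estimate. Only the numbers come from the article;
# the names are illustrative.
CALLS_PER_MONTH = 1_700
VOICE_COST_PER_CALL = (0.03, 0.04)   # USD per call, low and high estimate
PLATFORM_PRICE = 799                 # USD per month
ALL_IN_COST = (200, 300)             # USD per month: voice + LLM + infrastructure


def monthly_voice_cost(calls: int, cost_per_call: float) -> float:
    """Voice synthesis spend for one month, ignoring retries, holds, and transfers."""
    return calls * cost_per_call


voice_low, voice_high = (
    monthly_voice_cost(CALLS_PER_MONTH, c) for c in VOICE_COST_PER_CALL
)
share_low, share_high = (c / PLATFORM_PRICE for c in ALL_IN_COST)

print(f"voice: ${voice_low:.0f}-${voice_high:.0f} per month")
print(f"all-in cost as a share of price: {share_low:.0%}-{share_high:.0%}")
```

On these inputs the all-in cost lands at roughly a quarter to three eighths of the monthly price, which is why the estimate looks comfortable before any of the unmodeled cases are added.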

Now add the cases the spreadsheet does not model.

Failed calls that retry. When a call drops or the voice agent encounters an error, the system retries. Each retry is another billable call. In production, 5-8% of calls involve at least one retry. At 1,700 calls per month, that is at least 85-136 additional billable events that do not appear in your initial estimate.
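The retry figure can be reproduced directly. This sketch assumes exactly one retry per affected call, which gives the floor; the function name and the `retries_per_call` parameter are illustrative, not part of any vendor API.

```python
CALLS_PER_MONTH = 1_700
RETRY_RATE = (0.05, 0.08)   # share of calls with at least one retry


def extra_billable_events(calls: int, retry_rate: float, retries_per_call: int = 1) -> int:
    """Additional billable events caused by retries.

    retries_per_call=1 is an assumption: a call "with at least one retry"
    may be retried more than once, so the result is a lower bound.
    """
    return round(calls * retry_rate * retries_per_call)


low, high = (extra_billable_events(CALLS_PER_MONTH, r) for r in RETRY_RATE)
print(f"extra billable events per month: {low}-{high}")
```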

Bilingual conversations. Spanish-language calls run 15-25% longer than English-language calls on average. That is partly because medical terminology in Spanish requires more words, partly because bilingual patients often switch between languages mid-conversation, and partly because the voice synthesis model processes each language switch as a new segment. If 30% of your call volume is Spanish, which is realistic for a Las Vegas practice, per-second billing raises your average cost per call by roughly 4.5-7.5% over an English-only estimate.
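The effect on the blended average is a weighted mean of the two call types. A small sketch, assuming cost scales linearly with call duration (per-second billing) and that the estimate was built on English-only call lengths; the function name is illustrative.

```python
def blended_duration_multiplier(spanish_share: float, spanish_premium: float) -> float:
    """Average call length relative to an English-only baseline of 1.0."""
    english_share = 1 - spanish_share
    return english_share * 1.0 + spanish_share * (1 + spanish_premium)


SPANISH_SHARE = 0.30
multipliers = {
    premium: blended_duration_multiplier(SPANISH_SHARE, premium)
    for premium in (0.15, 0.25)
}
for premium, m in multipliers.items():
    print(f"Spanish calls {premium:.0%} longer: blended cost per call x{m:.3f}")
```

Changing `SPANISH_SHARE` shows how quickly the gap grows for practices with a larger Spanish-speaking patient base.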

Long holds and transfers. When a patient needs to be transferred to a human (an emergency, a complex billing question, a situation the AI cannot handle), the voice agent stays on the line during the transfer. That hold time is billable. A 30-second transfer with a 2-minute hold before the human picks up adds 2.5 billable minutes, more than the 1.9-minute average call itself.
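To see what a handoff does to a single call, count billable seconds rather than conversation seconds. A sketch under two assumptions: the agent is billed for the full transfer and hold, as described above, and the conversation before the transfer is of average length. Names are illustrative.

```python
AVG_CALL_SECONDS = 114   # 1.9 minutes, the average call duration quoted above


def billable_seconds(conversation_s: int, transfer_s: int = 0, hold_s: int = 0) -> int:
    """Total seconds the voice agent is on the line and therefore billed."""
    return conversation_s + transfer_s + hold_s


handoff_only = billable_seconds(0, transfer_s=30, hold_s=120)
transferred_call = billable_seconds(AVG_CALL_SECONDS, transfer_s=30, hold_s=120)
ratio = transferred_call / AVG_CALL_SECONDS

print(f"handoff adds {handoff_only} s; transferred call bills {transferred_call} s")
print(f"that is {ratio:.2f}x an average call")
```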

After-hours spikes. 27-35% of patient calls come outside business hours. These calls tend to be longer because the patient is not in a rush and the AI is the only available resource. After-hours calls average 2.4 minutes compared to 1.6 minutes during business hours. The cost per call is 50% higher during the hours when call volume is lowest — which is the opposite of what most pricing models assume.
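The after-hours figures can be checked against the overall average, and they also show how much of the bill those calls account for. A sketch using only the durations and shares quoted above, assuming cost is proportional to minutes; the function names are illustrative.

```python
BUSINESS_HOURS_MIN = 1.6
AFTER_HOURS_MIN = 2.4


def blended_minutes(after_hours_share: float) -> float:
    """Average call length across both periods."""
    return (after_hours_share * AFTER_HOURS_MIN
            + (1 - after_hours_share) * BUSINESS_HOURS_MIN)


def after_hours_minute_share(after_hours_share: float) -> float:
    """Fraction of all billable minutes that come from after-hours calls."""
    return after_hours_share * AFTER_HOURS_MIN / blended_minutes(after_hours_share)


premium = AFTER_HOURS_MIN / BUSINESS_HOURS_MIN - 1
print(f"after-hours cost premium per call: {premium:.0%}")
for share in (0.27, 0.35):
    print(f"{share:.0%} of calls after hours: "
          f"average {blended_minutes(share):.2f} min, "
          f"{after_hours_minute_share(share):.0%} of billable minutes")
```

The blended average comes out between about 1.8 and 1.9 minutes, consistent with the overall figure, while after-hours calls carry a larger share of billable minutes than their share of call volume.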

We ran the actual numbers. Not projections. Not vendor estimates. Actual invoices across thousands of real patient calls over three months. The voice cost per call in production is 40-60% higher than what the spreadsheet predicted before launch.

The unit economics still work. But only because we architected for them from day one. We designed our prompt structure to minimize call duration without sacrificing quality. We built retry logic that caps at two attempts to avoid runaway costs. We optimized our tool-calling patterns so the AI resolves patient requests in fewer turns, which directly reduces billable seconds.
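A capped retry loop is short to write. The sketch below is not our production code: it assumes "two attempts" means the first try plus one retry, and `CallFailed`, `run_with_cap`, and `flaky_call` are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
MAX_ATTEMPTS = 2   # assumption: the first try plus one retry


class CallFailed(Exception):
    """Hypothetical error raised when a call drops or the agent errors out."""


def run_with_cap(place_call: Callable[[], T],
                 max_attempts: int = MAX_ATTEMPTS) -> Tuple[T, int]:
    """Run place_call until it succeeds or the attempt cap is reached.

    Returns the result and the number of attempts used, so the caller can
    log how many billable events the call produced.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return place_call(), attempt
        except CallFailed as err:
            last_error = err   # a failed attempt is still a billable event
    raise CallFailed(f"gave up after {max_attempts} attempts") from last_error


# Usage: a stand-in call that fails once, then succeeds.
outcomes = iter([CallFailed("dropped"), "booked"])


def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result, attempts = run_with_cap(flaky_call)
print(result, attempts)
```

With a hard cap, the worst-case number of billable events per month is bounded by the cap times the call volume, which keeps a bad day from turning into an open-ended bill.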

Most teams discover the cost problem after they have already launched with a pricing model that cannot absorb the real numbers. By then, they are either losing money on every call or raising prices on customers who signed up at the original rate. Neither option is good.

If you are building a voice AI product and your cost model is based on vendor pricing sheets and estimated call volumes, you are building on projections. We have the production data. The real numbers look different from the projections in ways that matter. The good news: the economics work. The bad news: they only work if you architect for them before you launch, not after.

This is one piece of a larger framework we built and operate in production. The full picture — and how it applies to your business — is in the playbook.

We specialize in healthcare because it is the hardest vertical — strict HIPAA regulation, PHI handling, BAA chains, and zero tolerance for failure. If we can build it for healthcare, we can build it for any industry. We work across verticals.

Written by Ryan Bolden · Founder, Riscent · ryan@riscent.com