Voice AI Glossary
76 essential voice AI terms, defined in plain English. Whether you're building or selling voice AI, this is your reference for the vocabulary that powers the industry.
A
AES-256 Encryption
Category: Compliance & Security
AES-256 is the Advanced Encryption Standard (AES) used with a 256-bit key, a symmetric cipher widely used to secure data at rest and in transit. It is a common baseline for protecting sensitive call and customer data.
AI Receptionist
Category: Core AI & Voice
An AI receptionist is a voice agent deployed to answer a business's inbound calls: it greets callers, routes them, answers common questions, and books appointments. It replaces or augments a human front-desk role.
AI Voice Agent
Category: Core AI & Voice
An AI voice agent is a software system that holds a spoken conversation with a caller, understanding speech and responding with synthesized voice in real time. It combines speech recognition, a language model, and text-to-speech to handle tasks like answering questions, qualifying leads, and booking appointments without a human on the line.
API
Category: Business & Operations
An Application Programming Interface (API) is a set of rules that lets different software systems communicate. Voice AI platforms expose APIs so agencies can integrate calling, leads, and data with CRMs and other tools.
Audio Processing
Category: Speech Technology
Audio processing encompasses the techniques used to capture, clean, and analyze sound, including noise suppression, echo cancellation, and gain control, all of which improve voice AI quality.
Automatic Speech Recognition
Category: Speech Technology
Automatic speech recognition (ASR) is the technology, powered by acoustic and language models, that converts spoken audio into text. It is the technical foundation beneath speech-to-text.
B
Barge-In
Category: Speech Technology
Barge-in is the ability of a voice agent to detect and respond when a caller interrupts while the agent is still speaking, pausing its own speech to yield the floor, just as humans do.
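As a rough illustration of barge-in, the sketch below models playback as a small state object that stops speaking as soon as caller speech is detected. The class and method names are hypothetical, and real systems would drive `on_caller_audio` from a voice activity detector rather than a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class AgentPlayback:
    """Tracks whether the agent is speaking (hypothetical helper)."""
    speaking: bool = False
    interrupted: bool = False

    def start_speaking(self) -> None:
        self.speaking = True
        self.interrupted = False

    def on_caller_audio(self, caller_is_speaking: bool) -> None:
        # Barge-in: caller speech while the agent talks stops playback.
        if self.speaking and caller_is_speaking:
            self.speaking = False
            self.interrupted = True


agent = AgentPlayback()
agent.start_speaking()
agent.on_caller_audio(caller_is_speaking=False)  # silence: keep talking
agent.on_caller_audio(caller_is_speaking=True)   # interruption: yield the floor
```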
Bring Your Own Key
Category: Business & Operations
Bring Your Own Key (BYOK) is a model where an agency connects its own provider API keys (for example, Vapi or Retell) to a white-label platform, paying the provider directly for usage while the platform adds the management layer.
Business Associate Agreement
Category: Compliance & Security
A Business Associate Agreement (BAA) is a contract required under HIPAA between a covered entity and a vendor that handles protected health information, defining how that data is safeguarded.
C
Call Analytics
Category: Business & Operations
Call analytics is the measurement and reporting of call performance, including volume, duration, outcomes, sentiment, and transcripts, giving agencies and clients insight into results.
Call Routing
Category: Telephony
Call routing is the logic that directs incoming calls to the right destination, such as a specific agent, department, or human, based on intent, time, or business rules.
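A minimal sketch of rule-based routing, assuming made-up business hours (09:00 to 17:00), intent labels, and destination names. None of these values come from a real platform.

```python
from datetime import time


def route_call(intent: str, now: time) -> str:
    """Return a destination label for a call (rules are illustrative only)."""
    open_time, close_time = time(9, 0), time(17, 0)
    # Time-based rule: outside business hours everything goes to one agent.
    if not (open_time <= now < close_time):
        return "after_hours_agent"
    # Intent-based rules.
    if intent == "billing":
        return "billing_department"
    if intent == "speak_to_human":
        return "live_person"
    return "general_agent"
```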
Churn
Category: Business & Operations
Churn is the rate at which clients cancel a subscription over time. Lowering churn is critical to growing and preserving recurring revenue in a voice AI agency.
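One common way to compute churn for a period is cancellations divided by the clients active at the start of that period. The function below is a sketch under that assumption; other definitions exist.

```python
def churn_rate(cancelled: int, clients_at_start: int) -> float:
    """Churn for one period, as a percentage of clients at the period start."""
    if clients_at_start == 0:
        return 0.0
    return 100.0 * cancelled / clients_at_start
```

For example, 4 cancellations out of 50 clients at the start of the month gives a monthly churn of 8%.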
Connect Rate
Category: Telephony
Connect rate is the percentage of outbound call attempts that successfully reach a live person. It is a key metric for outbound campaign effectiveness.
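The definition translates directly into a ratio. In this sketch the parameter names are illustrative, and how a "live answer" is detected (for instance, excluding voicemail) is left to the platform.

```python
def connect_rate(attempts: int, live_answers: int) -> float:
    """Percentage of outbound attempts that reached a live person."""
    if attempts == 0:
        return 0.0
    return 100.0 * live_answers / attempts
```

For example, 46 live answers from 200 attempts is a connect rate of 23%.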
Context
Category: Conversational Design
Context is the information a voice agent remembers across a conversation, including prior statements and situational details, allowing coherent multi-turn dialogue.
Conversational AI
Category: Core AI & Voice
Conversational AI is the broad field of technology that enables machines to understand and participate in human-like dialogue, across voice and text. It spans speech recognition, natural language understanding, dialogue management, and response generation.
CRM Integration
Category: Business & Operations
CRM integration is the connection between a voice AI platform and a customer relationship management system, so call data, leads, and outcomes flow into the records a business already uses.
Customer Retention
Category: Business & Operations
Customer retention is the ability to keep clients over time. In voice AI, retention is driven by measurable results, reliable service, and ongoing value beyond the initial setup.
D
Data Privacy
Category: Compliance & Security
Data privacy is the protection of personal information collected during interactions, including call recordings and transcripts, in line with regulations like GDPR and CCPA.
Deepgram
Category: Speech Technology
Deepgram is a leading automatic speech recognition provider known for fast, accurate, real-time speech-to-text. It is commonly used as the STT engine within voice AI stacks.
Dialog Management
Category: Conversational Design
Dialog management is the control of a conversation's flow and state, deciding what the agent says next based on context, intent, and business rules.
DTMF
Category: Telephony
Dual-Tone Multi-Frequency (DTMF) is the technology behind touch-tone keypad input, where pressing a key generates a unique pair of tones, one low frequency and one high frequency. Voice systems can use DTMF for hybrid voice-and-keypad interactions.
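To make the "dual-tone" part concrete, the sketch below maps each key to its standard row and column frequencies and synthesizes the summed sine waves. The function names, the 0.1 second duration, and the 8000 Hz sample rate are illustrative choices, not requirements.

```python
import math

# Standard DTMF grid: each key sits at one row (low) and one column (high) tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple:
    """Return the (low_hz, high_hz) pair for a single keypad character."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> list:
    """Synthesize the tone pair as floats in [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(count)
    ]
```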
E
ElevenLabs
Category: Speech Technology
ElevenLabs is a widely used AI voice provider known for highly realistic text-to-speech and voice cloning. It supplies premium synthetic voices for conversational AI agents.
Endpointing
Category: Speech Technology
Endpointing is the system's ability to detect when a caller has finished speaking so it can begin responding. Accurate endpointing avoids premature replies and awkward silences.
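One simple approach to endpointing is a silence timeout over per-frame audio energy. The sketch below assumes that approach. The threshold and frame count are arbitrary illustrative values, and production systems often combine this with model-based cues.

```python
from typing import Optional, Sequence


def find_endpoint(
    frame_energies: Sequence[float],
    silence_threshold: float = 0.01,
    min_silence_frames: int = 25,
) -> Optional[int]:
    """Return the frame index where the caller's turn is judged finished."""
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1         # only count silence after speech began
            if silent_run >= min_silence_frames:
                return index
    return None                     # caller has not finished (or never spoke)
```

A larger `min_silence_frames` reduces premature replies at the cost of longer pauses before the agent responds.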
Entity Extraction
Category: Conversational Design
Entity extraction is the process of identifying and pulling structured values, like a date or a budget figure, out of a caller's free-form speech for use in the conversation or a CRM.
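A toy sketch of entity extraction using regular expressions on a transcript. The two patterns (a dollar budget and a weekday) and the output field names are assumptions for illustration. Real agents typically rely on a language model or a trained extractor instead.

```python
import re

WEEKDAYS = r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"


def extract_entities(utterance: str) -> dict:
    """Pull a budget figure and a weekday out of free-form text, if present."""
    entities = {}
    budget = re.search(r"\$\s?(\d[\d,]*)", utterance)
    if budget:
        # Drop thousands separators before converting to a number.
        entities["budget_usd"] = int(budget.group(1).replace(",", ""))
    day = re.search(WEEKDAYS, utterance, re.IGNORECASE)
    if day:
        entities["day"] = day.group(1).lower()
    return entities
```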
F
Fallback Strategies
Category: Conversational Design
Fallback strategies are predefined responses the agent uses when it cannot understand or handle a request, preventing dead ends and keeping the conversation productive.
Full-Duplex Communication
Category: Speech Technology
Full-duplex communication means audio can be sent and received simultaneously, so both parties can speak and be heard at the same time. This enables natural interruptions during a call.
Function Calling
Category: Conversational Design
Function calling (or tool calling) is the ability of a language model to trigger external actions or APIs during a conversation, such as checking calendar availability, creating a booking, or looking up an account.
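The sketch below shows the application side of function calling: the model emits a structured request, and the host code looks up and runs the matching function. The JSON shape (`name`, `arguments`), the tool name, and the in-memory calendar are all hypothetical, since each provider defines its own format.

```python
import json


def check_availability(date: str) -> dict:
    # Hypothetical stand-in for a real calendar lookup.
    booked = {"2025-01-15"}
    return {"date": date, "available": date not in booked}


# Registry of functions the model is allowed to trigger.
TOOLS = {"check_availability": check_availability}


def dispatch_tool_call(tool_call_json: str) -> dict:
    """Run the tool named in a model-produced call and return its result."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call["arguments"])
```

Restricting dispatch to an explicit registry keeps the model from triggering anything the application has not deliberately exposed.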
G
GDPR
Category: Compliance & Security
The General Data Protection Regulation (GDPR) is the European Union's data privacy law governing how personal data is collected, stored, and processed, with strict consent and rights requirements.
GoHighLevel
Category: Business & Operations
GoHighLevel (GHL) is a popular all-in-one marketing and CRM platform used widely by agencies. Voice AI platforms integrate with GoHighLevel to push leads, calls, and outcomes into client sub-accounts.
Guardrails
Category: Conversational Design
Guardrails are rules and constraints that keep a voice agent's responses safe, on-brand, and within scope, preventing off-topic, inappropriate, or hallucinated answers.
H
Hallucination
Category: Conversational Design
A hallucination is when a language model generates information that sounds plausible but is incorrect or fabricated. Guardrails, knowledge bases, and RAG help reduce hallucinations in voice agents.
HIPAA
Category: Compliance & Security
The Health Insurance Portability and Accountability Act (HIPAA) is a US law governing the protection of protected health information. Voice AI used in healthcare may require compliant infrastructure and a signed Business Associate Agreement (BAA).
Hotword Detection
Category: Speech Technology
Hotword detection, similar to wake word detection, is the recognition of a specific phrase that signals the system to start listening or take an action.
Human Handoff
Category: Conversational Design
Human handoff is the seamless transfer of a call from an AI agent to a live person when the request is too complex, sensitive, or requires human judgment.
I
Inbound Calls
Category: Telephony
Inbound calls are calls that come into a business from customers or prospects. Voice AI agents handle inbound calls to answer questions, capture leads, and book appointments around the clock.
Intent Recognition
Category: Conversational Design
Intent recognition is the system's ability to identify the goal or purpose behind what a caller says, such as booking an appointment or asking about pricing, so it can respond appropriately.
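As a deliberately simple illustration, the sketch below scores an utterance against keyword lists for two intents. The intent labels and keywords are invented for this example. Modern voice agents usually delegate this step to a language model or a trained classifier.

```python
# Hypothetical intents and trigger phrases.
INTENT_KEYWORDS = {
    "book_appointment": ("book", "appointment", "schedule"),
    "pricing": ("price", "pricing", "cost", "how much"),
}


def recognize_intent(utterance: str) -> str:
    """Return the intent whose keywords appear most often, else 'unknown'."""
    text = utterance.lower()
    scores = {
        intent: sum(keyword in text for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```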
Interactive Voice Response
Category: Telephony
Interactive Voice Response (IVR) is a menu-based phone system that greets callers and routes them using voice or keypad inputs. Modern AI-driven IVR replaces rigid menus with natural conversation.
K
Keyword Spotting
Category: Speech Technology
Keyword spotting is the technique of identifying specific words or phrases within continuous speech, used for triggering actions, routing calls, or detecting topics.
Knowledge Base
Category: Conversational Design
A knowledge base is a structured set of information a voice agent can reference during a call to answer questions accurately, such as business details, services, and FAQs.
L
Large Language Model
Category: Core AI & Voice
A large language model (LLM) is an AI system trained on vast amounts of text that can understand and generate human language. In voice AI, the LLM interprets what a caller says and decides how the agent should respond.
Latency
Category: Speech Technology
Latency is the delay between a caller finishing a phrase and the voice agent beginning its response. Lower latency creates a more natural, real-time conversation, while high latency causes awkward pauses.
N
Named Entity Recognition
Category: Conversational Design
Named Entity Recognition (NER) is the extraction of specific data points from speech, such as names, dates, phone numbers, and amounts, so the agent can capture and act on structured information.
Natural Language Processing
Category: Core AI & Voice
Natural language processing (NLP) is the branch of AI focused on interpreting and generating human language. It powers intent recognition, entity extraction, and the language understanding behind voice agents.
Noise Suppression
Category: Speech Technology
Noise suppression is audio processing that reduces background noise to improve speech clarity and recognition accuracy, important when callers are in noisy environments.
P
Prompt Engineering
Category: Conversational Design
Prompt engineering is the practice of designing and refining the instructions given to a language model so it behaves correctly, stays on topic, and represents a brand's voice and rules.
PSTN
Category: Telephony
The Public Switched Telephone Network (PSTN) is the traditional, worldwide circuit-switched telephone network. Voice AI platforms bridge internet-based calling to the PSTN so agents can reach ordinary phones.
R
Real-Time Processing
Category: Speech Technology
Real-time processing is the handling of speech with minimal delay, so recognition, reasoning, and response happen fast enough to sustain a natural conversation.
Recurring Revenue
Category: Business & Operations
Recurring revenue is income that repeats on a regular basis, such as monthly subscriptions. Voice AI agencies build recurring revenue by charging clients a monthly retainer or subscription.
Reseller
Category: Business & Operations
A reseller is a partner who sells another company's product or service to end clients, often under their own brand and at their own price. Voice AI resellers keep the margin between their cost and what they charge clients.
Retell AI
Category: Business & Operations
Retell AI is a conversational voice AI platform used to build voice agents. Alongside Vapi and ElevenLabs, it is one of the providers supported by Fusion Calling.
Retrieval-Augmented Generation
Category: Conversational Design
Retrieval-Augmented Generation (RAG) is a technique where a language model pulls in relevant information from a knowledge base or documents before responding, improving accuracy and reducing hallucination.
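The retrieve-then-prompt flow can be sketched with a toy retriever that ranks documents by shared words. This is a simplification: real RAG systems usually use embedding similarity, and the sample knowledge base and prompt wording here are invented for illustration.

```python
import re

# Hypothetical knowledge base entries.
KNOWLEDGE_BASE = [
    "We are open Monday to Friday from 9am to 5pm.",
    "A standard cleaning appointment costs 120 dollars.",
    "Parking is available behind the building.",
]


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by word overlap with the question (toy retriever)."""
    query = _words(question)
    ranked = sorted(documents, key=lambda d: len(query & _words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, documents: list) -> str:
    """Place the retrieved text ahead of the question for the language model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```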
ROI Tracking
Category: Business & Operations
Return on Investment (ROI) tracking measures the financial return of a voice AI deployment, such as leads captured, appointments booked, or revenue generated relative to cost.
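For the revenue-based case, ROI is commonly expressed as net gain divided by cost. The function below sketches that formula. How revenue is attributed to the voice agent is an assumption each business has to define for itself.

```python
def roi_percent(revenue: float, cost: float) -> float:
    """ROI as a percentage: (revenue - cost) / cost * 100."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return 100.0 * (revenue - cost) / cost
```

For example, 6,000 in attributed revenue against 1,500 in cost is an ROI of 300%.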
S
SaaS
Category: Business & Operations
Software as a Service (SaaS) is a software delivery model where customers pay a recurring subscription to access a cloud-hosted product. White-label voice AI is typically sold as SaaS by agencies to their clients.
Sentiment Analysis
Category: Conversational Design
Sentiment analysis is the detection of emotional tone in a caller's speech, allowing the system to gauge satisfaction or frustration and adjust or escalate accordingly.
SIP
Category: Telephony
Session Initiation Protocol (SIP) is a signaling protocol used to set up, manage, and tear down voice and video calls over IP networks. It is a backbone standard for VoIP telephony.
SOC 2
Category: Compliance & Security
SOC 2 is an auditing framework from the AICPA that evaluates a service organization's controls against the Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy. It is commonly requested in enterprise and regulated procurement.
Speech-to-Text
Category: Speech Technology
Speech-to-text (STT), also called automatic speech recognition, is the technology that converts spoken words into written text. In a voice agent, STT transcribes what the caller says so the system can understand and act on it.
Synthetic Voice
Category: Speech Technology
A synthetic voice is an artificially generated voice produced by text-to-speech or voice cloning systems, designed to sound natural and human-like rather than robotic.
T
TCPA
Category: Compliance & Security
The Telephone Consumer Protection Act (TCPA) is a US law restricting telemarketing calls, auto-dialed calls, and prerecorded messages, requiring consent and governing outbound calling practices.
Telephony
Category: Telephony
Telephony refers to the technology and infrastructure for transmitting voice calls over telephone networks. Voice AI platforms connect to telephony providers to place and receive real phone calls.
Text-to-Speech
Category: Speech Technology
Text-to-speech (TTS) is the technology that converts written text into spoken audio. In voice AI, TTS generates the agent's voice responses, and modern neural TTS produces highly natural, human-like speech.
Transcription
Category: Speech Technology
Transcription is the process of converting a call's spoken audio into written text. Voice AI platforms transcribe every call for records, search, quality review, and analytics.
Turn-Taking
Category: Speech Technology
Turn-taking is the natural back-and-forth rhythm of a conversation, where participants alternate between speaking and listening. Voice agents manage turn-taking to keep exchanges smooth and human-like.
Twilio
Category: Telephony
Twilio is a cloud communications platform that provides programmable voice, SMS, and telephony APIs. Many voice AI stacks use Twilio or similar carriers for phone numbers and call connectivity.
V
Vapi
Category: Business & Operations
Vapi is a popular voice AI platform and API for building and deploying voice agents. It is one of the core providers supported by Fusion Calling and many white-label wrappers in the market.
Voice Bot
Category: Core AI & Voice
A voice bot is an automated system that interacts with users through spoken language, typically over the phone or through a device. The term is often used interchangeably with voice agent, though voice bot can imply simpler, rule-based systems.
Voice Cloning
Category: Speech Technology
Voice cloning is the technology that creates a synthetic voice modeled on a specific person's speech. Providers like ElevenLabs can generate custom voices for branded, consistent agent personas.
Voice User Interface
Category: Conversational Design
A Voice User Interface (VUI) is the design of how people interact with a system through voice, covering prompts, responses, flow, and error handling. Good VUI design makes agents feel natural and easy to use.
VoIP
Category: Telephony
Voice over Internet Protocol (VoIP) is the technology that delivers voice calls over the internet rather than traditional phone lines, enabling flexible, software-driven calling.
W
Wake Word Detection
Category: Speech Technology
Wake word detection is the ability of a voice system to recognize a specific trigger word or phrase that activates it, commonly used in smart speakers and assistants.
Webhook
Category: Business & Operations
A webhook is an automated HTTP request, typically a POST with a JSON body, sent from one system to another when an event occurs, such as a call ending. Voice AI platforms use webhooks to push call outcomes to CRMs and automation tools.
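On the receiving side, a webhook handler parses the request body and acts on the event. The sketch below assumes a JSON payload with invented field names (`event`, `call_id`, `outcome`, `duration_s`). Actual payloads differ by provider, and a real handler would also verify the sender.

```python
import json


def handle_call_ended(body: bytes) -> dict:
    """Turn a hypothetical 'call.ended' webhook payload into a CRM note."""
    event = json.loads(body)
    if event.get("event") != "call.ended":
        return {"handled": False}   # ignore event types we do not process
    note = (
        f"Call {event['call_id']} ended: "
        f"{event['outcome']} ({event['duration_s']}s)"
    )
    return {"handled": True, "crm_note": note}
```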
WebRTC
Category: Telephony
Web Real-Time Communication (WebRTC) is a technology that enables real-time voice and video directly in web browsers, commonly used for in-browser voice widgets and click-to-call experiences.
White-Label
Category: Business & Operations
White-labeling is the practice of rebranding a product or service so an agency can sell it under its own brand. In voice AI, it lets agencies offer AI calling with their logo, domain, and pricing, while the underlying platform stays hidden.