Overview

The most modular option for technical teams: send audio from your voice agent and receive avatar video streams, with 250 ms response times from audio input to HD avatar video output.

When to Use This

Choose speech-to-video when you need:

Component-Level Control: Full control over turn detection, STT, LLM, and TTS components
Complex Tool Calling: Flexible LLM integrations with external APIs and databases
Voice Infrastructure Migration: Seamless upgrades to existing voice agent infrastructure

For zero-infrastructure deployment, use managed agents instead.

Pipeline Overview

Your Voice Agent Pipeline

You manage media transport, turn detection, STT, LLM, and TTS components

Beyond Presence Speech-to-Video API

Receives your agent's speech audio from your pipeline

Avatar Video Output

Beyond Presence manages avatar generation and video streaming
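The split of responsibilities above can be sketched in code. Your pipeline runs turn detection, STT, the LLM, and TTS. The only thing it hands to the avatar side is the synthesized speech audio, streamed as small frames. This is a minimal sketch and not the Beyond Presence API. It rests on several assumptions. `transcribe`, `generate_reply`, `synthesize`, and `AvatarVideoClient` are hypothetical stubs. The audio format (16 kHz mono 16-bit PCM, 20 ms frames) is assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Assumed audio format, for illustration only: 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20


def chunk_pcm(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Split raw PCM audio into fixed-duration frames for streaming."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]  # last frame may be shorter


@dataclass
class AvatarVideoClient:
    """Hypothetical stand-in for a speech-to-video connection.

    A real integration would stream frames to the avatar service over the
    network; this stub only records what would have been sent.
    """
    sent_frames: List[bytes] = field(default_factory=list)

    def send_audio(self, frame: bytes) -> None:
        self.sent_frames.append(frame)


def transcribe(user_audio: bytes) -> str:
    """Hypothetical STT stub."""
    return "hello"


def generate_reply(text: str) -> str:
    """Hypothetical LLM stub."""
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    """Hypothetical TTS stub: returns 10 ms of silence per character."""
    samples = len(text) * SAMPLE_RATE // 100
    return bytes(samples * BYTES_PER_SAMPLE)


def handle_turn(user_audio: bytes, avatar: AvatarVideoClient) -> None:
    """Run one conversational turn after your turn detector ends the user's turn.

    STT, LLM, and TTS all stay in your pipeline; the avatar side only ever
    receives the synthesized speech audio.
    """
    text = transcribe(user_audio)
    reply = generate_reply(text)
    for frame in chunk_pcm(synthesize(reply)):
        avatar.send_audio(frame)


if __name__ == "__main__":
    avatar = AvatarVideoClient()
    handle_turn(b"\x00" * 3200, avatar)
    print(len(avatar.sent_frames))  # → 8
```

Because the avatar client sits behind a single `send_audio` call, you can swap any STT, LLM, or TTS component without touching the video side. This separation is what keeps the approach modular.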

Supported Frameworks

We support integration with popular voice agent frameworks including LiveKit and Pipecat, allowing you to add avatar video to your existing voice pipelines.

LiveKit Plugin

Add avatars to your LiveKit agents with our plugin

Pipecat Integration

Add avatars to your Pipecat bots with our integration

Next Steps

Managed Agents

Zero-infrastructure deployment with built-in optimizations and automatic scaling

n8n Workflows

No-code workflow automation for managing calls with agents
