Introduction
The Live Translation API gives you Bland’s real-time speech translation engine as a standalone service. Open a WebSocket, stream audio in the source language, and receive translated speech back.
Real-Time Streaming
Speech-to-speech translation over a single WebSocket. Transcripts and translated audio arrive per utterance as the speaker talks.
Two Audio Protocols
Raw PCM-16 for web and custom integrations, or Twilio Media Streams envelopes for drop-in interop with Twilio <Stream>.
23 Languages
Translate between any pair of supported languages, with configurable Bland voices for the translated speech.
Session-Based Billing
Billed per minute of connected session time. Track duration and billed minutes through the session API.
How it works
- Create a session with POST /v1/translation/sessions, choosing the language pair, audio protocol, and optionally a voice.
- Connect to the returned ws_url within 10 minutes. The URL is opaque and single-use; connect to it exactly as returned.
- Stream audio and consume events. Binary frames are audio; text frames are JSON control events (ready, transcript, tts_complete, session_ended, error).
- Close the WebSocket when done, end the session from your backend with DELETE, or let the session’s max duration end it.
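The stream-and-consume step can be sketched as a small frame router for a pcm16 session (with twilio_ulaw, audio arrives inside JSON envelopes instead of binary frames). It assumes each control event carries its name in a `type` key, which is a guess; the API reference has the real schema. Handlers are plain callables, so the router is independent of whichever WebSocket client you use.

```python
import json
from typing import Callable, Dict, Union

AudioHandler = Callable[[bytes], None]
EventHandler = Callable[[dict], None]


def make_router(on_audio: AudioHandler,
                on_event: Dict[str, EventHandler]) -> Callable[[Union[bytes, str]], None]:
    """Build a function that routes one WebSocket frame from a pcm16 session."""

    def route(frame: Union[bytes, str]) -> None:
        if isinstance(frame, (bytes, bytearray)):
            # Binary frame: raw PCM-16 translated audio.
            on_audio(bytes(frame))
            return
        # Text frame: a JSON control event such as ready, transcript,
        # tts_complete, session_ended, or error.
        message = json.loads(frame)
        # Assumption: the event name lives in a "type" key.
        handler = on_event.get(message.get("type"))
        if handler is not None:
            handler(message)
        # Events without a registered handler are ignored.

    return route
```

Feed every frame your WebSocket client receives into `route`. Ignoring unregistered events lets a client subscribe only to the events it cares about.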
GET /v1/translation/sessions/:id returns the final state, duration, and billed minutes.
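Because sessions are billed per minute of connected time, rounded up (see Limits and billing), you can sanity-check the billed minutes against the duration. The field names `duration_seconds` and `billed_minutes` are hypothetical stand-ins for whatever the GET response actually uses, and how a zero-length session is billed is not specified, so the sketch does not special-case it.

```python
import math


def billed_minutes(connected_seconds: float) -> int:
    """Minutes billed for a session: connected time rounded up to whole minutes."""
    if connected_seconds < 0:
        raise ValueError("connected time cannot be negative")
    return math.ceil(connected_seconds / 60)


def billing_matches(session: dict) -> bool:
    """Compare a session record's billed minutes against its duration.

    "duration_seconds" and "billed_minutes" are hypothetical field names.
    """
    return billed_minutes(session["duration_seconds"]) == session["billed_minutes"]


# The 19-second example from Limits and billing bills as 1 minute.
print(billed_minutes(19))  # 1
```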
Choosing an audio protocol
| | pcm16 (default) | twilio_ulaw |
|---|---|---|
| Best for | Web apps, native clients, server-side audio | Piping a Twilio call into translation |
| Wire format | Raw PCM-16 binary frames | Twilio Media Streams JSON envelopes |
| Sample rate | You choose (8–48 kHz in, 16 kHz out) | Always 8 kHz μ-law |
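For pcm16, the byte math is simple enough to sketch. This assumes mono, little-endian, signed 16-bit samples; the channel count and byte order are assumptions not stated in the table, so confirm them in the API reference. The 20 ms chunk length in the usage note is an arbitrary illustrative choice.

```python
import array
import sys
from typing import Iterable

BYTES_PER_SAMPLE = 2      # PCM-16: one signed 16-bit integer per sample
OUTPUT_RATE_HZ = 16_000   # translated audio comes back at 16 kHz


def chunk_bytes(sample_rate_hz: int, chunk_ms: int) -> int:
    """Bytes in one chunk of mono PCM-16 audio lasting chunk_ms milliseconds."""
    if not 8_000 <= sample_rate_hz <= 48_000:
        raise ValueError("input sample rate must be between 8 kHz and 48 kHz")
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def floats_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian PCM-16 bytes."""
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # assumption: the service expects little-endian samples
    return pcm.tobytes()


def pcm16_seconds(data: bytes, sample_rate_hz: int = OUTPUT_RATE_HZ) -> float:
    """Playback duration of mono PCM-16 audio (defaults to the 16 kHz output)."""
    return len(data) / (BYTES_PER_SAMPLE * sample_rate_hz)
```

At 16 kHz a 20 ms chunk is 640 bytes; at 48 kHz it is 1,920 bytes.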
With twilio_ulaw, the wire format matches Twilio’s <Stream> messages exactly, so you can forward Twilio’s WebSocket frames with minimal glue code.
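When a relay does need to look inside those frames, the envelope shape is Twilio’s: inbound "media" messages carry base64 μ-law in `media.payload`, and on bidirectional streams Twilio plays outbound messages of the form `{"event": "media", "streamSid": ..., "media": {"payload": ...}}`. How the translation service uses `streamSid` or other envelope fields is not covered here, so treat these helpers as an illustration of the format; often the relay can forward text frames unchanged.

```python
import base64
import json
from typing import Optional


def media_payload(message: str) -> Optional[bytes]:
    """Return the raw 8 kHz mu-law bytes from a Twilio "media" message, else None."""
    envelope = json.loads(message)
    if envelope.get("event") != "media":
        return None  # "connected", "start", "mark", "stop", ... carry no audio
    return base64.b64decode(envelope["media"]["payload"])


def media_message(stream_sid: str, ulaw: bytes) -> str:
    """Wrap mu-law bytes in the outbound "media" envelope Twilio plays on a call."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw).decode("ascii")},
    })
```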
Full wire formats, control-event schemas, and close codes are in the API reference.
Quickstart
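A minimal sketch of creating a session with Python’s standard library. The base URL, the `authorization` header, the body field names, and the language-code format are all assumptions; take the real ones from the API reference. The request is built separately from sending so it can be inspected first, and the response is expected to include the `ws_url` to connect to within 10 minutes.

```python
import json
import urllib.request
from typing import Optional

# Assumptions: base URL, header name, and every body field name are illustrative.
API_BASE = "https://api.bland.ai"


def create_session_request(api_key: str, source: str, target: str,
                           audio_protocol: str = "pcm16",
                           voice: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /v1/translation/sessions."""
    body = {
        "source_language": source,         # hypothetical field name
        "target_language": target,         # hypothetical field name
        "audio_protocol": audio_protocol,  # "pcm16" (default) or "twilio_ulaw"
    }
    if voice is not None:
        body["voice"] = voice  # hypothetical field: optional Bland voice
    return urllib.request.Request(
        f"{API_BASE}/v1/translation/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def create_session(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON session (with its ws_url)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```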
Best practices
- Buffer translated audio client-side. It can arrive faster than real time: a 13-second utterance may be delivered in 7 seconds. Play it at the natural rate and use tts_complete to know when a turn’s audio is fully delivered.
- Treat ws_url as a secret and as opaque. It embeds the session’s auth. Don’t log it, parse it, or reuse it; it authenticates exactly one connection.
- Handle disconnects by creating a new session. Reconnection isn’t supported; a dropped WebSocket ends the session.
- Release sessions you don’t use. Pending sessions count toward your concurrency cap until they expire, so DELETE any session you create but don’t connect to.
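The buffering advice can be sketched as a queue that accepts translated audio as fast as it arrives while the audio device drains it at the natural rate. This assumes mono PCM-16 at the 16 kHz output rate; the class and method names are hypothetical. Calling `mark_turn_complete` on tts_complete records a turn boundary, so the player learns the turn has ended exactly when it reaches that point in the audio.

```python
from collections import deque
from typing import Deque, Optional, Tuple

SAMPLE_RATE_HZ = 16_000                # translated pcm16 audio arrives at 16 kHz
BYTES_PER_SECOND = SAMPLE_RATE_HZ * 2  # mono PCM-16 assumed


class PlaybackBuffer:
    """Queues translated audio as it arrives; the player drains it in real time."""

    def __init__(self) -> None:
        self._queue: Deque[Optional[bytes]] = deque()  # None marks a turn boundary
        self._buffered = 0

    def push_audio(self, chunk: bytes) -> None:
        self._queue.append(chunk)
        self._buffered += len(chunk)

    def mark_turn_complete(self) -> None:
        # Call on tts_complete: everything queued so far belongs to the finished turn.
        self._queue.append(None)

    @property
    def buffered_seconds(self) -> float:
        return self._buffered / BYTES_PER_SECOND

    def read(self, n_bytes: int) -> Tuple[bytes, bool]:
        """Take up to n_bytes for the audio device; returns (audio, turn_ended)."""
        out = bytearray()
        while self._queue and len(out) < n_bytes:
            chunk = self._queue[0]
            if chunk is None:
                # Reached a turn boundary: stop here so the caller sees it.
                self._queue.popleft()
                return bytes(out), True
            take = chunk[: n_bytes - len(out)]
            out += take
            if len(take) == len(chunk):
                self._queue.popleft()
            else:
                self._queue[0] = chunk[len(take):]  # keep the unread remainder
            self._buffered -= len(take)
        return bytes(out), False
```

A player calling `read(640)` every 20 ms consumes audio at exactly real time at 16 kHz, however far ahead delivery runs.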
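The ws_url, disconnect, and release practices combine into one small control loop. Every name here is hypothetical: `create_session`, `delete_session`, and `stream` stand in for your own wrappers around POST, DELETE, and your WebSocket client, and the `max_sessions` cap on replacements is an arbitrary choice. Each new session is created only after the previous one has ended (dropped) or been deleted.

```python
from typing import Callable


class Session:
    """A created session; ws_url is kept out of repr() so it never lands in logs."""

    def __init__(self, session_id: str, ws_url: str) -> None:
        self.session_id = session_id
        self.ws_url = ws_url  # opaque and single-use: pass it to the client as-is

    def __repr__(self) -> str:
        return f"Session(session_id={self.session_id!r}, ws_url=<redacted>)"


def run_with_fresh_sessions(
    create_session: Callable[[], Session],
    delete_session: Callable[[str], None],
    stream: Callable[[Session], bool],
    max_sessions: int = 5,
) -> int:
    """Run a conversation, starting a brand-new session after each dropped WebSocket.

    stream(session) connects to session.ws_url and returns True if the
    conversation ended normally, False if the WebSocket dropped.
    Returns how many sessions were used.
    """
    for used in range(1, max_sessions + 1):
        session = create_session()
        try:
            finished = stream(session)
        except Exception:
            # Release the session (it may still be pending and holding a
            # concurrency slot), then let the caller see the error.
            delete_session(session.session_id)
            raise
        if finished:
            return used
        # A dropped WebSocket ends the session; never retry the same ws_url.
    raise RuntimeError(f"WebSocket dropped {max_sessions} times; giving up")
```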
Limits and billing
Sessions are billed per minute of connected time, rounded up: a 19-second session bills as 1 minute. The per-minute rate depends on your plan. Each organization can run up to 3 concurrent sessions (contact us to raise this), and each session is capped at 30 minutes.
API Reference
Create Session
POST /v1/translation/sessions
Get Session
GET /v1/translation/sessions/:id
End Session
DELETE /v1/translation/sessions/:id