Live Translation
Create Translation Session
Create a real-time translation session and receive a WebSocket URL for streaming audio
POST
Creates a translation session and returns a single-use WebSocket URL. Stream audio in your source language and receive translated speech back in real time — along with transcript events for every translated utterance.
See the Live Translation API tutorial for the full streaming protocol, audio formats, and integration walkthrough.
Audio Format (pcm16):
No framing or headers, raw samples only. 20ms chunks (640 bytes at 16kHz) are typical.
Audio Format (twilio_ulaw):
All frames are JSON text in Twilio Media Streams shape, with base64 8kHz μ-law audio in media.payload. Outbound translated audio arrives as media events of the same shape.
Authentication
Your API key for authentication. The key must belong to an organization; user-scoped keys without an organization are rejected with 403 TAAS_ORG_REQUIRED.
Body Parameters
Language the inbound audio is spoken in. Supported codes: en, es, fr, de, it, pt, nl, pl, sv, fi, da, cs, el, ro, ru, tr, ar, hi, id, tl, zh, ja, ko
Language to translate into. Same supported codes as source_language.
UUID of a Bland voice to use for the translated speech. Must belong to your organization. Defaults to a standard voice for the target language.
Wire format for audio on the WebSocket. One of:
- pcm16: raw PCM-16 little-endian binary frames
- twilio_ulaw: Twilio Media Streams JSON envelopes carrying 8kHz μ-law audio, for direct interop with Twilio <Stream>
Sample rate of the audio you will send, in Hz. pcm16 mode only: integer between 8000 and 48000. 16000 is recommended. Do not set this for twilio_ulaw (Twilio audio is always 8kHz).
Maximum session length in seconds, between 30 and 1800. The session ends automatically when this limit is reached.
Response
Array of error objects if the request failed
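The documented body constraints can be checked client-side before the POST is sent. The following is a minimal sketch, not the official client: build_session_request is a hypothetical helper, and the field names source_language, target_language, audio_protocol, sample_rate and max_duration_seconds are assumptions taken from identifiers used elsewhere on this page (target_language in particular is inferred from the transcript event). The endpoint URL and the voice field are left out because this page does not state them.

```python
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "nl", "pl", "sv", "fi", "da", "cs",
    "el", "ro", "ru", "tr", "ar", "hi", "id", "tl", "zh", "ja", "ko",
}


def build_session_request(source_language, target_language,
                          audio_protocol="pcm16", sample_rate=None,
                          max_duration_seconds=None):
    """Validate inputs against the documented limits and return a JSON-ready dict.

    Field names are assumptions; adjust them to the real request schema.
    """
    for code in (source_language, target_language):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {code!r}")
    if audio_protocol not in ("pcm16", "twilio_ulaw"):
        raise ValueError(f"unknown audio protocol: {audio_protocol!r}")

    body = {
        "source_language": source_language,
        "target_language": target_language,  # assumed field name
        "audio_protocol": audio_protocol,
    }
    if sample_rate is not None:
        # The sample rate is documented as pcm16-only (Twilio audio is always 8kHz).
        if audio_protocol != "pcm16":
            raise ValueError("sample_rate must not be set for twilio_ulaw")
        if not 8000 <= sample_rate <= 48000:
            raise ValueError("sample_rate must be between 8000 and 48000")
        body["sample_rate"] = sample_rate
    if max_duration_seconds is not None:
        if not 30 <= max_duration_seconds <= 1800:
            raise ValueError("max_duration_seconds must be between 30 and 1800")
        body["max_duration_seconds"] = max_duration_seconds
    return body
```

Rejecting an out-of-range value locally avoids spending a request on a call that the server would refuse anyway.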
Limits & Billing
- Sessions are billed per minute of connected time, rounded up. The per-minute rate depends on your plan.
- Up to 3 concurrent sessions per organization (pending sessions count until they expire). Contact us to raise this limit.
- A daily translation-minutes rate limit applies per organization.
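As a rough illustration of the rounding rule, billable minutes can be estimated from the connected time. This sketch assumes rounding is applied once per session to whole minutes, which the page implies but does not spell out; billed_minutes is a hypothetical helper, and the per-minute rate is omitted because it depends on your plan.

```python
import math


def billed_minutes(session_seconds: float) -> int:
    """Connected time rounded up to whole minutes (assumed per-session rounding)."""
    return math.ceil(session_seconds / 60)
```

Under this assumption a 61-second session counts as 2 minutes and a 60-second session as 1.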
WebSocket Connection
After creating a session, connect to the returned ws_url within 10 minutes:
- URL: Use ws_url exactly as returned; it is opaque and single-use. Do not parse or reconstruct it.
- Protocol: WebSocket (WSS)
- Authentication: Embedded in the URL — no headers required
- Frames: Binary frames carry audio; text frames carry JSON control events
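The binary/text split in the list above can be handled with a small dispatcher. This is a sketch that assumes a WebSocket client library which delivers binary frames as bytes and text frames as str; classify_frame is a hypothetical helper. It applies to pcm16 sessions, since twilio_ulaw sessions use JSON text for every frame.

```python
import json


def classify_frame(frame):
    """Return ("audio", bytes) for a binary frame or ("event", dict) for a text frame."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    # Text frames carry JSON control events.
    return "event", json.loads(frame)
```

A receive loop would call classify_frame on each incoming message, then hand audio to the playback path and events to the control-event handler.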
Audio Format: pcm16
| Direction | Format |
|---|---|
| Client → Bland | Binary frames of raw PCM-16 little-endian mono audio at your declared sample_rate |
| Bland → Client | Binary frames of raw PCM-16 little-endian mono audio at 16,000 Hz |
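Chunk sizes for the client-to-Bland direction follow directly from the sample rate: PCM-16 mono uses 2 bytes per sample. The sketch below derives the chunk size and splits a buffer into chunks; chunk_size_bytes and iter_chunks are hypothetical helpers, and the 20ms default is the typical value mentioned on this page, not a requirement.

```python
def chunk_size_bytes(sample_rate: int, chunk_ms: int = 20) -> int:
    """Bytes in one chunk of 16-bit (2-byte) mono PCM."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    return samples_per_chunk * 2


def iter_chunks(pcm: bytes, sample_rate: int, chunk_ms: int = 20):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At 16000 Hz this gives 640 bytes per 20ms chunk; each yielded chunk would be sent as one binary WebSocket frame.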
Audio Format: twilio_ulaw
All frames are JSON text in Twilio Media Streams shape — base64 8kHz μ-law audio in media.payload. Send a start event first so outbound envelopes echo your streamSid:
{"event":"media","streamSid":"<your sid>","media":{"payload":"<base64 8kHz μ-law>"}}.
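Building and reading these envelopes takes only JSON and base64. The sketch below assumes the audio is already encoded as 8kHz μ-law bytes; start_event, media_event and decode_media are hypothetical helpers. The start event contains only the streamSid, and whether the server needs other Twilio start fields is an assumption to verify.

```python
import base64
import json


def start_event(stream_sid: str) -> str:
    """Minimal start event; fields beyond streamSid are an assumption to verify."""
    return json.dumps({
        "event": "start",
        "streamSid": stream_sid,
        "start": {"streamSid": stream_sid},
    })


def media_event(stream_sid: str, ulaw_audio: bytes) -> str:
    """Wrap 8kHz mu-law bytes in a media envelope."""
    payload = base64.b64encode(ulaw_audio).decode("ascii")
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": payload},
    })


def decode_media(frame_text: str) -> bytes:
    """Return the mu-law bytes of a media envelope, or b"" for any other frame."""
    message = json.loads(frame_text)
    if message.get("event") != "media":
        return b""
    return base64.b64decode(message["media"]["payload"])
```

Each string returned by start_event or media_event is sent as one text frame, and decode_media is applied to incoming text frames.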
Control Events
JSON text frames from the server, discriminated by type:
| Event | When | Key fields |
|---|---|---|
| ready | Pipeline warmed up; transcripts begin flowing after this | session_id |
| transcript | One finalized utterance was translated | turn_id, original, translation, source_language, target_language, is_final |
| tts_complete | All translated audio for a turn has been written to the WebSocket | turn_id, audio_duration_ms |
| session_ended | Terminal notification; last JSON frame before close | reason, session_seconds, end_reason |
| error | Fatal session failure, followed by session_ended and close | code, message, fatal |
Example transcript event
- audio_duration_ms is currently always 0, a placeholder. Do not rely on it.
- A tts_complete may be absent for a turn whose speech synthesis failed; don’t block on it. The transcript still arrives.
- Error codes that can fire: taas/audio_protocol_mismatch and taas/internal_error, both fatal: true.
- session_ended.reason is one of client_disconnect, max_duration, error, api_terminated.
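One way to consume these events without ever waiting on tts_complete is a small state tracker. This is a sketch under the assumption that each event is a JSON object whose type field matches the table and whose fields carry the listed names; SessionState is a hypothetical helper, not part of any SDK.

```python
import json


class SessionState:
    """Tracks control events for one session (hypothetical helper)."""

    def __init__(self):
        self.ready = False
        self.transcripts = []      # (turn_id, original, translation) tuples
        self.errors = []           # (code, message) tuples
        self.ended_reason = None

    def handle(self, frame_text: str) -> bool:
        """Apply one JSON control event; return False once session_ended arrives."""
        event = json.loads(frame_text)
        kind = event.get("type")
        if kind == "ready":
            self.ready = True
        elif kind == "transcript":
            self.transcripts.append(
                (event["turn_id"], event["original"], event["translation"])
            )
        elif kind == "error":
            # Errors are fatal; session_ended follows, so only record them here.
            self.errors.append((event.get("code"), event.get("message")))
        elif kind == "session_ended":
            self.ended_reason = event.get("reason")
            return False
        # tts_complete and unknown types change no state, so a missing
        # tts_complete can never stall the caller.
        return True
```

Transcripts are recorded as soon as they arrive, so a turn whose synthesis failed still shows up in transcripts.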
Close Codes
| Code | Meaning |
|---|---|
| 1000 | Normal close after session_ended |
| 1011 | Internal server error |
| 4001 | Invalid, expired, or already-used session token |
| 4002 | Session not found |
| 4003 | Session already ended |
| 4004 | First frame didn’t match the declared audio_protocol |
| 4005 | max_duration_seconds reached |
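The table maps naturally onto a lookup that a close handler can use for logging. A minimal sketch: describe_close and is_clean_close are hypothetical helpers, and treating 1000 and 4005 as non-error closes is a design choice made here, not something the API specifies.

```python
CLOSE_CODES = {
    1000: "Normal close after session_ended",
    1011: "Internal server error",
    4001: "Invalid, expired, or already-used session token",
    4002: "Session not found",
    4003: "Session already ended",
    4004: "First frame didn't match the declared audio_protocol",
    4005: "max_duration_seconds reached",
}


def describe_close(code: int) -> str:
    """Human-readable meaning of a close code from the table."""
    return CLOSE_CODES.get(code, f"Unrecognized close code {code}")


def is_clean_close(code: int) -> bool:
    """True for closes treated here as non-errors (an assumption, see lead-in)."""
    return code in (1000, 4005)
```

Because reconnection is not supported, any close means a new session is needed to continue, whether or not it counts as clean.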
Notes
- Translated audio can arrive faster than real time — buffer client-side and play at the natural rate
- Reconnection is not supported; if the WebSocket drops, create a new session
- The WebSocket closes automatically at max_duration_seconds
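Client-side buffering for audio that arrives faster than real time can be sketched as a byte queue drained at a fixed frame size. This assumes pcm16 mode, where Bland-to-client audio is 16,000 Hz PCM-16 mono; PlaybackBuffer is a hypothetical helper, and the caller is expected to pull one frame per frame interval (for example from an audio callback or timer).

```python
OUTPUT_SAMPLE_RATE = 16_000  # Bland -> client rate in pcm16 mode
BYTES_PER_SAMPLE = 2         # 16-bit mono


class PlaybackBuffer:
    """Accumulates received audio and releases it in fixed-size frames."""

    def __init__(self, frame_ms: int = 20):
        self.frame_bytes = OUTPUT_SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
        self._pending = bytearray()

    def push(self, audio: bytes) -> None:
        """Store audio as it arrives, however fast that is."""
        self._pending.extend(audio)

    def buffered_ms(self) -> float:
        """Playback time currently queued, in milliseconds."""
        return len(self._pending) * 1000 / (OUTPUT_SAMPLE_RATE * BYTES_PER_SAMPLE)

    def pop_frame(self):
        """Return the next full frame, or None if less than one frame is queued."""
        if len(self._pending) < self.frame_bytes:
            return None
        frame = bytes(self._pending[:self.frame_bytes])
        del self._pending[:self.frame_bytes]
        return frame
```

Calling pop_frame once every 20ms keeps playback at the natural rate regardless of how quickly frames were pushed.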