POST /v1/translation/sessions
curl -X POST "https://api.bland.ai/v1/translation/sessions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_language": "en",
    "target_language": "es",
    "audio_protocol": "pcm16",
    "sample_rate": 16000
  }'
{
  "data": {
    "session_id": "9592342c-0ed2-4c5e-8ceb-16aa55c804a7",
    "ws_url": "wss://stream-v2.aws.dc8.bland.ai/ws/translate/eyJhbGciOiJBMjU2R0NNS1ciLCJlbmMiOiJBMjU2R0NN...",
    "token": "fe120676-1b18-4c87-b6fc-c984e719d3bc",
    "expires_at": "2026-06-04T18:15:07.885Z",
    "max_duration_seconds": 1800
  },
  "errors": null
}
Creates a translation session and returns a single-use WebSocket URL. Stream audio in your source language and receive translated speech back in real time — along with transcript events for every translated utterance. See the Live Translation API tutorial for the full streaming protocol, audio formats, and integration walkthrough.

Authentication

authorization (string, required)
Your API key for authentication. The key must belong to an organization; user-scoped keys without an organization are rejected with 403 TAAS_ORG_REQUIRED.

Body Parameters

source_language (string, required)
Language the inbound audio is spoken in. Supported codes: en, es, fr, de, it, pt, nl, pl, sv, fi, da, cs, el, ro, ru, tr, ar, hi, id, tl, zh, ja, ko.
target_language (string, required)
Language to translate into. Same supported codes as source_language.
voice_id (string)
UUID of a Bland voice to use for the translated speech. Must belong to your organization. Defaults to a standard voice for the target language.
audio_protocol (string, default: "pcm16")
Wire format for audio on the WebSocket. One of:
  • pcm16: raw PCM-16 little-endian binary frames
  • twilio_ulaw: Twilio Media Streams JSON envelopes carrying 8kHz μ-law audio, for direct interop with Twilio <Stream>
sample_rate (integer, default: 16000)
Sample rate of the audio you will send, in Hz. pcm16 mode only: an integer between 8000 and 48000. 16000 is recommended. Do not set this for twilio_ulaw (Twilio audio is always 8kHz).
max_duration_seconds (integer, default: 1800)
Maximum session length in seconds, between 30 and 1800. The session ends automatically when this limit is reached.
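
The parameter rules above can be checked client-side before the request is sent. The following is a minimal sketch, not part of the API: the helper name build_session_request is hypothetical, and the checks only mirror the constraints listed in this section.

```python
# Hypothetical helper: builds the JSON body for POST /v1/translation/sessions
# and applies the documented parameter constraints before any network call.
SUPPORTED_LANGUAGES = frozenset(
    "en es fr de it pt nl pl sv fi da cs el ro ru tr ar hi id tl zh ja ko".split()
)


def build_session_request(source_language, target_language,
                          audio_protocol="pcm16", sample_rate=None,
                          max_duration_seconds=1800, voice_id=None):
    for name, code in (("source_language", source_language),
                       ("target_language", target_language)):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"{name}: unsupported language code {code!r}")
    if audio_protocol not in ("pcm16", "twilio_ulaw"):
        raise ValueError(f"unknown audio_protocol {audio_protocol!r}")
    if not 30 <= max_duration_seconds <= 1800:
        raise ValueError("max_duration_seconds must be between 30 and 1800")

    body = {
        "source_language": source_language,
        "target_language": target_language,
        "audio_protocol": audio_protocol,
        "max_duration_seconds": max_duration_seconds,
    }
    if audio_protocol == "pcm16":
        # sample_rate applies to pcm16 only; 16000 is the documented default.
        rate = 16000 if sample_rate is None else sample_rate
        if not 8000 <= rate <= 48000:
            raise ValueError("sample_rate must be between 8000 and 48000")
        body["sample_rate"] = rate
    elif sample_rate is not None:
        # Twilio audio is always 8kHz, so sample_rate must be left unset.
        raise ValueError("do not set sample_rate for twilio_ulaw")
    if voice_id is not None:
        body["voice_id"] = voice_id
    return body
```

The returned dict can be serialized with json.dumps and sent with any HTTP client, using the same headers as the curl example.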

Response

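data (object)
On success, contains session_id, ws_url, token, expires_at, and max_duration_seconds, as shown in the example response.
errors (array, default: null)
Array of error objects if the request failed; null on success.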
The ws_url is single-use: it authenticates exactly one WebSocket connection. If your connection drops, create a new session — reconnecting with the same URL is rejected with close code 4001.

Limits & Billing

  • Sessions are billed per minute of connected time, rounded up. The per-minute rate depends on your plan.
  • Up to 3 concurrent sessions per organization (pending sessions count until they expire). Contact us to raise this limit.
  • A daily translation-minutes rate limit applies per organization.
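
As a rough illustration of per-minute rounding, the sketch below converts connected time to billed minutes. It assumes rounding is applied once per session and that a session with no connected time bills nothing; neither detail is stated here, and billed_minutes is a hypothetical name.

```python
import math


def billed_minutes(connected_seconds: float) -> int:
    """Connected time rounded up to whole minutes (assumed to be per session)."""
    if connected_seconds <= 0:
        return 0  # assumption: no connected time means nothing is billed
    return math.ceil(connected_seconds / 60)
```

Under these assumptions a 61-second session counts as 2 minutes and a full 1800-second session as 30.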

WebSocket Connection

After creating a session, connect to the returned ws_url within 10 minutes:
  • URL: Use ws_url exactly as returned — it is opaque and single-use. Do not parse or reconstruct it.
  • Protocol: WebSocket (WSS)
  • Authentication: Embedded in the URL — no headers required
  • Frames: Binary frames carry audio; text frames carry JSON control events
You can start sending audio immediately on connection open — frames are buffered server-side until the pipeline is ready.
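
A minimal client sketch for the connection rules above, assuming the third-party websockets package for Python (any WebSocket client would do). The names route_frame, run_session, on_audio, and on_event are hypothetical. With websockets, iterating over the connection yields bytes for binary frames and str for text frames, which matches the audio/control split described here.

```python
import json


def route_frame(message):
    """Classify one incoming frame: binary is audio, text is a JSON control event."""
    if isinstance(message, (bytes, bytearray)):
        return "audio", bytes(message)
    return "event", json.loads(message)


async def run_session(ws_url, outgoing_chunks, on_audio, on_event):
    """Send audio chunks and dispatch incoming frames until the server closes."""
    import asyncio
    import websockets  # third-party; imported lazily so route_frame works without it

    # ws_url is used exactly as returned; authentication is embedded in it.
    async with websockets.connect(ws_url) as ws:

        async def sender():
            # Sending may start immediately; frames are buffered server-side.
            for chunk in outgoing_chunks:
                await ws.send(chunk)  # bytes are sent as a binary frame

        send_task = asyncio.create_task(sender())
        try:
            async for message in ws:
                kind, payload = route_frame(message)
                if kind == "audio":
                    on_audio(payload)
                else:
                    on_event(payload)
        finally:
            send_task.cancel()
```

The loop ends quietly on a normal close (code 1000); for other close codes the websockets package raises ConnectionClosedError, which the caller can catch to inspect the code.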

Audio Format: pcm16

| Direction | Format |
| --- | --- |
| Client → Bland | Binary frames of raw PCM-16 little-endian mono audio at your declared sample_rate |
| Bland → Client | Binary frames of raw PCM-16 little-endian mono audio at 16,000 Hz |
No framing or headers — raw samples only. 20ms chunks (640 bytes at 16kHz) are typical.
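
The chunk size follows from the format: 2 bytes per sample, mono, so a 20 ms chunk at 16 kHz is 320 samples or 640 bytes. A small sketch of that arithmetic and of splitting a PCM buffer into chunks (the helper names are hypothetical):

```python
def chunk_size_bytes(sample_rate: int, chunk_ms: int = 20) -> int:
    """Bytes in one chunk of 16-bit mono PCM."""
    samples = sample_rate * chunk_ms // 1000
    return samples * 2  # 2 bytes per 16-bit sample


def iter_chunks(pcm: bytes, sample_rate: int, chunk_ms: int = 20):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each yielded chunk can be sent as one binary WebSocket frame.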

Audio Format: twilio_ulaw

All frames are JSON text in Twilio Media Streams shape — base64 8kHz μ-law audio in media.payload. Send a start event first so outbound envelopes echo your streamSid:
{ "event": "start", "start": { "streamSid": "MZ..." } }
{ "event": "media", "media": { "payload": "<base64 μ-law>" } }
{ "event": "stop" }
Outbound translated audio arrives as {"event":"media","streamSid":"<your sid>","media":{"payload":"<base64 8kHz μ-law>"}}.
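
The envelopes above can be produced and consumed with the standard library alone. This sketch assumes the audio is already 8kHz μ-law encoded; it does no transcoding, and the function names are hypothetical.

```python
import base64
import json


def start_frame(stream_sid: str) -> str:
    """First frame to send, so outbound envelopes echo this streamSid."""
    return json.dumps({"event": "start", "start": {"streamSid": stream_sid}})


def media_frame(ulaw_audio: bytes) -> str:
    """Wrap already-encoded 8kHz mu-law bytes in a media envelope."""
    payload = base64.b64encode(ulaw_audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})


def stop_frame() -> str:
    return json.dumps({"event": "stop"})


def decode_outbound(text_frame: str):
    """Return mu-law bytes from an outbound media envelope, else None."""
    message = json.loads(text_frame)
    if message.get("event") != "media":
        return None  # not audio, for example a control event
    return base64.b64decode(message["media"]["payload"])
```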

Control Events

JSON text frames from the server, discriminated by type:
| Event | When | Key fields |
| --- | --- | --- |
| ready | Pipeline warmed up; transcripts begin flowing after this | session_id |
| transcript | One finalized utterance was translated | turn_id, original, translation, source_language, target_language, is_final |
| tts_complete | All translated audio for a turn has been written to the WebSocket | turn_id, audio_duration_ms |
| session_ended | Terminal notification; last JSON frame before close | reason, session_seconds, end_reason |
| error | Fatal session failure, followed by session_ended and close | code, message, fatal |
Example transcript event
{
  "type": "transcript",
  "turn_id": "5862a126-2e88-46b1-bb95-0ed5c57155c1",
  "original": "Hello. This is a translation test.",
  "translation": "Hola. Esta es una prueba de traducción.",
  "source_language": "en",
  "target_language": "es",
  "is_final": true
}
  • audio_duration_ms is currently always 0 — a placeholder. Do not rely on it.
  • A tts_complete may be absent for a turn whose speech synthesis failed — don’t block on it. The transcript still arrives.
  • Error codes that can fire: taas/audio_protocol_mismatch and taas/internal_error, both fatal: true.
  • session_ended.reason is one of client_disconnect, max_duration, error, api_terminated.
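
One way to consume these events is a small state holder that dispatches on type. This is a sketch only: the SessionState class and its attributes are hypothetical, and it follows the notes above by treating tts_complete as informational and never waiting on it.

```python
import json


class SessionState:
    """Tracks what the server has reported over the control channel."""

    def __init__(self):
        self.ready = False
        self.transcripts = {}     # turn_id -> (original, translation)
        self.errors = []          # (code, message) pairs
        self.ended_reason = None  # set once session_ended arrives

    def handle(self, text_frame: str):
        """Apply one JSON text frame and return its type."""
        event = json.loads(text_frame)
        kind = event.get("type")
        if kind == "ready":
            self.ready = True
        elif kind == "transcript":
            self.transcripts[event["turn_id"]] = (
                event["original"], event["translation"])
        elif kind == "tts_complete":
            pass  # may be absent for a turn; audio_duration_ms is a placeholder
        elif kind == "error":
            self.errors.append((event.get("code"), event.get("message")))
        elif kind == "session_ended":
            self.ended_reason = event.get("reason")
        return kind
```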

Close Codes

| Code | Meaning |
| --- | --- |
| 1000 | Normal close after session_ended |
| 1011 | Internal server error |
| 4001 | Invalid, expired, or already-used session token |
| 4002 | Session not found |
| 4003 | Session already ended |
| 4004 | First frame didn’t match the declared audio_protocol |
| 4005 | max_duration_seconds reached |
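
A client might map close codes to a follow-up action. The table of meanings below is copied from this section; the policy in next_action is an illustrative assumption, not documented behavior, and the action names are hypothetical.

```python
CLOSE_CODES = {
    1000: "Normal close after session_ended",
    1011: "Internal server error",
    4001: "Invalid, expired, or already-used session token",
    4002: "Session not found",
    4003: "Session already ended",
    4004: "First frame didn't match the declared audio_protocol",
    4005: "max_duration_seconds reached",
}


def describe_close(code: int) -> str:
    return CLOSE_CODES.get(code, f"Unrecognized close code {code}")


def next_action(code: int) -> str:
    """Assumed policy: which step a client takes after the socket closes."""
    if code in (1000, 4005):
        return "finished"            # the session ran to its normal end
    if code == 4004:
        return "fix_audio_protocol"  # a retry would fail the same way
    # Reconnecting with the same URL is not supported, so anything else
    # means starting over with a fresh session.
    return "create_new_session"
```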

Notes

  • Translated audio can arrive faster than real time — buffer client-side and play at the natural rate
  • Reconnection is not supported; if the WebSocket drops, create a new session
  • The WebSocket closes automatically at max_duration_seconds
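
Because translated audio can arrive faster than real time, a client-side buffer that hands out fixed-duration frames is one way to play it at the natural rate. The sketch below assumes pcm16 mode, where outbound audio is 16,000 Hz 16-bit mono (32,000 bytes per second); PlaybackBuffer is a hypothetical name, and the audio output itself is left to the caller.

```python
class PlaybackBuffer:
    """Accumulates received audio and releases it in fixed-duration frames."""

    BYTES_PER_SECOND = 16000 * 2  # 16 kHz, 2 bytes per sample, mono

    def __init__(self):
        self._buffer = bytearray()

    def push(self, audio: bytes) -> None:
        """Store audio as it arrives from the WebSocket."""
        self._buffer.extend(audio)

    def buffered_seconds(self) -> float:
        return len(self._buffer) / self.BYTES_PER_SECOND

    def pop(self, frame_ms: int = 20) -> bytes:
        """Remove and return up to frame_ms of audio (empty when drained)."""
        size = self.BYTES_PER_SECOND * frame_ms // 1000
        size -= size % 2  # stay aligned to whole 16-bit samples
        frame = bytes(self._buffer[:size])
        del self._buffer[:size]
        return frame
```

A playback loop would call pop() once per frame interval and feed the result to the audio device, so bursts from the server do not speed up playback.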