> ## Documentation Index
> Fetch the complete documentation index at: https://docs.bland.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Speak

> Generate speech audio from text using a Bland TTS voice.

## Overview

Synthesizes text into a WAV audio file using a Bland TTS voice. Set `stream: true` to receive the audio as a chunked response while it's being generated.

Requires a Bland TTS voice (`BTTS`, `BTTS_V2`, or `BTTS_V3`). Every generation is automatically stored and retrievable via [List TTS Generations](/api-v1/get/speak-samples) and [Get TTS Generation](/api-v1/get/speak-samples-id).

For lower-latency streaming with chunked WAV-header backfill, use [Stream Speech](/api-v1/post/speak-stream) instead.

## Pricing

Text-to-speech pricing scales with plan:

| Plan       | Rate                      |
| ---------- | ------------------------- |
| Start      | \$0.02 per 100 characters |
| Build      | \$0.02 per 150 characters |
| Scale      | \$0.02 per 200 characters |
| Enterprise | \$0.02 per 400 characters |

Some voices in the public library carry an additional per-character creator fee. The exact cost for a generation is returned in the `x-cost` response header.

***

## Headers

Your API key for authentication.

***

## Body Parameters

The text to synthesize. Maximum 5,000 characters per request. Supports pause markers in the form `<|N|>` where N is a duration between 0.1 and 10.0 seconds, for example: `"Welcome to Bland. <|0.8|> Let's get started."`

ID of the Bland TTS voice to use. Pass either the voice UUID from [List Voices](/api-v1/get/voices) or a curated voice name (for example `willow`, `juniper`, `valentine_experimental`).

Audio container/sample rate of the response.

* `pcm_8000`, 8 kHz PCM16 mono (telephony)
* `pcm_16000`, 16 kHz PCM16 mono
* `pcm_24000`, 24 kHz PCM16 mono
* `pcm_44100`, 44.1 kHz PCM16 mono (default, studio)
* `ulaw_8000`, 8 kHz u-law mono (telephony)

When `stream` is `true`, the response is sent with `Transfer-Encoding: chunked` as audio is generated.
The first 44 bytes are a WAV header with placeholder sizes (`0xFFFFFFFF`). Subsequent chunks are PCM16 audio data. After the stream closes, the client backfills bytes 4-7 (the RIFF chunk size, which is the total length minus 8) and bytes 40-43 (the data chunk size).

Language code for synthesis. Defaults to the voice's primary language. Available languages depend on the voice's underlying model (V2 and V3 voices support 17+ languages).

Voice consistency control.

* For `BTTS` (V1) voices, a float between 0.0 and 1.0. Higher values produce more consistent output.
* For `BTTS_V2` and `BTTS_V3` voices, an integer between 1 and 32 (`per_decode`). **Lower** values produce more consistent output.

Expressiveness control for `BTTS` (V1) voices only. Float between 0.0 and 1.0. Higher values produce more expressive speech.

Expressiveness boost flag for `BTTS_V2` and `BTTS_V3` voices. `0` (off) or `1` (on).

***

## Response

Returns a binary WAV file with `Content-Type: audio/x-wav`. Inspect response headers for latency and cost.

* `x-latency`: Time in milliseconds from request to first audio byte (streaming) or full response (non-streaming).
* `x-cost`: Cost in USD for the synthesis, matching what was billed.
* `Content-Type`: Always `audio/x-wav` on success.
* `Content-Length`: Total bytes in the audio (non-streaming only).
* `Transfer-Encoding`: `chunked` when `stream: true`.

```http Response
HTTP/1.1 200 OK
Content-Type: audio/x-wav
Content-Length: 98806
x-latency: 396
x-cost: 0.001
```

```json Service Not Supported
{
  "data": null,
  "errors": [
    {
      "error": "Service Not Supported",
      "message": "This endpoint only supports Bland's Beige Voices"
    }
  ]
}
```

```json Invalid Input
{
  "data": null,
  "errors": [
    {
      "error": "Invalid Input",
      "message": "Missing voice parameter"
    }
  ]
}
```

```json Rate Limited
{
  "data": null,
  "errors": [
    {
      "error": "Rate Limit Exceeded",
      "message": "Too many requests from this user, please try again later."
    }
  ]
}
```
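The text limits in Body Parameters (5,000 characters, pause markers between 0.1 and 10.0 seconds) can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of the Bland API: `check_text` is a hypothetical helper, the regular expression assumes markers are written as plain decimal numbers like `<|0.8|>`, and it assumes the markers themselves count toward the character limit.

```python
import re

MAX_TEXT_CHARS = 5000  # per-request limit stated in Body Parameters

# Assumption: pause markers use plain decimal numbers, e.g. <|0.8|> or <|2|>.
PAUSE_MARKER = re.compile(r"<\|(\d+(?:\.\d+)?)\|>")


def check_text(text: str) -> list[float]:
    """Validate text client-side and return the pause durations it contains."""
    # Assumption: the whole string, markers included, counts toward the limit.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")

    pauses = [float(value) for value in PAUSE_MARKER.findall(text)]
    for seconds in pauses:
        if not 0.1 <= seconds <= 10.0:
            raise ValueError(f"pause of {seconds}s is outside the 0.1-10.0 second range")
    return pauses


print(check_text("Welcome to Bland. <|0.8|> Let's get started."))  # [0.8]
```

Running the check locally avoids spending a request on input the API would likely reject; the server remains the authority on what is accepted.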
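The header backfill described for streamed responses can be sketched as follows. This is a minimal illustration that assumes the whole stream has already been collected into memory and that the header is the canonical 44-byte PCM layout described above; `backfill_wav_header` and `placeholder_header` are hypothetical helper names, and the placeholder header exists only to simulate a streamed response without calling the API.

```python
import io
import struct
import wave

WAV_HEADER_SIZE = 44  # header length stated for streamed responses


def backfill_wav_header(streamed: bytes) -> bytes:
    """Return a copy of a fully received stream with real sizes in the header."""
    if len(streamed) < WAV_HEADER_SIZE:
        raise ValueError("stream is shorter than a 44-byte WAV header")
    if streamed[0:4] != b"RIFF" or streamed[8:12] != b"WAVE":
        raise ValueError("stream does not start with a RIFF/WAVE header")

    fixed = bytearray(streamed)
    # Bytes 4-7: RIFF chunk size = total length minus the 8-byte "RIFF" + size prefix.
    struct.pack_into("<I", fixed, 4, len(fixed) - 8)
    # Bytes 40-43: data chunk size = everything after the 44-byte header.
    struct.pack_into("<I", fixed, 40, len(fixed) - WAV_HEADER_SIZE)
    return bytes(fixed)


def placeholder_header(sample_rate: int = 44100) -> bytes:
    """Simulated 44-byte PCM16 mono header with 0xFFFFFFFF size placeholders."""
    fmt_chunk = struct.pack("<IHHIIHH", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16)
    return (
        b"RIFF" + struct.pack("<I", 0xFFFFFFFF) + b"WAVE"
        + b"fmt " + fmt_chunk
        + b"data" + struct.pack("<I", 0xFFFFFFFF)
    )


# Simulate a fully received stream: placeholder header plus 1,000 silent PCM16 frames.
received = placeholder_header() + b"\x00\x00" * 1000
wav_bytes = backfill_wav_header(received)

# The standard-library wave module can now read the frame count from the header.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
    print(wav_file.getnframes())  # 1000
```

For long generations it may be preferable to write chunks to a file as they arrive and then seek back to offsets 4 and 40 to patch the sizes, rather than buffering everything in memory.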