# Speak

> Generate speech audio from text using a Bland TTS voice.

## Overview

Synthesizes text into a WAV audio file using a Bland TTS voice. Set `stream: true` to receive the audio as a chunked response while it's being generated.

<Note>
  Requires a Bland TTS voice (`BTTS`, `BTTS_V2`, or `BTTS_V3`).
</Note>

<Note>
  Every generation is automatically stored and retrievable via [List TTS Generations](/api-v1/get/speak-samples) and [Get TTS Generation](/api-v1/get/speak-samples-id).
</Note>

For lower-latency streaming with chunked WAV-header backfill, use [Stream Speech](/api-v1/post/speak-stream) instead.
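As a rough illustration of the call described above, the sketch below builds a non-streaming request with Python's standard library. The endpoint URL (`SPEAK_URL`) is an assumption, since this page does not state it, and so is sending the raw API key as the `authorization` header value. The helper names `build_speak_request` and `save_speech` are hypothetical.

```python
import json
import urllib.request

# Assumption: the endpoint URL is not given on this page; substitute the real one.
SPEAK_URL = "https://api.bland.ai/v1/speak"


def build_speak_request(api_key: str, text: str, voice_id: str,
                        stream: bool = False,
                        url: str = SPEAK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the documented body fields."""
    payload = {"text": text, "voice_id": voice_id, "stream": stream}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the raw API key is the header value (no "Bearer" prefix).
            "authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> dict:
    """Send the request, write the WAV body to `path`, and return the response headers."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
        return dict(response.headers.items())
```

A call such as `save_speech(build_speak_request(key, "Hello", "willow"), "hello.wav")` would then write the audio to disk. `urlopen` raises `urllib.error.HTTPError` for error responses, so production code would likely wrap the call.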

## Pricing

Text-to-speech costs \$0.02 for a fixed number of characters, and that number increases with your plan:

| Plan       | Rate                      |
| ---------- | ------------------------- |
| Start      | \$0.02 per 100 characters |
| Build      | \$0.02 per 150 characters |
| Scale      | \$0.02 per 200 characters |
| Enterprise | \$0.02 per 400 characters |

Some voices in the public library carry an additional per-character creator fee. The exact cost for a generation is returned in the `x-cost` response header.
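To budget before sending a request, the table can be turned into a small estimator. This is only a sketch: it assumes billing is pro-rated linearly by character count, which this page does not confirm, and it ignores creator fees. The `x-cost` header remains the authoritative figure. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Characters covered by each $0.02, taken from the pricing table.
CHARS_PER_UNIT = {"start": 100, "build": 150, "scale": 200, "enterprise": 400}
UNIT_PRICE = Decimal("0.02")


def estimate_cost(char_count: int, plan: str) -> Decimal:
    """Estimate the base cost in USD.

    Assumption: cost scales linearly with character count (no rounding up
    to whole blocks) and excludes any per-character creator fee.
    """
    return UNIT_PRICE * char_count / CHARS_PER_UNIT[plan.lower()]
```

For example, `estimate_cost(500, "Start")` evaluates to 0.10 USD and `estimate_cost(300, "Build")` to 0.04 USD under that assumption.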

***

## Headers

<ParamField header="authorization" type="string" required>
  Your API key for authentication.
</ParamField>

***

## Body Parameters

<ParamField body="text" type="string" required>
  The text to synthesize. Maximum 5,000 characters per request.

  Supports pause markers of the form `<|N|>`, where `N` is the pause length in seconds (0.1 to 10.0). For example: `"Welcome to Bland. <|0.8|> Let's get started."`
</ParamField>
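A client can check both constraints on `text` before sending it. The sketch below uses two hypothetical helpers, `pause` and `check_text`. It assumes the 5,000-character limit counts the full string including pause markers, and that characters are counted the way Python's `len` counts them; neither detail is stated on this page.

```python
MAX_TEXT_CHARS = 5000  # documented per-request limit


def pause(seconds: float) -> str:
    """Return a pause marker such as '<|0.8|>' after checking the documented range."""
    if not 0.1 <= seconds <= 10.0:
        raise ValueError("pause duration must be between 0.1 and 10.0 seconds")
    return f"<|{float(seconds)}|>"


def check_text(text: str) -> str:
    """Return `text` unchanged, or raise if it exceeds the request limit."""
    # Assumption: markers count toward the limit and len() matches the API's count.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    return text


text = check_text(f"Welcome to Bland. {pause(0.8)} Let's get started.")
```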

<ParamField body="voice_id" type="string" required>
  ID of the Bland TTS voice to use. Pass either the voice UUID from [List Voices](/api-v1/get/voices) or a curated voice name (for example `willow`, `juniper`, `valentine_experimental`).
</ParamField>

<ParamField body="output_format" type="string" default="pcm_44100">
  Audio encoding and sample rate of the response.

  * `pcm_8000`, 8 kHz PCM16 mono (telephony)
  * `pcm_16000`, 16 kHz PCM16 mono
  * `pcm_24000`, 24 kHz PCM16 mono
  * `pcm_44100`, 44.1 kHz PCM16 mono (default, studio)
  * `ulaw_8000`, 8 kHz u-law mono (telephony)
</ParamField>
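The format name determines how many bytes each second of audio occupies, which is useful for estimating duration or buffer sizes. The sketch below relies only on the listed formats: PCM16 uses two bytes per mono sample and u-law uses one. `bytes_per_second` and `audio_seconds` are hypothetical helper names.

```python
SUPPORTED_FORMATS = {"pcm_8000", "pcm_16000", "pcm_24000", "pcm_44100", "ulaw_8000"}


def bytes_per_second(output_format: str) -> int:
    """Return the raw audio data rate for one of the documented formats."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    encoding, rate = output_format.split("_")
    bytes_per_sample = 2 if encoding == "pcm" else 1  # PCM16 vs. 8-bit u-law, mono
    return int(rate) * bytes_per_sample


def audio_seconds(data_bytes: int, output_format: str) -> float:
    """Convert a count of audio data bytes (excluding any header) to seconds."""
    return data_bytes / bytes_per_second(output_format)
```

At the default `pcm_44100`, one second of audio is 88,200 bytes, while `ulaw_8000` needs only 8,000.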

<ParamField body="stream" type="boolean" default="false">
  When `true`, the response is sent with `Transfer-Encoding: chunked` as audio is generated. The first 44 bytes are a WAV header whose two size fields hold the placeholder `0xFFFFFFFF`; the chunks that follow are PCM16 audio data. After the stream closes, the client backfills bytes 4-7 (RIFF chunk size: total bytes received minus 8) and bytes 40-43 (data chunk size: total bytes received minus 44).
</ParamField>
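The header backfill can be done in a few lines once the whole stream has been collected. This is a minimal sketch, assuming the stream starts with the canonical 44-byte RIFF/WAVE header described above; `backfill_wav_header` is a hypothetical helper name.

```python
import struct

WAV_HEADER_SIZE = 44  # header length stated for the streamed response


def backfill_wav_header(streamed: bytes) -> bytes:
    """Return a copy of a fully received stream with both size fields filled in."""
    if len(streamed) < WAV_HEADER_SIZE or streamed[:4] != b"RIFF":
        raise ValueError("expected a 44-byte RIFF/WAVE header at the start of the stream")
    fixed = bytearray(streamed)
    # Bytes 4-7: RIFF chunk size = total length minus the 8-byte "RIFF<size>" prefix.
    struct.pack_into("<I", fixed, 4, len(fixed) - 8)
    # Bytes 40-43: data chunk size = total length minus the 44-byte header.
    struct.pack_into("<I", fixed, 40, len(fixed) - WAV_HEADER_SIZE)
    return bytes(fixed)
```

Both fields are little-endian unsigned 32-bit integers, hence the `"<I"` format. Players that ignore the placeholder sizes may work without this step, but patching the header keeps the saved file valid for stricter parsers.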

<ParamField body="language" type="string" default="en">
  Language code for synthesis. Defaults to the voice's primary language. Available languages depend on the voice's underlying model (V2 and V3 voices support 17+ languages).
</ParamField>

<ParamField body="consistency" type="number">
  Voice consistency control.

  * For `BTTS` (V1) voices, a float between 0.0 and 1.0. Higher values produce more consistent output.
  * For `BTTS_V2` and `BTTS_V3` voices, an integer between 1 and 32 (`per_decode`). **Lower** values produce more consistent output.
</ParamField>

<ParamField body="expressiveness" type="number">
  Expressiveness control for `BTTS` (V1) voices only. Float between 0.0 and 1.0. Higher values produce more expressive speech.
</ParamField>

<ParamField body="boost" type="integer">
  Expressiveness boost flag for `BTTS_V2` and `BTTS_V3` voices. `0` (off) or `1` (on).
</ParamField>
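Because `consistency`, `expressiveness`, and `boost` mean different things for different voice families, a client-side check can catch mismatches early. The sketch below assumes the caller already knows the voice type (`BTTS`, `BTTS_V2`, or `BTTS_V3`). `tuning_params` is a hypothetical helper, and raising on a mismatched parameter is a design choice here; this page does not say whether the API rejects or ignores such values.

```python
def tuning_params(voice_type, consistency=None, expressiveness=None, boost=None):
    """Validate the optional tuning fields and return those that were set."""
    if voice_type not in ("BTTS", "BTTS_V2", "BTTS_V3"):
        raise ValueError(f"unknown voice type: {voice_type}")
    is_v1 = voice_type == "BTTS"
    params = {}
    if consistency is not None:
        if is_v1 and not 0.0 <= consistency <= 1.0:
            raise ValueError("BTTS (V1) consistency must be a float in [0.0, 1.0]")
        if not is_v1 and not (isinstance(consistency, int) and 1 <= consistency <= 32):
            raise ValueError("V2/V3 consistency must be an integer from 1 to 32")
        params["consistency"] = consistency
    if expressiveness is not None:
        if not is_v1:
            raise ValueError("expressiveness applies to BTTS (V1) voices only")
        if not 0.0 <= expressiveness <= 1.0:
            raise ValueError("expressiveness must be a float in [0.0, 1.0]")
        params["expressiveness"] = expressiveness
    if boost is not None:
        if is_v1:
            raise ValueError("boost applies to BTTS_V2 and BTTS_V3 voices only")
        if boost not in (0, 1):
            raise ValueError("boost must be 0 (off) or 1 (on)")
        params["boost"] = int(boost)
    return params
```

Note the inverted scale: `consistency=0.9` is a high-consistency setting for a V1 voice, whereas for V2 and V3 voices a low integer such as `2` plays that role.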

***

## Response

On success, returns a binary WAV file with `Content-Type: audio/x-wav`. The `x-latency` and `x-cost` response headers report the latency and cost of the generation.

<ResponseField name="x-latency" type="string">
  Time in milliseconds from request to first audio byte (streaming) or full response (non-streaming).
</ResponseField>

<ResponseField name="x-cost" type="string">
  Cost in USD for the synthesis, matching what was billed.
</ResponseField>

<ResponseField name="Content-Type" type="string">
  Always `audio/x-wav` on success.
</ResponseField>

<ResponseField name="Content-Length" type="string">
  Total size of the response body in bytes, including the WAV header (non-streaming only).
</ResponseField>

<ResponseField name="Transfer-Encoding" type="string">
  `chunked` when `stream: true`.
</ResponseField>
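The response headers arrive as strings, so a client will usually convert them before logging or aggregating. A minimal sketch, assuming the headers have been collected into a plain dict; `read_metrics` is a hypothetical helper, and the latency is parsed as a float in case the value is not always a whole number.

```python
from decimal import Decimal


def read_metrics(headers: dict) -> dict:
    """Extract latency, cost, and streaming mode from the response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "latency_ms": float(lowered["x-latency"]),
        "cost_usd": Decimal(lowered["x-cost"]),  # Decimal avoids float rounding on money
        "streamed": lowered.get("transfer-encoding", "").lower() == "chunked",
    }
```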

<ResponseExample>
  ```http Response theme={null}
  HTTP/1.1 200 OK
  Content-Type: audio/x-wav
  Content-Length: 98806
  x-latency: 396
  x-cost: 0.001

  <WAV binary>
  ```

  ```json Service Not Supported theme={null}
  {
    "data": null,
    "errors": [
      { "error": "Service Not Supported", "message": "This endpoint only supports Bland's Beige Voices" }
    ]
  }
  ```

  ```json Invalid Input theme={null}
  {
    "data": null,
    "errors": [
      { "error": "Invalid Input", "message": "Missing voice parameter" }
    ]
  }
  ```

  ```json Rate Limited theme={null}
  {
    "data": null,
    "errors": [
      { "error": "Rate Limit Exceeded", "message": "Too many requests from this user, please try again later." }
    ]
  }
  ```
</ResponseExample>
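The error examples above share one envelope: a null `data` field and an `errors` array of `error`/`message` pairs. The sketch below flattens that envelope into readable strings. `error_messages` is a hypothetical helper, and the HTTP status codes that accompany each error are not covered because this page does not list them.

```python
import json
from typing import List, Union


def error_messages(body: Union[str, bytes]) -> List[str]:
    """Return one 'error: message' string per entry in the error envelope."""
    payload = json.loads(body)
    return [
        f'{item.get("error", "Error")}: {item.get("message", "")}'
        for item in payload.get("errors") or []
    ]
```

Checking `Content-Type` first is a reasonable way to decide whether to treat the body as audio or to pass it to a parser like this one.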

