POST /v1/speak

This endpoint only supports Bland’s Beige Voices.

Pricing

Text-to-speech pricing varies by plan:

  • Start Plan: $0.02 per 100 characters (Default)
  • Build Plan: $0.02 per 150 characters
  • Scale Plan: $0.02 per 200 characters
  • Enterprise Plan: $0.02 per 400 characters
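
As a rough worked example of the rates above: 1,000 characters is 10 blocks of 100 on the Start Plan, so 10 × $0.02 = $0.20; on the Scale Plan it is 5 blocks of 200, so 5 × $0.02 = $0.10. The sketch below does the same arithmetic in shell. The function name estimate_tts_cost is hypothetical, and it assumes the cost is prorated per character, which this page does not state; the x-cost response header is the figure to rely on.

```shell
# Hypothetical helper: estimate the cost in USD for a character count.
# $1 = number of characters
# $2 = characters covered by each 0.02 USD on your plan (100, 150, 200 or 400)
# Assumption: billing is prorated, not rounded up to whole blocks.
estimate_tts_cost() {
  awk -v chars="$1" -v block="$2" 'BEGIN { printf "%.4f\n", chars / block * 0.02 }'
}

estimate_tts_cost 1000 100   # Start Plan, prints 0.2000
estimate_tts_cost 1000 200   # Scale Plan, prints 0.1000
```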

Headers

authorization
string
required

Your API key for authentication.


Body Parameters

voice
string
required

The ID or name of the Beige Voice to use for speech synthesis.

text
string
required

The text content to be converted to speech.

output_format
string
default:"pcm_44100"

The audio output format for the generated speech.

Available formats:

  • pcm_8000 - PCM 8kHz, 16-bit, mono
  • pcm_16000 - PCM 16kHz, 16-bit, mono
  • pcm_24000 - PCM 24kHz, 16-bit, mono
  • pcm_44100 - PCM 44.1kHz, 16-bit, mono (default)
  • ulaw_8000 - μ-law 8kHz, 8-bit, mono
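
The format names above carry both the encoding and the sample rate, which is enough to work out the raw data rate: sample rate × bytes per sample (2 for 16-bit PCM, 1 for 8-bit μ-law), one channel. A minimal sketch, using a hypothetical helper named bytes_per_second:

```shell
# Hypothetical helper: raw audio bytes per second for an output_format value.
bytes_per_second() {
  rate=${1##*_}                     # text after the last underscore, e.g. 44100
  case "$1" in
    pcm_*)  echo $((rate * 2)) ;;   # 16-bit samples, 2 bytes each, mono
    ulaw_*) echo "$rate" ;;         # 8-bit samples, 1 byte each, mono
    *)      echo "unknown format: $1" >&2; return 1 ;;
  esac
}

bytes_per_second pcm_44100   # prints 88200
bytes_per_second ulaw_8000   # prints 8000
```

By the same arithmetic, ten seconds of pcm_24000 audio is about 480,000 bytes of sample data, plus the WAV header.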

consistency
float

Controls voice consistency and stability. Value between 0 and 1.

  • 0 - More varied, expressive speech
  • 1 - More consistent, stable speech

expressiveness
float

Controls voice expressiveness and emotion. Value between 0 and 1.

  • 0 - More monotone, neutral speech
  • 1 - More expressive, emotional speech
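
Both consistency and expressiveness are documented as values between 0 and 1, so it can help to check them client-side before sending a request. A minimal sketch follows; the helper name in_unit_range is hypothetical, and how the API treats out-of-range values is not stated on this page.

```shell
# Hypothetical helper: succeed only if $1 is a plain decimal number in [0, 1].
in_unit_range() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v >= 0 && v <= 1) }'
}

in_unit_range 0.3 && echo "0.3 is in range"
in_unit_range 1.5 || echo "1.5 is out of range"
```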

stream
boolean
default:false

Enable streaming audio response for real-time playback.

When true, audio is streamed as it’s generated. When false, the complete audio file is returned after generation.


Response

Non-Streaming Response

audio
audio/x-wav

Complete WAV audio file containing the synthesized speech, encoded in the requested output_format.

Streaming Response

audio_stream
audio/x-wav

Chunked WAV audio stream. The response starts with a WAV header followed by audio data chunks as they’re generated.
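
Since both response modes are described as WAV, a quick sanity check on a saved file is to look for the standard RIFF/WAVE signature: the ASCII bytes RIFF at offset 0 and WAVE at offset 8. A minimal sketch with a hypothetical helper name; it only inspects the first 12 bytes and does not validate the rest of the header.

```shell
# Hypothetical helper: succeed if the file starts with a RIFF/WAVE signature.
looks_like_wav() {
  riff=$(dd if="$1" bs=1 count=4 2>/dev/null)          # bytes 0-3
  wave=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)   # bytes 8-11
  [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}
```

For example, after saving a response to audio_stream.wav, running looks_like_wav audio_stream.wav && echo "WAV signature found" confirms the file is not a JSON error body saved under a .wav name.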

Response Headers

x-latency
string

Time in milliseconds from the request to the first audio chunk (streaming) or to the complete response (non-streaming).

x-cost
string

Cost in USD for the text-to-speech conversion.

Content-Type
string

Always audio/x-wav for successful responses.

Content-Length
string

Size of the complete audio file in bytes (non-streaming only).

Transfer-Encoding
string

Set to chunked for streaming responses.
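
To read these headers from the command line, curl's --dump-header (-D) option can write them to a file alongside the audio. The sketch below then pulls a single value out of that file. The helper name header_value and the file names are hypothetical; the lookup is case-insensitive because HTTP header names are.

```shell
# Save the audio and the response headers in one call (file names are placeholders):
#   curl -sS -X POST "https://api.bland.ai/v1/speak" \
#     -H "authorization: YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"voice": "Maeve", "text": "Hello."}' \
#     -D headers.txt --output speech.wav

# Hypothetical helper: print the value of header $2 from header dump file $1.
header_value() {
  awk -v name="$2" '
    BEGIN { name = tolower(name) }
    {
      sub(/\r$/, "")                       # header lines end in CRLF
      pos = index($0, ":")
      if (pos == 0) next                   # status line or blank line
      if (tolower(substr($0, 1, pos - 1)) != name) next
      val = substr($0, pos + 1)
      sub(/^[ \t]+/, "", val)              # drop whitespace after the colon
      print val
    }
  ' "$1"
}
```

With the headers saved, header_value headers.txt x-cost prints the reported cost and header_value headers.txt x-latency prints the latency in milliseconds.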


Basic Request

curl -X POST "https://api.bland.ai/v1/speak" \
  -H "authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice": "Maeve",
    "text": "Hello, this is a test of the text-to-speech system.",
    "output_format": "pcm_44100"
  }' \
  --output speech.wav

Streaming Request

curl -X POST "https://api.bland.ai/v1/speak" \
  -H "authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice": "Maeve",
    "text": "This is a streaming example that will return audio chunks as they are generated.",
    "output_format": "pcm_24000",
    "stream": true
  }' \
  --output audio_stream.wav

Request with Custom Voice Parameters

curl -X POST "https://api.bland.ai/v1/speak" \
  -H "authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice": "Maeve",
    "text": "This example uses custom voice parameters for more expressive speech.",
    "output_format": "pcm_16000",
    "consistency": 0.3,
    "expressiveness": 0.8
  }' \
  --output expressive.wav

Error Response

{
  "status": "error",
  "errors": [
    {
      "error": "Invalid Input",
      "message": "Missing text in request body - must be a string"
    }
  ]
}
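
Because successful responses are always audio/x-wav while errors like the one above are JSON, a script can branch on the response Content-Type before treating the body as audio. A minimal sketch: curl's -w '%{content_type}' write-out prints the response Content-Type, and the hypothetical helper keep_audio_or_report decides what to do with the saved body. File names are placeholders.

```shell
# Fetch the body to a temporary file and capture the Content-Type:
#   ctype=$(curl -sS -X POST "https://api.bland.ai/v1/speak" \
#     -H "authorization: YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"voice": "Maeve", "text": "Hello."}' \
#     --output response.bin -w '%{content_type}')
#   keep_audio_or_report "$ctype" response.bin speech.wav

# Hypothetical helper: $1 = Content-Type, $2 = saved body, $3 = destination for audio.
keep_audio_or_report() {
  case "$1" in
    audio/x-wav*)
      mv "$2" "$3" ;;                       # success: keep the audio
    *)
      echo "request failed, response body:" >&2
      cat "$2" >&2                          # error: show the JSON error body
      return 1 ;;
  esac
}
```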