Retrieve a single stored text-to-speech generation.
Set the format query parameter to audio to receive the raw WAV binary instead of the metadata JSON.

Without ?format=audio, returns the generation metadata:
... pcm_44100).
... ?format=audio).

With ?format=audio, returns a raw WAV audio binary (same as the original POST /v1/speak response).
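
The two response modes can be illustrated with a minimal Python sketch. Only the ?format=audio parameter and the JSON-versus-WAV distinction come from this section. The base URL, the retrieval path, and the function names below are hypothetical placeholders, and any authentication the API may require is left out.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumptions (not stated in this section): the host and the retrieval
# path are placeholders. Substitute the real values from the API reference.
BASE_URL = "https://api.example.com"
GENERATION_PATH = "/v1/generations/{generation_id}"  # hypothetical path


def build_generation_url(generation_id, audio=False, base_url=BASE_URL):
    """Build the retrieval URL, adding ?format=audio when audio is requested."""
    path = GENERATION_PATH.format(generation_id=quote(generation_id, safe=""))
    url = base_url.rstrip("/") + path
    if audio:
        url += "?" + urlencode({"format": "audio"})
    return url


def fetch_generation(generation_id, audio=False, opener=urllib.request.urlopen):
    """Return parsed metadata (dict) by default, or raw WAV bytes if audio=True.

    `opener` is injectable so the function can be exercised without a network.
    """
    with opener(build_generation_url(generation_id, audio=audio)) as resp:
        body = resp.read()
    # With ?format=audio the body is binary WAV data; otherwise it is JSON.
    return body if audio else json.loads(body)
```

Calling fetch_generation("abc123") would return the metadata as a dictionary, while fetch_generation("abc123", audio=True) would return bytes that can be written directly to a .wav file.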