Streaming (Websocket)
1. Connect to the WebSocket URL.
2. Send a TTS request with JSON parameters.
3. Receive audio data in JSON or binary format.
4. Receive a final termination message.
Synthesis Request
WebSocket URL
wss://websocket.cluster.0xIQ.ai/stream
Request Params
Send a JSON message to request speech synthesis. The message format is as follows:
{
  "voice_uuid": "<voice_uuid>",
  "project_uuid": "<project_uuid>",
  "data": "<text | ssml>",
  "binary_response": false, // Optional, defaults to false for JSON response
  "request_id": 0, // Optional, auto-incremented if not specified
  // Additional optional parameters as needed
}
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice_uuid | string | Yes | The voice to synthesize the text in. |
| project_uuid | string | Yes | The project to save the data to. |
| data | string | Yes | The text or SSML to synthesize. Maximum length of 3000 characters (not including SSML). |
| request_id | int | No | Numeric identifier for the request, returned with each response. If omitted, the server assigns increasing integers starting at 0. |
| binary_response | bool | No | Defaults to false. If true, returns audio data in binary format (MP3 or WAV), suitable for direct playback. If false, returns audio data in JSON frames with base64 encoding. |
| output_format | string | No | The output format of the produced audio. Either "wav" or "mp3". |
| sample_rate | integer | No | The sample rate of the produced audio. One of 8000, 16000, 22050, 32000, or 44100. |
| precision | string | No | The bit depth of the generated audio. One of PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. |
| no_audio_header | bool | No | Defaults to false. If true, the audio header is omitted from the binary WAV response. If false, the audio header is included. |
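As an illustration of the parameters above, here is a minimal Python sketch that assembles and checks a request message before it is sent. The helper name `build_synthesis_request` is hypothetical and not part of the API. The tag-stripping step is only an assumption about how "not including SSML" is counted, and the service may count characters differently.

```python
import json
import re

MAX_TEXT_CHARS = 3000  # limit stated in the parameter list
ALLOWED = {
    "output_format": {"wav", "mp3"},
    "sample_rate": {8000, 16000, 22050, 32000, 44100},
    "precision": {"PCM_32", "PCM_24", "PCM_16", "MULAW"},
}


def build_synthesis_request(voice_uuid, project_uuid, data, **options):
    """Hypothetical helper: validate documented fields, return JSON text to send."""
    # Assumption: removing anything that looks like a tag approximates
    # the "not including SSML" rule for the length limit.
    visible_text = re.sub(r"<[^>]+>", "", data)
    if len(visible_text) > MAX_TEXT_CHARS:
        raise ValueError(f"data exceeds {MAX_TEXT_CHARS} characters")

    # Check only the options whose allowed values are documented.
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    message = {"voice_uuid": voice_uuid, "project_uuid": project_uuid, "data": data}
    message.update(options)  # e.g. binary_response, request_id, no_audio_header
    return json.dumps(message)
```

The returned string can be passed to the send method of whichever WebSocket client you use.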
Audio Output
JSON Response Format
When binary_response is set to false, the server sends multiple audio chunks as JSON objects:
{
  "type": "audio",
  "audio_content": "<base64_encoded_audio>",
  "audio_timestamps": {
    "graph_chars": ["H", "e"], // or null
    "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]], // or null
    "phon_chars": ["h", "ˈe"], // or null
    "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]] // or null
  },
  "sample_rate": 32000,
  "request_id": 0
}
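A JSON audio frame like the one above could be unpacked roughly as follows. This is a sketch rather than official client code: `decode_audio_frame` is a hypothetical name, and reading each timestamp pair as [start, end] in seconds is an assumption that the format description does not confirm.

```python
import base64
import json


def decode_audio_frame(raw):
    """Hypothetical helper: return (audio_bytes, grapheme_timings) for one frame."""
    frame = json.loads(raw)
    if frame.get("type") != "audio":
        raise ValueError(f"expected an audio frame, got {frame.get('type')!r}")

    audio = base64.b64decode(frame["audio_content"])

    # Timestamp fields may be null, so fall back to empty lists.
    stamps = frame.get("audio_timestamps") or {}
    chars = stamps.get("graph_chars") or []
    times = stamps.get("graph_times") or []
    # Assumption: each pair is [start, end] in seconds.
    timings = [(char, start, end) for char, (start, end) in zip(chars, times)]
    return audio, timings
```

The phoneme fields (phon_chars and phon_times) have the same shape and could be paired the same way.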
Binary Response Format
When binary_response is set to true, audio chunks are sent as contiguous bytes of a WAV or MP3 file:
// Binary data stream
Termination Message
The server sends this message to indicate that the audio stream is complete:
{
  "type": "audio_end",
  "request_id": 0
}
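For binary mode, a receive loop might collect chunks until the termination message arrives. The sketch below makes two assumptions the text does not state: that the termination message is delivered as a text frame containing JSON even when binary_response is true, and that `messages` is any iterable yielding bytes for binary frames and str for text frames. The name `collect_binary_audio` is hypothetical.

```python
import json


def collect_binary_audio(messages, request_id=0):
    """Hypothetical helper: join binary chunks until the matching audio_end."""
    chunks = []
    for message in messages:
        if isinstance(message, (bytes, bytearray)):
            chunks.append(bytes(message))  # binary frame: raw audio bytes
            continue
        frame = json.loads(message)  # text frame: JSON control message
        if frame.get("type") == "audio_end" and frame.get("request_id") == request_id:
            return b"".join(chunks)
        if frame.get("type") == "error":
            raise RuntimeError(frame.get("message", "synthesis error"))
    # The iterable ran out without a termination message.
    raise ConnectionError("stream closed before audio_end was received")
```

Because binary chunks are contiguous bytes of one file, joining them in arrival order should reproduce the WAV or MP3 payload.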
Error Handling
Note: The WebSocket API is only available on the Business plan or higher. If you have trouble connecting, upgrade to a Business plan or higher on the billing page.
Unrecoverable Errors
Errors occurring during the connection handshake, leading to connection failure:
{
  "type": "error",
  "success": false,
  "error_name": "ConnectionFailure",
  "message": "Failed to establish a connection.",
  "status_code": 401 // Example status code
}
Recoverable Errors
Errors related to synthesis requests that do not interrupt the ongoing connection:
{
  "type": "error",
  "success": false,
  "error_name": "BadJSON",
  "error_params": {"explanation": "Provide your query to synthesize as text or SSML in the 'data' field"},
  "message": "Invalid JSON: Provide your query to synthesize as text or SSML in the 'data' field",
  "status_code": 400
}
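One way a client might turn these error frames into exceptions is sketched below. `StreamError` and `parse_error` are hypothetical names. Treating only "ConnectionFailure" as unrecoverable is an assumption based on the two examples in this section, and the service may define other error names.

```python
import json

# Assumption: only the handshake error name shown in this section is fatal.
UNRECOVERABLE_NAMES = {"ConnectionFailure"}


class StreamError(Exception):
    """Hypothetical exception carrying the fields of an error frame."""

    def __init__(self, name, message, status_code, params=None):
        super().__init__(f"{name} ({status_code}): {message}")
        self.name = name
        self.message = message
        self.status_code = status_code
        self.params = params or {}
        self.recoverable = name not in UNRECOVERABLE_NAMES


def parse_error(raw):
    """Return a StreamError for an error frame, or None for any other frame."""
    frame = json.loads(raw)
    if frame.get("type") != "error":
        return None
    return StreamError(
        frame.get("error_name", "Unknown"),
        frame.get("message", ""),
        frame.get("status_code"),
        frame.get("error_params"),
    )
```

A caller could keep the connection open and send a corrected request when `recoverable` is true, and reconnect otherwise.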