Streaming (WebSocket)

Connect to the WebSocket URL.
Send a TTS request with JSON parameters.
Receive audio data in JSON or binary format.
Receive a final termination message.
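The four steps above can be sketched without committing to a particular WebSocket library. This is a minimal sketch, assuming hypothetical `send` and `recv` callables that wrap an already-open connection to the WebSocket URL; the function name `synthesize` is illustrative and not part of the API. It uses the JSON response mode and assumes that concatenating the decoded chunks yields the complete audio, which this page states explicitly only for binary mode.

```python
import base64
import json
from typing import Callable


def synthesize(send: Callable[[str], None], recv: Callable[[], str],
               voice_uuid: str, project_uuid: str, text: str) -> bytes:
    """Send one TTS request and collect audio until the termination message.

    `send` and `recv` are hypothetical wrappers around an open WebSocket
    connection (step 1); they are assumptions, not part of the API.
    """
    # Step 2: send the TTS request as a JSON message.
    send(json.dumps({
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": text,
        "binary_response": False,
    }))

    audio = bytearray()
    while True:
        message = json.loads(recv())
        if message["type"] == "audio":
            # Step 3: each JSON frame carries base64-encoded audio.
            # Assumption: decoded chunks can simply be concatenated.
            audio += base64.b64decode(message["audio_content"])
        elif message["type"] == "audio_end":
            # Step 4: the termination message ends the stream.
            return bytes(audio)
        elif message["type"] == "error":
            raise RuntimeError(f"{message.get('error_name')}: {message.get('message')}")
```

Injecting `send` and `recv` keeps the sketch independent of any client library and makes it easy to exercise with canned frames.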

Synthesis Request

WebSocket URL

wss://websocket.cluster.0xIQ.ai/stream

Request Params

Send a JSON message to request speech synthesis. The message format is as follows:

{
  "voice_uuid": "<voice_uuid>",
  "project_uuid": "<project_uuid>",
  "data": "<text | ssml>",
  "binary_response": false,  // Optional, defaults to false for JSON response
  "request_id": 0,           // Optional, auto-incremented if not specified
  // Additional optional parameters as needed
}

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| voice_uuid | string | Yes | The voice to synthesize the text in. |
| project_uuid | string | Yes | The project to save the data to. |
| data | string | Yes | The text or SSML to synthesize. Maximum length of 3000 characters (not including SSML). |
| request_id | int | No | Optional numeric identifier for the request, returned with each response. If not specified, requests are numbered with increasing integers starting at 0. |
| binary_response | bool | No | Defaults to false. If true, returns audio data in binary format (MP3 or WAV), suitable for direct playback. If false, returns audio data in JSON frames with base64 encoding. |
| output_format | string | No | The output format of the produced audio. Either "wav" or "mp3". |
| sample_rate | integer | No | The sample rate of the produced audio in Hz. One of 8000, 16000, 22050, 32000, or 44100. |
| precision | string | No | The bit depth of the generated audio. One of PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. |
| no_audio_header | bool | No | Defaults to false. If true, the audio header is omitted from the binary WAV response. If false, the audio header is included. |
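Checking a request on the client before sending it avoids a round trip that would end in an error. The following is a minimal sketch: `build_request` is a hypothetical helper, not part of the API, and the allowed values are copied from the parameter list above. It assumes that "not including SSML" means markup tags do not count toward the 3000-character limit.

```python
import json
import re

# Allowed values copied from the parameter list above.
ALLOWED = {
    "output_format": {"wav", "mp3"},
    "sample_rate": {8000, 16000, 22050, 32000, 44100},
    "precision": {"PCM_32", "PCM_24", "PCM_16", "MULAW"},
}
MAX_CHARS = 3000


def build_request(voice_uuid: str, project_uuid: str, data: str, **options) -> str:
    """Return the request as a JSON string, raising ValueError on bad input."""
    # Assumption: "not including SSML" means markup tags do not count toward
    # the limit, so tags are stripped with a simple regex before measuring.
    visible_text = re.sub(r"<[^>]+>", "", data)
    if len(visible_text) > MAX_CHARS:
        raise ValueError(
            f"data has {len(visible_text)} characters; the limit is {MAX_CHARS}"
        )

    # Reject option values that the parameter list does not allow.
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    return json.dumps({
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": data,
        **options,
    })
```

For example, `build_request("<voice_uuid>", "<project_uuid>", "Hello", sample_rate=22050)` returns a JSON string ready to send, while `sample_rate=48000` raises `ValueError`.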

Audio Output

JSON Response Format

When binary_response is set to false, the server sends multiple audio chunks as JSON objects. Each of the four fields inside audio_timestamps is either an array, as shown, or null:

{
    "type": "audio",
    "audio_content": "<base64_encoded_audio>",
    "audio_timestamps": {
        "graph_chars": ["H", "e"],
        "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]],
        "phon_chars": ["h", "ˈe"],
        "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]]
    },
    "sample_rate": 32000,
    "request_id": 0
}
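A JSON audio frame can be unpacked into raw bytes plus per-character timing. This is a minimal sketch: `AudioChunk` and `parse_audio_frame` are hypothetical names, and each timing pair is assumed to be a pair of offsets in seconds, which this page does not state.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioChunk:
    audio: bytes
    sample_rate: int
    request_id: int
    # (character, first_time, second_time); assumed to be offsets in seconds.
    graphemes: List[Tuple[str, float, float]]


def parse_audio_frame(raw: str) -> AudioChunk:
    """Decode one JSON frame whose type is "audio"."""
    frame = json.loads(raw)
    if frame.get("type") != "audio":
        raise ValueError(f"expected an audio frame, got {frame.get('type')!r}")

    # The timestamp fields may be null, so fall back to empty lists.
    timestamps = frame.get("audio_timestamps") or {}
    chars = timestamps.get("graph_chars") or []
    times = timestamps.get("graph_times") or []
    graphemes = [(char, pair[0], pair[1]) for char, pair in zip(chars, times)]

    return AudioChunk(
        audio=base64.b64decode(frame["audio_content"]),
        sample_rate=frame["sample_rate"],
        request_id=frame["request_id"],
        graphemes=graphemes,
    )
```

The same pairing applies to phon_chars and phon_times if phoneme-level timing is needed.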

Binary Response Format

When binary_response is set to true, audio chunks are sent as contiguous bytes of a WAV or MP3 file:

// Binary data stream
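Because binary chunks are contiguous bytes of a single file, a client can append them in arrival order. The sketch below assumes that audio arrives as binary frames (`bytes`) while the termination and error messages arrive as JSON text frames (`str`); this page does not spell out that framing, and `collect_binary_audio` is a hypothetical helper.

```python
import json
from typing import Iterable, Union


def collect_binary_audio(frames: Iterable[Union[bytes, str]]) -> bytes:
    """Concatenate binary chunks until the termination message arrives."""
    audio = bytearray()
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            # Binary frame: the next contiguous slice of the WAV or MP3 file.
            audio += frame
            continue
        # Text frame (assumed): a JSON control message.
        message = json.loads(frame)
        if message.get("type") == "audio_end":
            return bytes(audio)
        if message.get("type") == "error":
            raise RuntimeError(message.get("message", "synthesis error"))
    raise ConnectionError("stream closed before the termination message")
```

The returned bytes can be written straight to a `.wav` or `.mp3` file, provided no_audio_header was left at its default.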

Termination Message

The server sends a termination message to indicate that the audio stream for a request is complete:

{
  "type": "audio_end",
  "request_id": 0
}
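Since both audio frames and the termination message carry request_id, a client that sends several requests over one connection can group responses by that field. This is a minimal sketch for JSON response mode; `StreamTracker` is a hypothetical class, and whether responses to different requests can interleave is not stated on this page.

```python
import base64
import json
from collections import defaultdict


class StreamTracker:
    """Group audio chunks by request_id and record which streams have ended."""

    def __init__(self) -> None:
        self._chunks = defaultdict(list)
        self._finished = set()

    def handle(self, raw: str) -> None:
        message = json.loads(raw)
        request_id = message.get("request_id")
        if message.get("type") == "audio":
            self._chunks[request_id].append(base64.b64decode(message["audio_content"]))
        elif message.get("type") == "audio_end":
            # The termination message marks this request's stream as complete.
            self._finished.add(request_id)

    def is_done(self, request_id: int) -> bool:
        return request_id in self._finished

    def audio(self, request_id: int) -> bytes:
        # Use .get so that querying an unknown id does not create an entry.
        return b"".join(self._chunks.get(request_id, []))
```

A caller would feed every received text message to `handle` and read `audio(request_id)` once `is_done(request_id)` returns true.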

Error Handling

Note: The WebSocket API is available only to users on the Business plan or higher. If you are running into trouble, upgrade to a Business plan or higher on the billing page.

Unrecoverable Errors

Errors occurring during the connection handshake, leading to connection failure:

{
  "type": "error",
  "success": false,
  "error_name": "ConnectionFailure",
  "message": "Failed to establish a connection.",
  "status_code": 401  // Example status code
}

Recoverable Errors

Errors related to synthesis requests that do not interrupt the ongoing connection:

{
  "type": "error",
  "success": false,
  "error_name": "BadJSON",
  "error_params": {"explanation": "Provide your query to synthesize as text or SSML in the 'data' field"},
  "message": "Invalid JSON: Provide your query to synthesize as text or SSML in the 'data' field",
  "status_code": 400
}
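Both error shapes share the type, error_name, message, and status_code fields, so one handler can turn either into an exception. This is a minimal sketch; `SynthesisError` and `raise_for_error` are hypothetical names, not part of the API.

```python
import json


class SynthesisError(Exception):
    """Carries the fields of an error message sent by the server."""

    def __init__(self, name, message, status_code, params=None):
        super().__init__(f"{name} ({status_code}): {message}")
        self.name = name
        self.status_code = status_code
        self.params = params or {}


def raise_for_error(raw: str) -> dict:
    """Return the parsed message, or raise SynthesisError if it is an error."""
    message = json.loads(raw)
    if message.get("type") == "error":
        raise SynthesisError(
            message.get("error_name"),
            message.get("message"),
            message.get("status_code"),
            message.get("error_params"),  # present in the recoverable example only
        )
    return message
```

After a recoverable error such as BadJSON the connection stays open, so a caller can catch `SynthesisError`, correct the request, and send it again on the same socket.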
