Create a voice
Voice types
There are two types of voices you can create on the 0xIQ platform: Rapid Voice Clone and Professional Voice Clone.
Rapid Voice Clone
A Rapid Voice Clone is a quick and easy way to create a voice for your content. Using as little as 10 seconds of recordings, you can create a voice clone in under a minute.
Professional Voice Clone
A Professional Voice Clone provides a more accurate way of creating a voice. It requires at least 10 minutes of recordings and takes around 40 minutes to create. This allows for a more detailed and personalized voice, as the AI has more data to work with.
Voice Data
There are two ways to provide data for a voice:
Providing a URL to a dataset when creating the voice
Uploading individual recordings using the recording API
Option 1: Providing a URL to a dataset when creating the voice
Rapid Voice
Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. The dataset must be a WAV file of at least 10 seconds. After creating the voice, follow the Build a voice documentation to start training.
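Before hosting a Rapid Voice dataset, it can help to confirm locally that the file meets the 10-second minimum. The following is a minimal sketch using Python's standard wave module; the helper names (wav_duration_seconds, is_long_enough, make_silent_wav) are hypothetical and not part of the 0xIQ API, and the platform may apply further checks of its own.

```python
import io
import wave

MIN_RAPID_SECONDS = 10.0  # minimum dataset length stated for a Rapid Voice


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum: float = MIN_RAPID_SECONDS) -> bool:
    """True if the WAV file is at least `minimum` seconds long."""
    return wav_duration_seconds(source) >= minimum


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer
```

In practice you would pass the path of your own recording, for example is_long_enough("dataset.wav"), where the file name is a placeholder.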
Professional Voice
Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. Please see here for acceptable dataset formats. The dataset will first be analyzed and then training will begin automatically.
Option 2: Uploading individual recordings using the recording API
Rapid Voice
Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute. Use the instructions on the "Create a recording" page to upload recordings to your voice.
Upon uploading at least 3 recordings, follow the Build a voice documentation to start training.
Professional Voice
Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute. Use the instructions on the "Create a recording" page to upload recordings to your voice.
Upon uploading at least 20 recordings, follow the Build a voice documentation to start training.
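The recording minimums for Option 2 (3 for a Rapid Voice, 20 for a Professional Voice) can be encoded as a small client-side guard before you start a build. This is only an illustrative sketch; ready_to_build and MIN_RECORDINGS are hypothetical names, and the count of uploaded recordings is something your own code would need to track.

```python
# Minimum number of uploaded recordings per voice type, as stated above.
MIN_RECORDINGS = {"rapid": 3, "professional": 20}


def ready_to_build(voice_type: str, uploaded: int) -> bool:
    """Return True once enough recordings have been uploaded for this voice type."""
    if voice_type not in MIN_RECORDINGS:
        raise ValueError(f"unknown voice type: {voice_type!r}")
    return uploaded >= MIN_RECORDINGS[voice_type]
```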
Voice Consent
In order to clone a voice, you must be an authorized uploader or provide consent to clone your voice using the 0xIQ AI platform. To provide consent, upload an audio recording containing the following message:
I am aware that recordings of my voice will be used by 0xIQ AI to train and create a synthetic version of my voice by 0xIQ AI.
This audio content will be used by the 0xIQ platform for the purposes of authorizing your voice clone.
HTTP Request
POST https://app.0xIQ.ai/api/v2/voices
name (string): Name of the voice.
consent (string): A base64-encoded WAV file string containing your consent and authorization to create and clone a voice. Please see the Voice Consent section for more details.
voice_type (optional, string): The type of voice to create, either rapid or professional. Defaults to professional if not provided.
dataset_url (optional, string): A URL to a dataset on which to train the voice. Please see here for acceptable dataset formats.
callback_uri (optional, string): A URL (webhook) that will be notified upon voice training completion. Please see here for callback details.
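To make the request attributes concrete, here is a hedged sketch that assembles them into a POST request using only the Python standard library. Two things are assumptions not confirmed by this page: that the body is sent as JSON, and that authentication uses an "Authorization: Bearer" header. The function names build_voice_payload and build_request are hypothetical. The request is built but not sent.

```python
import base64
import json
import urllib.request

API_URL = "https://app.0xIQ.ai/api/v2/voices"  # endpoint from the HTTP Request section
VOICE_TYPES = ("rapid", "professional")


def build_voice_payload(name, consent_wav, voice_type=None,
                        dataset_url=None, callback_uri=None):
    """Assemble the request body; optional attributes are omitted when not given."""
    if voice_type is not None and voice_type not in VOICE_TYPES:
        raise ValueError(f"voice_type must be one of {VOICE_TYPES}")
    payload = {
        "name": name,
        # The consent attribute carries the WAV bytes as a base64 string.
        "consent": base64.b64encode(consent_wav).decode("ascii"),
    }
    optional = {
        "voice_type": voice_type,
        "dataset_url": dataset_url,
        "callback_uri": callback_uri,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def build_request(payload, api_token):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # ASSUMPTION: JSON request body
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth; check the authentication docs.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Under those assumptions, sending the request would be a call to urllib.request.urlopen(build_request(payload, token)). Leaving dataset_url out of the payload corresponds to Option 2, where recordings are uploaded separately.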
Base 64 Encoding
The required consent field must be a valid base-64 encoded string containing your consent audio file content. To convert your consent audio file to a base-64 encoded string, you can use the standard library of your programming language of choice. See the following examples for implementations in several popular languages.
import base64

# Path to your consent audio file (set this before running)
file_path = ''

# Read the raw bytes of the file
with open(file_path, 'rb') as file:
    file_contents = file.read()

# Encode the bytes as Base64 and convert the result to a text string
base64_contents = base64.b64encode(file_contents).decode('utf-8')

# Output the Base64-encoded string to stdout
print(base64_contents)
HTTP Response
{
  "success": true,
  "item": {
    "uuid": <string>,
    "name": <string>,
    "status": <string>,
    "dataset_url": <string>,
    "created_at": <UTC Date>,
    "updated_at": <UTC Date>
  }
}
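A client will typically want the voice's uuid from this response so it can upload recordings or match later callbacks. The sketch below parses a response body of the shape shown above. The function name parse_create_voice_response is hypothetical, and since this page does not document the shape of an unsuccessful response, the error handling is an assumption.

```python
import json


def parse_create_voice_response(body: str) -> dict:
    """Return the "item" object from a create-voice response body."""
    data = json.loads(body)
    # ASSUMPTION: a failed call sets "success" to false; the failure
    # shape is not documented here, so only the flag is checked.
    if not data.get("success"):
        raise RuntimeError("voice creation was not reported as successful")
    return data["item"]
```

The uuid of the new voice is then available as parse_create_voice_response(body)["uuid"].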
Callback
If you provided a callback_uri when you created the voice, you will receive the following POST request when the voice has completed training.
Training Completion Callback
This callback happens when your training completes without any issues.
{
  "ok": true,
  "id": "<string>",
  "status": "finished",
  "recordings": [],
  "issue": null
}
Dataset Issue Callback
If the status is set to dataset_issue, this callback will contain detailed information about the issue and the problematic recordings:
{
  "ok": "<boolean>",
  "id": "<string>",
  "status": "dataset_issue",
  "issue": "Detailed description of the dataset issue.",
  "recordings": [
    {
      "uuid": "<string>",
      "name": "<string>",
      "transcript": "<string>",
      "stoi_score": "<number>",
      "pesq_score": "<number>",
      "si_dr_score": "<number>",
      "0xIQ_sample_score": "<number>",
      "is_active": "<boolean>",
      "is_outlier": "<boolean>",
      "is_silent": "<boolean>"
    },
    ...
  ]
}
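When a dataset_issue callback arrives, one way to act on it is to list the recordings whose boolean flags are set. The sketch below assumes that a true is_outlier or is_silent marks a recording as problematic; this page names the fields but does not define them, so treat that reading as an assumption. The function name flagged_recordings is hypothetical.

```python
def flagged_recordings(callback: dict) -> list:
    """Return (name, reasons) pairs for recordings with is_outlier or is_silent set."""
    flagged = []
    for recording in callback.get("recordings", []):
        # ASSUMPTION: a true value on either flag means the recording needs attention.
        reasons = [flag for flag in ("is_outlier", "is_silent") if recording.get(flag)]
        if reasons:
            flagged.append((recording["name"], reasons))
    return flagged
```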
Consent Validation Failure Callback
If the status is consent_validation_failed, this callback provides information about consent issues:
{
  "ok": "<boolean>",
  "id": "<string>",
  "status": "consent_validation_failed",
  "issue": "Consent statement issue description.",
  "recordings": []
}
id (string): The UUID of the voice this callback is for.
status (string): The status of the voice, such as finished, dataset_issue, or consent_validation_failed.
issue (string): A detailed description of the issue, if any.
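A webhook receiver can branch on the status field to tell the three callbacks apart. The following sketch takes the raw POST body and returns a short summary string; the function name handle_training_callback is hypothetical, and how the body reaches it (web framework, routing, request verification) is outside what this page describes.

```python
import json

FAILURE_STATUSES = ("dataset_issue", "consent_validation_failed")


def handle_training_callback(body) -> str:
    """Summarize a training callback from its JSON body (str or bytes)."""
    payload = json.loads(body)
    voice_id = payload.get("id")
    status = payload.get("status")
    if status == "finished":
        return f"voice {voice_id} finished training"
    if status in FAILURE_STATUSES:
        return f"voice {voice_id} failed with {status}: {payload.get('issue')}"
    # Other statuses may exist; the text above lists these three as examples.
    return f"voice {voice_id} reported status {status!r}"
```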
Examples
from 0xIQ import 0xIQ
0xIQ.api_key('YOUR_API_TOKEN')
name = 'Test Voice'
response = 0xIQ.v2.voices.create(name, dataset_url="http://../dataset.zip", callback_uri="http://example.com/cb")
voice = response['item']