Create a voice

Voice types

There are two types of voices you can create on the 0xIQ platform: Rapid Voice Clone and Professional Voice Clone.

Rapid Voice Clone

A Rapid Voice Clone is a quick and easy way to create a voice for your content. Using as little as 10 seconds of recordings, you can create a voice clone in under a minute.

Professional Voice Clone

A Professional Voice Clone provides a more accurate way of creating a voice. It requires at least 10 minutes of recordings and takes around 40 minutes to create. This allows for a more detailed and personalized voice, as the AI has more data to work with.
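As a rough illustration of the thresholds above, the following sketch suggests a voice type from the total length of audio you have available. The function name and constants are hypothetical helpers, not part of the 0xIQ API; only the 10-second and 10-minute figures come from this page.

```python
from typing import Optional

# Thresholds taken from the descriptions above.
RAPID_MIN_SECONDS = 10               # "as little as 10 seconds"
PROFESSIONAL_MIN_SECONDS = 10 * 60   # "at least 10 minutes"


def suggest_voice_type(total_seconds: float) -> Optional[str]:
    """Return 'professional', 'rapid', or None if there is too little audio.

    Hypothetical helper: the platform itself does not expose this function.
    """
    if total_seconds >= PROFESSIONAL_MIN_SECONDS:
        return "professional"
    if total_seconds >= RAPID_MIN_SECONDS:
        return "rapid"
    return None
```

Having enough audio for a Professional Voice Clone does not rule out a Rapid Voice Clone; the helper simply prefers the more accurate option when the data allows it.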

Voice Data

There are two ways to provide data for a voice:

  1. Providing a URL to a dataset when creating the voice

  2. Uploading individual recordings using the recording API

Option 1: Providing a URL to a dataset when creating the voice

Rapid Voice

  1. Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. The dataset must be a WAV file at least 10 seconds long.

  2. After creating the voice, follow the Build a voice documentation to start training.

Professional Voice

  1. Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. Please see here for acceptable dataset formats.

  2. The dataset will first be analyzed and then training will begin automatically.
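The dataset-URL flow above might be sketched with the Python standard library as follows. This only constructs the request; it does not send it. The endpoint URL and field names come from this page, while the Bearer authorization header, the function name, and the example values are assumptions made for illustration and should be checked against your account's authentication documentation.

```python
import json
import urllib.request

API_URL = "https://app.0xIQ.ai/api/v2/voices"  # endpoint from this page


def build_create_voice_request(token, name, consent_b64, dataset_url,
                               voice_type="professional"):
    """Build (but do not send) a POST request that creates a voice from a dataset URL."""
    body = {
        "name": name,
        "consent": consent_b64,      # base64-encoded consent recording
        "voice_type": voice_type,    # "rapid" or "professional"
        "dataset_url": dataset_url,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: Bearer-token auth; the header scheme is not stated on this page.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. For a Rapid Voice you would then start training yourself, whereas a Professional Voice is analyzed and trained automatically.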

Option 2: Uploading individual recordings using the recording API

Rapid Voice

  1. Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute.

  2. Use the instructions on the "Create a recording" page to upload recordings to your voice.

  3. Upon uploading at least 3 recordings, follow the Build a voice documentation to start training.

Professional Voice

  1. Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute.

  2. Use the instructions on the "Create a recording" page to upload recordings to your voice.

  3. Upon uploading at least 20 recordings, follow the Build a voice documentation to start training.
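The minimum recording counts above can be checked on the client before starting a build. This is a minimal sketch; the helper name and the dictionary are hypothetical, and only the thresholds (3 for rapid, 20 for professional) come from this page.

```python
# Minimum number of uploaded recordings before a build can be started,
# as listed in the steps above.
MIN_RECORDINGS = {"rapid": 3, "professional": 20}


def ready_to_build(voice_type: str, uploaded_count: int) -> bool:
    """Return True once enough recordings have been uploaded for this voice type."""
    if voice_type not in MIN_RECORDINGS:
        raise ValueError(f"unknown voice type: {voice_type!r}")
    return uploaded_count >= MIN_RECORDINGS[voice_type]
```

For example, `ready_to_build("professional", 19)` is false, so one more recording is needed before following the Build a voice documentation.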

Voice Consent

To clone a voice, you must be an authorized uploader or provide consent to clone your voice using the 0xIQ AI platform. To provide consent, upload an audio recording containing the following message:

I am aware that recordings of my voice will be used by 0xIQ AI to train and create a synthetic version of my voice by 0xIQ AI.

The 0xIQ platform uses this recording to authorize your voice clone.

HTTP Request

POST https://app.0xIQ.ai/api/v2/voices

| JSON Body Parameters | Type | Description |
| --- | --- | --- |
| name | string | Name of the voice. |
| consent | string | A base64-encoded WAV file string containing your consent and authorization to create and clone a voice. Please see the Voice Consent section for more details. |
| voice_type | (optional) string | The type of voice to create: either rapid or professional. Defaults to professional if not provided. |
| dataset_url | (optional) string | A URL to a dataset on which to train the voice. Please see here for acceptable dataset formats. |
| callback_uri | (optional) string | A URL (webhook) that will be notified when voice training completes. Please see the Callback section for details. |
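One way to assemble the JSON body from these parameters is sketched below. It validates the required fields and leaves optional fields out of the body when they are not set, so the server-side defaults apply. The function name is hypothetical, and the client-side validation is an illustration rather than a description of what the API itself checks.

```python
from typing import Optional

VOICE_TYPES = ("rapid", "professional")  # values listed for voice_type


def build_voice_body(name: str,
                     consent: str,
                     voice_type: Optional[str] = None,
                     dataset_url: Optional[str] = None,
                     callback_uri: Optional[str] = None) -> dict:
    """Return a dict suitable for json.dumps() as the request body."""
    if not name:
        raise ValueError("name is required")
    if not consent:
        raise ValueError("consent (base64-encoded WAV) is required")
    if voice_type is not None and voice_type not in VOICE_TYPES:
        raise ValueError(f"voice_type must be one of {VOICE_TYPES}")

    body = {"name": name, "consent": consent}
    optional = {
        "voice_type": voice_type,
        "dataset_url": dataset_url,
        "callback_uri": callback_uri,
    }
    # Only include optional fields that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Omitting dataset_url here corresponds to Option 2 above, where recordings are uploaded individually after the voice is created.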

Base 64 Encoding

The required consent field must be a valid base64-encoded string containing the contents of your consent audio file. You can produce this string with the standard library of most programming languages. The following Python example shows one way to do it.

import base64

# Read the raw bytes of the consent recording
file_path = 'consent.wav'  # placeholder: replace with the path to your consent WAV file

with open(file_path, 'rb') as file:
    file_contents = file.read()

# Encode the file contents as Base64
base64_contents = base64.b64encode(file_contents).decode('utf-8')

# Output the Base64-encoded string to stdout
print(base64_contents)
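Before encoding, it can help to confirm that the file really is a readable WAV file. The sketch below does this with Python's standard-library `wave` module, which reads uncompressed PCM WAV files; the function name is hypothetical, and whether the platform accepts other WAV encodings is not stated on this page.

```python
import base64
import io
import wave


def encode_wav_for_consent(wav_bytes: bytes) -> str:
    """Check that the bytes parse as a WAV file, then return them base64-encoded.

    Raises wave.Error or EOFError if the data is not a readable WAV file,
    and ValueError if the file contains no audio frames.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnframes() == 0:
            raise ValueError("WAV file contains no audio frames")
    # The entire file, header included, is encoded, not only the audio frames.
    return base64.b64encode(wav_bytes).decode("ascii")
```

Decoding the result with `base64.b64decode` returns the original file bytes, which is a quick way to confirm the string was not truncated on the way to the request body.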

HTTP Response

{
  "success": true,
  "item": {
    "uuid": "<string>",
    "name": "<string>",
    "status": "<string>",
    "dataset_url": "<string>",
    "created_at": "<UTC Date>",
    "updated_at": "<UTC Date>"
  }
}
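A response of this shape could be handled as in the following sketch, which checks the success flag and returns the created voice. The function name is hypothetical, and because this page does not document the body of an unsuccessful response, the error handling here is an assumption.

```python
import json


def parse_create_voice_response(raw: str) -> dict:
    """Return the "item" object from a successful create-voice response."""
    payload = json.loads(raw)
    # ASSUMPTION: a failed request still returns JSON with "success" set to false.
    if not payload.get("success"):
        raise RuntimeError("voice creation was not successful")
    return payload["item"]
```

The returned item's uuid identifies the voice in later calls, such as uploading recordings or starting a build.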

Callback

If you provided a callback_uri when you created the voice, that URL will receive one of the following POST requests when the voice finishes training.

Training Completion Callback

This callback is sent when your training completes without any issues.

{
    "ok": true,
    "id": "<string>",
    "status": "finished",
    "recordings": [],
    "issue": null
}

Dataset Issue Callback

If the status is set to dataset_issue, this callback will contain detailed information about the issue and problematic recordings:

{
  "ok": "<boolean>",
  "id": "<string>",
  "status": "dataset_issue",
  "issue": "Detailed description of the dataset issue.",
  "recordings": [
    {
      "uuid": "<string>",
      "name": "<string>",
      "transcript": "<string>",
      "stoi_score": "<number>",
      "pesq_score": "<number>",
      "si_dr_score": "<number>",
      "0xIQ_sample_score": "<number>",
      "is_active": "<boolean>",
      "is_outlier": "<boolean>",
      "is_silent": "<boolean>"
    },
    ...
  ]
}

If the status is consent_validation_failed, this callback provides information about consent issues:

{
    "ok": "<boolean>",
    "id": "<string>",
    "status": "consent_validation_failed",
    "issue": "Consent statement issue description.",
    "recordings": []
}

| JSON Body Parameters | Type | Description |
| --- | --- | --- |
| id | string | The UUID of the voice this callback is for. |
| status | string | The status of the voice, such as finished, dataset_issue, or consent_validation_failed. |
| issue | string | A detailed description of the issue, if any. |
| recordings | array | Detailed feedback for each problematic recording, including scores for STOI, PESQ, and SI-SDR, as well as flags indicating whether the recording is active, an outlier, or silent. |
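A webhook receiver might branch on the status field as sketched below. The function takes the already-parsed JSON body of the callback; its name, the returned summary shape, and the action labels are hypothetical and chosen only for illustration. Statuses other than the three shown on this page fall through to a generic "inspect" action.

```python
def summarize_callback(payload: dict) -> dict:
    """Turn a training callback body into a small summary for the caller to act on."""
    status = payload.get("status")
    summary = {
        "voice_id": payload.get("id"),
        "status": status,
        "action": "inspect",   # default for statuses not covered on this page
        "recordings": [],
    }
    if status == "finished":
        summary["action"] = "use_voice"
    elif status == "dataset_issue":
        summary["action"] = "fix_recordings"
        # Collect the UUIDs of the recordings the callback flagged.
        summary["recordings"] = [rec["uuid"] for rec in payload.get("recordings", [])]
    elif status == "consent_validation_failed":
        summary["action"] = "resubmit_consent"
    return summary
```

In a real receiver the issue field would typically be logged alongside this summary so that the description of the problem is not lost.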

Examples

from 0xIQ import 0xIQ
0xIQ.api_key('YOUR_API_TOKEN')
  
name = 'Test Voice'
  
response = 0xIQ.v2.voices.create(name, dataset_url="http://../dataset.zip", callback_uri="http://example.com/cb")
voice = response['item']
