Create a voice

There are two types of voices you can create on the 0xIQ platform: Rapid Voice Clone and Professional Voice Clone.

Rapid Voice Clone

A Rapid Voice Clone is a quick and easy way to create a voice for your content. Using as little as 10 seconds of recordings, you can create a voice clone in under a minute.

Professional Voice Clone

A Professional Voice Clone provides a more accurate way of creating a voice. It requires at least 10 minutes of recordings and takes around 40 minutes to create. This allows for a more detailed and personalized voice, as the AI has more data to work with.

There are two ways to provide data for a voice:

  1. Providing a URL to a dataset when creating the voice

  2. Uploading individual recordings using the recording API

Option 1: Providing a URL to a dataset when creating the voice

Rapid Voice

  1. Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. The dataset must be a WAV file containing at least 10 seconds of audio.

  2. After creating the voice, follow the Build a voice documentation to start training.

Professional Voice

  1. Create a voice using the "Create a voice" endpoint and provide a URL to the dataset in the dataset_url attribute. Please see here for acceptable dataset formats.

  2. The dataset will first be analyzed and then training will begin automatically.
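As a minimal sketch, the request body for either option could be assembled as shown below. The field names (name, consent, voice_type, dataset_url, callback_uri) come from the JSON body parameters documented on this page; the function name build_create_voice_body and the example URL are hypothetical, and the code deliberately stops short of sending the request because the endpoint URL and authentication scheme are not given here.

```python
import json
from typing import Optional


def build_create_voice_body(
    name: str,
    consent_b64: str,
    voice_type: str = "professional",
    dataset_url: Optional[str] = None,
    callback_uri: Optional[str] = None,
) -> dict:
    """Assemble a JSON-serializable body for the "Create a voice" endpoint.

    Hypothetical helper: only the field names are taken from the documentation.
    """
    if voice_type not in ("rapid", "professional"):
        raise ValueError("voice_type must be 'rapid' or 'professional'")

    body = {"name": name, "consent": consent_b64, "voice_type": voice_type}

    # Option 1: include dataset_url. Option 2: omit it and upload recordings later.
    if dataset_url is not None:
        body["dataset_url"] = dataset_url
    if callback_uri is not None:
        body["callback_uri"] = callback_uri
    return body


if __name__ == "__main__":
    # Placeholder consent string and URL, for illustration only.
    body = build_create_voice_body(
        "Demo voice",
        "UklGRg==",
        voice_type="rapid",
        dataset_url="https://example.com/dataset.wav",
    )
    print(json.dumps(body, indent=2))
```

Leaving dataset_url as None produces a body without that key, which corresponds to Option 2.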

Option 2: Uploading individual recordings using the recording API

Rapid Voice

  1. Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute.

  2. Use the instructions on the "Create a recording" page to upload recordings to your voice.

  3. Upon uploading at least 3 recordings, follow the Build a voice documentation to start training.

Professional Voice

  1. Create a voice using the "Create a voice" endpoint and omit the dataset_url attribute.

  2. Use the instructions on the "Create a recording" page to upload recordings to your voice.

  3. Upon uploading at least 20 recordings, follow the Build a voice documentation to start training.
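The recording thresholds in Option 2 can be checked on the client side before starting a build. The sketch below is illustrative only: the thresholds (3 for rapid, 20 for professional) are taken from the steps above, while the names MIN_RECORDINGS and ready_to_build are hypothetical and not part of the API.

```python
# Minimum number of uploaded recordings per voice type, as stated in the steps above.
MIN_RECORDINGS = {"rapid": 3, "professional": 20}


def ready_to_build(voice_type: str, uploaded_count: int) -> bool:
    """Return True once enough recordings are uploaded to start training."""
    if voice_type not in MIN_RECORDINGS:
        raise ValueError(f"unknown voice_type: {voice_type!r}")
    return uploaded_count >= MIN_RECORDINGS[voice_type]


if __name__ == "__main__":
    for voice_type, count in [("rapid", 3), ("professional", 19)]:
        print(voice_type, count, ready_to_build(voice_type, count))
```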

In order to clone a voice, you must be an authorized uploader or provide consent to clone your voice using the 0xIQ AI platform. To provide consent, upload an audio recording containing the following message:

I am aware that recordings of my voice will be used by 0xIQ AI to train and create a synthetic version of my voice by 0xIQ AI.

This audio content will be used by the 0xIQ platform for the purposes of authorizing your voice clone.

HTTP Request

| JSON Body Parameters | Type | Description |
| --- | --- | --- |
| name | string | Name of the voice. |
| consent | string | A base-64 encoded WAV file string containing your consent and authorization to create and clone a voice. Please see the Voice Consent section for more details. |
| voice_type | (optional) string | The type of voice to create: either rapid or professional. Defaults to professional if not provided. |
| dataset_url | (optional) string | A URL to a dataset on which to train the voice. Please see here for acceptable dataset formats. |
| callback_uri | (optional) string | A URL (webhook) that will be notified upon voice training completion. Please see here for callback details. |

Base 64 Encoding

The required consent field must be a valid base-64 encoded string containing the content of your consent audio file. To convert your consent audio file to a base-64 encoded string, you can use the standard library of your programming language of choice. See the examples below for implementations in several popular languages.
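As one illustration, Python's standard library can produce the encoded string in a few lines. This is a sketch under assumptions: the function name encode_consent_file and the file name consent.wav are hypothetical.

```python
import base64
from pathlib import Path


def encode_consent_file(path: str) -> str:
    """Read an audio file and return its content as a base-64 encoded ASCII string."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # "consent.wav" is a placeholder path to your recorded consent message.
    consent_path = Path("consent.wav")
    if consent_path.exists():
        encoded = encode_consent_file(str(consent_path))
        print(f"{len(encoded)} base-64 characters")
    else:
        print("consent.wav not found; record the consent message first")
```

The returned string can be used as the value of the consent field in the request body.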

HTTP Response

If you've provided a callback_uri when you created a voice, you will receive the following POST request when the voice has completed training.

Training Completion Callback

This callback happens when your training completes without any issues.

Dataset Issue Callback

If the status is set to dataset_issue, this callback will contain detailed information about the issue and problematic recordings:

If the status is consent_validation_failed, this callback provides information about consent issues:

| JSON Body Parameters | Type | Description |
| --- | --- | --- |
| id | string | The UUID of the voice this callback is for. |
| status | string | The status of the voice, such as finished, dataset_issue, or consent_validation_failed. |
| issue | string | A detailed description of the issue, if any. |
| recordings | array | The recordings array provides detailed feedback for each problematic recording, including scores for STOI, PESQ, and SI-SDR, as well as flags indicating whether the recording is active, an outlier, or silent. |
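A webhook receiver might branch on the status field roughly as follows. This is a sketch, not the platform's reference handler: the top-level field names (id, status, issue, recordings) come from the parameters above, but the function name handle_training_callback is hypothetical, and the structure of each entry in recordings is not specified here, so the code only counts the entries.

```python
import json


def handle_training_callback(raw_body: str) -> str:
    """Parse a callback POST body and return a one-line summary of the outcome."""
    payload = json.loads(raw_body)
    voice_id = payload["id"]
    status = payload["status"]

    if status == "finished":
        return f"voice {voice_id} finished training"

    if status in ("dataset_issue", "consent_validation_failed"):
        issue = payload.get("issue") or "no description provided"
        # Entries are treated as opaque; their exact shape is an unknown here.
        flagged = len(payload.get("recordings") or [])
        return f"voice {voice_id}: {status}: {issue} ({flagged} recordings flagged)"

    return f"voice {voice_id}: unrecognized status {status!r}"


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = json.dumps({
        "id": "00000000-0000-0000-0000-000000000000",
        "status": "dataset_issue",
        "issue": "example issue text",
        "recordings": [{}, {}],
    })
    print(handle_training_callback(sample))
```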

Examples
