TTS Generation

Overview

Destined Voice uses neural TTS models to generate natural-sounding speech in the voice of any speaker in our library.

Single Synthesis

Generate audio for a single text:

const result = await client.ttsGeneration.synthesizeSpeechV1TtsSynthesizePost({
  speakerId: "speaker-uuid",
  text: "Hello, this is a test.",
});

console.log(result.audioUrl);
// https://voice.s3.amazonaws.com/audio/xxx.wav

Batch Synthesis

Generate multiple audio files in one request:

const job = await client.ttsGeneration.batchSynthesizeV1TtsBatchPost({
  items: [
    { speakerId: "speaker-1", text: "First sentence." },
    { speakerId: "speaker-2", text: "Second sentence." },
    { speakerId: "speaker-3", text: "Third sentence." },
  ],
});

// Returns a job ID to track progress
console.log(job.jobId);

Audio Format

Generated audio uses these specifications:

Property	Value
Format	WAV
Sample Rate	24,000 Hz
Bit Depth	16-bit
Channels	Mono
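The figures in the table are enough to relate file size to playback time: 24,000 samples per second at 2 bytes per sample on one channel is 48,000 bytes per second of audio. The sketch below works through that arithmetic. The helper names are illustrative, and the 44-byte header is an assumption (the canonical PCM WAV header with no extra chunks), so treat the results as estimates.

```typescript
// Constants taken from the audio format table.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2; // 16-bit
const CHANNELS = 1; // mono

// Assumption: a canonical 44-byte PCM WAV header with no extra chunks.
const WAV_HEADER_BYTES = 44;

// 24,000 * 2 * 1 = 48,000 bytes of audio data per second.
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;

/** Estimate playback duration in seconds from a WAV file's size in bytes. */
function estimateDurationSeconds(fileSizeBytes: number): number {
  const audioBytes = Math.max(0, fileSizeBytes - WAV_HEADER_BYTES);
  return audioBytes / BYTES_PER_SECOND;
}

/** Estimate the file size in bytes of a clip with the given duration. */
function estimateFileSizeBytes(durationSeconds: number): number {
  return WAV_HEADER_BYTES + Math.round(durationSeconds * BYTES_PER_SECOND);
}
```

At this rate, one minute of speech is roughly 2.9 MB, which is worth keeping in mind when storing or transferring many files.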

Usage Tracking

Monitor your character usage:

const usage = await client.users.getUsageV1UsersUsageGet();

console.log(usage);
// {
//   characters_used: 45000,
//   characters_limit: 100000,
//   requests_today: 150,
//   period_start: "2024-01-01",
//   period_end: "2024-01-31"
// }
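A usage response like this can drive a simple pre-flight check before a large job. The sketch below is one way to do that. The interface and function names are hypothetical, the field names follow the example output above (the SDK may expose them in camelCase instead), and it assumes characters are counted the same way as JavaScript's string `length`.

```typescript
// Hypothetical shape mirroring the two quota fields in the example output.
interface UsageSnapshot {
  characters_used: number;
  characters_limit: number;
}

/** Characters left in the current period (never negative). */
function remainingCharacters(usage: UsageSnapshot): number {
  return Math.max(0, usage.characters_limit - usage.characters_used);
}

/**
 * True when `text` fits in the remaining quota.
 * Assumption: the service counts characters like `String.prototype.length`.
 */
function fitsInQuota(usage: UsageSnapshot, text: string): boolean {
  return text.length <= remainingCharacters(usage);
}
```

With the example values, `remainingCharacters` returns 55,000, so any single text up to that length would pass the check.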

Best Practices

Batch similar requests

Group related synthesis requests into a single batch job instead of sending them one at a time. One batch request replaces many individual round trips.

Preprocess text

Clean and normalize text before synthesis: remove special characters and spell out numbers as words (for example, "42" becomes "forty-two").
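As a rough illustration of this kind of preprocessing, the sketch below strips a small set of markup-style symbols, collapses whitespace, and spells out standalone integers from 0 to 999 in English. Everything here is an assumption made for the example: the function names, the list of characters treated as "special", and the choice to leave decimals, grouped numbers such as 1,000, and larger values untouched. A production pipeline would likely need locale-aware rules.

```typescript
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
  "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
  "sixteen", "seventeen", "eighteen", "nineteen",
];
const TENS = [
  "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
  "eighty", "ninety",
];

/** Spell out an integer from 0 to 999 in English. */
function numberToWords(n: number): string {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const rest = n % 10;
    return TENS[Math.floor(n / 10)] + (rest ? "-" + ONES[rest] : "");
  }
  const rest = n % 100;
  const hundreds = ONES[Math.floor(n / 100)] + " hundred";
  return rest ? hundreds + " " + numberToWords(rest) : hundreds;
}

/** Normalize text before sending it to synthesis (illustrative rules only). */
function normalizeForTts(text: string): string {
  return text
    // Match whole numeric tokens; convert only plain 1-3 digit integers and
    // leave tokens such as "3.14" or "1,000" exactly as written.
    .replace(/\b\d+(?:[.,]\d+)*\b/g, (token) =>
      /^\d{1,3}$/.test(token) ? numberToWords(Number(token)) : token,
    )
    // Assumed "special characters": common markup and symbol debris.
    .replace(/[*_#~^|<>{}\[\]\\@]/g, "")
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}
```

For instance, `normalizeForTts("  Call   me at 5 *now*, room 42! ")` yields `"Call me at five now, room forty-two!"`.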

Cache audio

Store generated audio URLs and reuse them. Re-synthesizing the same text with the same speaker produces identical audio, so there is no need to request it again.
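One way to apply this is a small in-memory cache keyed by speaker and text. The sketch below is an assumption-laden illustration, not part of the SDK: `Synthesize` is a hypothetical function type standing in for whatever call returns the audio URL, and `withAudioCache` is an invented helper. It stores the pending promise rather than the resolved URL, so two identical requests made at the same time also share one call.

```typescript
// Hypothetical signature: resolves to the URL of the generated audio.
type Synthesize = (speakerId: string, text: string) => Promise<string>;

/** Wrap a synthesize function so repeated (speaker, text) pairs reuse one result. */
function withAudioCache(synthesize: Synthesize): Synthesize {
  const cache = new Map<string, Promise<string>>();

  return (speakerId, text) => {
    // JSON.stringify keeps the two fields unambiguous within one key.
    const key = JSON.stringify([speakerId, text]);

    let pending = cache.get(key);
    if (!pending) {
      pending = synthesize(speakerId, text);
      // Forget failed requests so a later call can retry.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return pending;
  };
}
```

A caller would wrap its own synthesis function once and use the wrapped version everywhere. For results that must survive a restart, the same keying scheme could back a database or key-value store instead of a `Map`.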

Handle long text

For text longer than the character limit, split it at sentence boundaries, synthesize each piece, and concatenate the resulting audio files in order.
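The splitting step can be sketched as follows. The function names and the `maxChars` parameter are hypothetical, and the actual limit should come from your plan or the API reference. Sentences are detected naively as text ending in `.`, `!`, or `?` followed by whitespace, and a single sentence that exceeds the limit is hard-split as a last resort. Combining the resulting audio files is not covered here.

```typescript
/** Split text into sentences at ., ! or ? followed by whitespace. */
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(text)) !== null) {
    const end = match.index + match[0].length;
    sentences.push(text.slice(start, end).trim());
    start = end;
  }
  const tail = text.slice(start).trim();
  if (tail) sentences.push(tail);
  return sentences;
}

/** Pack sentences into chunks of at most `maxChars` characters each. */
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");

  const chunks: string[] = [];
  let current = "";
  for (const original of splitSentences(text)) {
    let sentence = original;
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length <= maxChars) {
      current = candidate; // Sentence still fits in the open chunk.
      continue;
    }
    if (current) chunks.push(current);
    // Hard-split a sentence that cannot fit in a single chunk.
    while (sentence.length > maxChars) {
      chunks.push(sentence.slice(0, maxChars));
      sentence = sentence.slice(maxChars);
    }
    current = sentence;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk can then be sent as its own synthesis request, or all chunks can be submitted together as items of one batch job, keeping the original order so the audio can be joined correctly.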
