
Overview

Destined Voice uses neural text-to-speech (TTS) models to generate natural-sounding speech in the voice of any speaker in our library.

Single Synthesis

Generate audio for a single piece of text:
const result = await client.ttsGeneration.synthesizeSpeechV1TtsSynthesizePost({
  speakerId: "speaker-uuid",
  text: "Hello, this is a test.",
});

console.log(result.audioUrl);
// https://destined-voice.s3.amazonaws.com/audio/xxx.wav

Batch Synthesis

Generate multiple audio files in one request:
const job = await client.ttsGeneration.batchSynthesizeV1TtsBatchPost({
  items: [
    { speakerId: "speaker-1", text: "First sentence." },
    { speakerId: "speaker-2", text: "Second sentence." },
    { speakerId: "speaker-3", text: "Third sentence." },
  ],
});

// Returns a job ID to track progress
console.log(job.jobId);
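
This page does not show how job progress is read back, so the following is only a sketch of one way to wait on a batch job. It assumes you supply your own `getStatus` function wrapping whatever status call the SDK provides. The `JobStatus` shape, its `state` values, and the `waitForJob` name are all hypothetical and not part of the documented API.

```typescript
// Hypothetical status shape; the real API's fields and values may differ.
type JobStatus = { state: "pending" | "running" | "completed" | "failed" };

// Resolve after the given number of milliseconds.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll a caller-supplied status function until the job reaches a final state
// or the attempt budget is exhausted.
async function waitForJob(
  jobId: string,
  getStatus: (jobId: string) => Promise<JobStatus>, // assumption: wraps the real status call
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<JobStatus> {
  const intervalMs = options.intervalMs ?? 2000;
  const maxAttempts = options.maxAttempts ?? 30;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus(jobId);
    if (status.state === "completed" || status.state === "failed") {
      return status;
    }
    // Pause between checks, but not after the final attempt.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}

// Usage (myStatusLookup is a placeholder for your own status call):
// const finalStatus = await waitForJob(job.jobId, (id) => myStatusLookup(id));
```

Passing the status lookup in as a parameter keeps the helper independent of any particular endpoint and makes it easy to exercise with a stub.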

Audio Format

Generated audio uses these specifications:
| Property | Value |
| --- | --- |
| Format | WAV |
| Sample Rate | 24,000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono |
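
These specifications make it possible to estimate clip duration and file size with simple arithmetic: 24,000 samples per second × 2 bytes per sample × 1 channel = 48,000 bytes per second. The sketch below assumes uncompressed PCM data and a canonical 44-byte WAV header. Neither is stated on this page, so treat the results as approximations. The function names are illustrative.

```typescript
// Values from the audio specifications above.
const SAMPLE_RATE_HZ = 24000;
const BYTES_PER_SAMPLE = 2; // 16-bit
const CHANNELS = 1; // mono

// Assumption: canonical PCM WAV header with no extra metadata chunks.
const WAV_HEADER_BYTES = 44;

// Bytes of audio data per second of speech: 24000 * 2 * 1 = 48000.
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;

// Estimate clip duration in seconds from the total file size in bytes.
function estimateDurationSeconds(fileSizeBytes: number): number {
  const dataBytes = Math.max(0, fileSizeBytes - WAV_HEADER_BYTES);
  return dataBytes / BYTES_PER_SECOND;
}

// Estimate total file size in bytes for a clip of the given duration.
function estimateFileSizeBytes(durationSeconds: number): number {
  return WAV_HEADER_BYTES + Math.round(durationSeconds * BYTES_PER_SECOND);
}
```

Under these assumptions, a 10-second clip is roughly 480 KB, which is useful when budgeting storage for generated audio.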

Character Limits

| Tier | Characters/Request | Characters/Month |
| --- | --- | --- |
| Starter | 500 | 1,000 |
| Pro | 2,000 | 100,000 |
| Enterprise | 5,000 | 1,000,000 |
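
A client-side check against these limits can catch oversized text before a request is sent. The sketch below copies the limits from the table above. The `Tier` type and helper names are illustrative, and it assumes characters are counted the way JavaScript's `string.length` counts them (UTF-16 code units), which may not match how the service counts.

```typescript
type Tier = "starter" | "pro" | "enterprise";

// Limits copied from the table above.
const TIER_LIMITS: Record<Tier, { perRequest: number; perMonth: number }> = {
  starter: { perRequest: 500, perMonth: 1000 },
  pro: { perRequest: 2000, perMonth: 100000 },
  enterprise: { perRequest: 5000, perMonth: 1000000 },
};

// True if the text fits in a single request for the given tier.
// Assumption: the service counts characters like string.length does.
function fitsInOneRequest(text: string, tier: Tier): boolean {
  return text.length <= TIER_LIMITS[tier].perRequest;
}

// Minimum number of requests needed to cover the text on the given tier.
function requestsNeeded(text: string, tier: Tier): number {
  return Math.ceil(text.length / TIER_LIMITS[tier].perRequest);
}
```

For example, a 4,500-character script needs at least three requests on the Pro tier but fits in one on Enterprise.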

Usage Tracking

Monitor your character usage:
const usage = await client.users.getUsageV1UsersUsageGet();

console.log(usage);
// {
//   characters_used: 45000,
//   characters_limit: 100000,
//   requests_today: 150,
//   period_start: "2024-01-01",
//   period_end: "2024-01-31"
// }
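
The usage response can be turned into a remaining-quota check before submitting more text. This is a minimal sketch that assumes the response has exactly the fields shown in the sample output above. The `Usage` interface and helper names are illustrative rather than SDK types.

```typescript
// Shape inferred from the sample response above; not an official SDK type.
interface Usage {
  characters_used: number;
  characters_limit: number;
  requests_today: number;
  period_start: string;
  period_end: string;
}

// Compute remaining characters and the percentage of the quota consumed.
function summarizeUsage(usage: Usage): { remaining: number; percentUsed: number } {
  const remaining = Math.max(0, usage.characters_limit - usage.characters_used);
  const percentUsed =
    usage.characters_limit > 0
      ? (usage.characters_used * 100) / usage.characters_limit
      : 0;
  return { remaining, percentUsed };
}

// True if the text fits in what is left of the current period's quota.
function canAfford(usage: Usage, text: string): boolean {
  return text.length <= summarizeUsage(usage).remaining;
}
```

With the sample values above, 55,000 characters remain and 45% of the monthly quota has been used.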

Best Practices

Group multiple synthesis requests into batch jobs for better performance.
Clean and normalize text before synthesis. Remove special characters and format numbers as words.
Store generated audio URLs. Re-synthesis of the same text with the same speaker produces identical audio.
For text longer than your tier's per-request character limit, split it at sentence boundaries, synthesize each part separately, and combine the resulting audio files in order.
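
The long-text advice above could be sketched as follows: split at sentence boundaries, then greedily pack sentences into chunks that stay within a character limit. The `splitIntoChunks` helper is hypothetical, and its sentence detection is deliberately naive. It breaks on `.`, `!`, and `?`, so it will also split decimals such as "3.5" and abbreviations.

```typescript
// Split text into chunks of at most maxChars, breaking only between sentences.
function splitIntoChunks(text: string, maxChars: number): string[] {
  // A sentence is a run of text ending in ., !, or ?, or the trailing remainder.
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;
    if (sentence.length > maxChars) {
      // A single oversized sentence cannot be packed; leave that to the caller.
      throw new Error(`Sentence exceeds ${maxChars} characters; split it manually.`);
    }
    const candidate = current.length === 0 ? sentence : `${current} ${sentence}`;
    if (candidate.length <= maxChars) {
      current = candidate; // still fits, keep packing
    } else {
      chunks.push(current); // flush the full chunk and start a new one
      current = sentence;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk can then be submitted as one item of a batch job with the same `speakerId`. When combining the results, note that WAV files generally cannot be joined by appending raw bytes, because each file carries its own header. The audio data needs to be merged under a single header.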