
Getting Started with Voice Training

Learn to create AI voice clones from audio samples

Create unlimited content with your voice

Train once, use forever. Upload a short audio sample and generate unlimited text-to-speech content in your own voice.

See Voice Training in Action

Watch the complete voice training process from audio upload to generating speech with your cloned voice.


What is Voice Training?

Voice training creates an AI model of your unique voice from a short audio sample. Once the voice is trained, you can generate unlimited speech in it by typing text, with no further recording needed.

Simple 4-Step Process

Upload Audio Sample

Record or upload 15-30 seconds of clear speech

2 minutes

Configure Voice Settings

Name your voice and set a system prompt for tone control

1 minute

Generate Preview

Test your voice clone with sample text

30 seconds

Save Trained Voice

Save your voice for use in video generation

Instant

Audio Requirements for Best Results

The quality of your audio sample directly affects your voice clone's accuracy and naturalness.

Duration

15-30 seconds recommended

Gives the model enough sample data without being too long. A minimum of 10 seconds is required.

✅ Good Example:

20 seconds of natural speech

❌ Avoid:

5 seconds or 2+ minutes
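
To check a recording against these duration limits before uploading, here is a minimal sketch using Python's standard `wave` module. It handles WAV files only, and the function names and labels (`wav_duration_seconds`, `classify_duration`) are illustrative assumptions, not part of the product.

```python
import wave

MIN_SECONDS = 10        # minimum required, as stated above
RECOMMENDED = (15, 30)  # recommended range, as stated above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_duration(seconds: float) -> str:
    """Label a duration against the limits above (labels are hypothetical)."""
    if seconds < MIN_SECONDS:
        return "too short"
    if RECOMMENDED[0] <= seconds <= RECOMMENDED[1]:
        return "recommended"
    return "outside recommended range"
```

For example, `classify_duration(wav_duration_seconds("sample.wav"))` labels a local recording, where `sample.wav` is a placeholder file name.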

Quality

Clear, noise-free recording

Background noise affects voice cloning accuracy and naturalness.

✅ Good Example:

Quiet room, phone/headset mic

❌ Avoid:

Outdoor recording, music playing

Content

Natural conversational speech

The AI learns your natural speaking patterns and tone, which makes the clone sound realistic.

✅ Good Example:

Read a paragraph naturally

❌ Avoid:

Shouting, whispering, robotic tone

File Format

MP3, WAV, or M4A under 50MB

Compressed formats work fine for voice training algorithms.

✅ Good Example:

MP3 from voice memo app

❌ Avoid:

Heavily compressed or corrupted files
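
The format and size limits above can be checked locally before uploading. The sketch below is an assumption-laden illustration: `check_upload` is a hypothetical helper, "50MB" is read as 50 MiB, and a file of exactly that size is treated as acceptable. The product's own validation may differ.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed above
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" interpreted as 50 MiB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 50MB")
    return problems
```

This checks only the file name and size. It cannot detect a corrupted or heavily compressed file, which still needs a listening test.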

What You Can Do with Trained Voices

Once your voice is trained, you can use it for unlimited text-to-speech generation across different content types.

Video Narration

Use your trained voice for video voice-overs without recording

Perfect for:

  • YouTube explainer videos
  • Course content narration
  • Product demo voice-overs

Key Benefit: Consistent voice across all videos

Text-to-Speech Content

Convert written content to audio in your own voice

Perfect for:

  • Blog post audio versions
  • Podcast episode creation
  • Audiobook narration

Key Benefit: Scale content creation efficiently

Personalized Messages

Create custom audio messages for different audiences

Perfect for:

  • Customer service responses
  • Personalized marketing messages
  • Educational content delivery

Key Benefit: Personal touch without recording each time

Complete Training Process

Here's exactly what happens during voice training and what you can expect.

1. Upload Audio Sample

Record or upload 15-30 seconds of clear speech

  • Use a quiet environment with minimal background noise
  • Speak naturally and clearly at normal volume
  • Supported formats: MP3, WAV, M4A (max 50MB)

Estimated time: 2 minutes

2. Configure Voice Settings

Name your voice and set a system prompt for tone control

  • Choose a descriptive name (e.g., 'Professional Narrator')
  • Set a system prompt to define the speaking style
  • Default: 'naturally and clearly while excited'

Estimated time: 1 minute

3. Generate Preview

Test your voice clone with sample text

  • Enter custom text to test voice quality
  • The AI generates a preview using your voice sample
  • Listen and verify that it sounds the way you want

Estimated time: 30 seconds

4. Save Trained Voice

Save your voice for use in video generation

  • The voice becomes available in the video generator
  • It can be used for unlimited text-to-speech
  • Manage or delete it from your voice library

Estimated time: Instant
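
The settings from step 2 can be pictured as a small record with a name and a system prompt. The sketch below is purely illustrative: `VoiceSettings` is a hypothetical class, not the product's API, and the "slowly and calmly" prompt is an invented example. Only the default prompt text is taken from the step above.

```python
from dataclasses import dataclass

# Default system prompt quoted in step 2.
DEFAULT_SYSTEM_PROMPT = "naturally and clearly while excited"


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the step 2 settings."""

    name: str
    system_prompt: str = DEFAULT_SYSTEM_PROMPT

    def __post_init__(self) -> None:
        # A blank name would be hard to find later in a voice library.
        if not self.name.strip():
            raise ValueError("voice name must not be empty")


# Uses the default prompt.
narrator = VoiceSettings(name="Professional Narrator")
# Overrides the prompt with an invented example to change the tone.
calm = VoiceSettings(name="Calm Tutor", system_prompt="slowly and calmly")
```

Keeping one settings record per tone makes it easier to compare previews of the same text under different system prompts.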

Pro Tips for Success

Best Practices

Record in a quiet environment with minimal echo

Speak naturally and don't try to change how you sound

Use descriptive system prompts for different tones

Test with different text lengths to verify quality

Common Mistakes

Recording in noisy environments affects quality

Speaking too fast or too slow sounds unnatural

Using very short samples (under 10 seconds)

Not testing the voice with various text types

Ready to Train Your First Voice?

Now that you understand the basics of voice training, start the voice training process or learn advanced techniques for better results.