Tortoise TTS v2 Overview

AI-generated article summary

Core Features:
Multi-voice text-to-speech program focused on realism and natural prosody
Uses both an autoregressive decoder and a diffusion decoder, both known for slow sampling
Generates roughly one medium-sized sentence every 2 minutes on a K80 GPU
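To make the cost concrete, here is a back-of-the-envelope sketch. It assumes, as a simplification, that an autoregressive decoder needs one sequential forward pass per output token and a diffusion decoder one sequential pass per denoising step. The token and step counts are hypothetical placeholders, not Tortoise's real settings; the second function only applies the quoted K80 figure of about 2 minutes per medium-sized sentence.

```python
def sequential_passes(num_tokens: int, diffusion_steps: int) -> int:
    """Count forward passes that must run one after another.

    Simplified cost model (an assumption, not Tortoise's actual accounting):
    the autoregressive decoder emits one token per pass, and the diffusion
    decoder then refines the result over a fixed number of denoising steps.
    """
    if num_tokens < 0 or diffusion_steps < 0:
        raise ValueError("counts must be non-negative")
    return num_tokens + diffusion_steps


def k80_minutes(num_sentences: int, minutes_per_sentence: float = 2.0) -> float:
    """Rough wall-clock estimate from the quoted K80 throughput."""
    return num_sentences * minutes_per_sentence


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(sequential_passes(num_tokens=250, diffusion_steps=100))  # 350
    # A 30-sentence passage at ~2 minutes per sentence.
    print(k80_minutes(30))  # 60.0
```

Under this simplified model the sequential steps of the two stages add up, which is consistent with the slow generation speed quoted above.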

New Features in v2.1:
Can generate entirely random voices
Can download voice conditioning latents and generate speech from user-provided latents
Supports custom pretrained models
Refactored directory structure, plus performance improvements
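Conceptually, a conditioning latent is a fixed-size vector that captures a voice, so it can be computed once, stored, and reused, while a random voice is a latent that did not come from any reference recording. The sketch below illustrates only that idea with plain Python lists and JSON. The function names, the 8-dimensional size, and the Gaussian sampling are hypothetical stand-ins, not Tortoise's file format or its actual random-voice mechanism.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import List

LATENT_DIM = 8  # hypothetical size; real conditioning latents are much larger


def random_voice_latent(seed: int) -> List[float]:
    """Stand-in for random voice generation: sample a latent from N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def save_latent(latent: List[float], path: Path) -> None:
    """Store a latent so it can be reused without recomputing it."""
    path.write_text(json.dumps(latent))


def load_latent(path: Path) -> List[float]:
    """Load a previously saved (or user-provided) latent and validate its size."""
    latent = json.loads(path.read_text())
    if len(latent) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} values, got {len(latent)}")
    return [float(x) for x in latent]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "my_voice.json"
        voice = random_voice_latent(seed=42)
        save_latent(voice, path)
        # Reloading gives back the same vector, so the same voice can be reused.
        print(load_latent(path) == voice)  # True
```

In this sketch, the same seed always yields the same random voice, and a saved file can be swapped for one supplied by a user as long as it has the expected size.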

Usage and Limitations:
Local installation requires an NVIDIA GPU
Performs best on books and poetry and worse on other kinds of speech
Training dataset is limited to audiobooks and lacks diverse voices
Includes a classifier that detects whether audio was generated by Tortoise
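Because local use requires an NVIDIA GPU, a quick precheck before installing can save time. This sketch relies on a heuristic assumption: that an NVIDIA driver install puts the `nvidia-smi` tool on the `PATH`, and that `nvidia-smi -L` succeeds and lists at least one GPU when one is visible. It does not check CUDA versions or available memory, and the function name is hypothetical.

```python
import shutil
import subprocess


def has_nvidia_gpu(timeout_s: float = 10.0) -> bool:
    """Heuristic check for a visible NVIDIA GPU via the nvidia-smi CLI."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No driver tooling on PATH: most likely no usable NVIDIA GPU.
        return False
    try:
        # `nvidia-smi -L` lists detected GPUs, one per line ("GPU 0: ...").
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if has_nvidia_gpu():
        print("NVIDIA GPU detected")
    else:
        print("No NVIDIA GPU found; local installation is not supported")
```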

Technical Details:
Built from 5 separate models, trained on about 50k hours of speech data
Inspired by OpenAI's DALLE, applied to speech and paired with a better decoder
Currently about 20x smaller than the original DALLE transformer
Training methodology and configurations have not yet been released
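The 20x figure can be turned into a rough size estimate. The original DALL-E transformer has about 12 billion parameters; taking the 20x ratio at face value gives the order of magnitude below. This only illustrates the arithmetic and is not a published parameter count for Tortoise.

```python
DALLE_PARAMS = 12_000_000_000  # original DALL-E transformer: ~12B parameters
SIZE_RATIO = 20  # "20x smaller", taken literally (an assumption)


def implied_params(reference_params: int, ratio: float) -> float:
    """Parameter count implied by 'ratio times smaller than the reference'."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return reference_params / ratio


if __name__ == "__main__":
    estimate = implied_params(DALLE_PARAMS, SIZE_RATIO)
    print(f"~{estimate / 1e6:.0f}M parameters")  # ~600M parameters
```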