
    Tortoise TTS v2 Overview

    huggingface.co/jbetker/tortoise-tts-v2

    Core Features
    • Multi-voice text-to-speech program with highly realistic prosody
    • Uses both an autoregressive decoder and a diffusion decoder, both of which are slow to sample from
    • Takes about 2 minutes to generate one medium-sized sentence on a K80 GPU
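
To put the speed figure above in perspective, here is a rough back-of-the-envelope sketch. It assumes every sentence counts as "medium-sized" and costs the quoted 2 minutes on a K80; the helper names `split_sentences` and `estimate_seconds` are hypothetical and not part of Tortoise, and the sentence splitter is deliberately naive.

```python
import re
from typing import List

# Figure quoted in the summary: one medium-sized sentence per 2 minutes on a K80.
SECONDS_PER_SENTENCE_K80 = 120


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def estimate_seconds(text: str, seconds_per_sentence: int = SECONDS_PER_SENTENCE_K80) -> int:
    """Estimate generation time, assuming each sentence is medium-sized."""
    return len(split_sentences(text)) * seconds_per_sentence


sample = "The tortoise is slow. It is also steady! Will it win the race?"
print(f"{estimate_seconds(sample) / 60:.1f} minutes")  # 3 sentences at 2 minutes each
```

Under these assumptions a three-sentence passage takes about six minutes, so a full book chapter would run for many hours on the same hardware. Real timings depend on sentence length, GPU, and settings.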
    New Features in v2.1
    • Added random voice generation capability
    • Allows downloading voice conditioning latents and using user-provided ones
    • Enables using custom pretrained models
    • Refactored directory structures and improved performance
    Usage and Limitations
    • Requires NVIDIA GPU for local installation
    • Works best with books and poetry, struggles with other speech types
    • Training dataset limited to audiobooks, lacks diverse voices
    • Includes classifier to detect if audio was generated by Tortoise
    Technical Details
    • Composed of 5 separate models, trained on about 50k hours of speech data
    • Inspired by OpenAI's DALLE, applied to speech with an improved decoder
    • Its largest model is about 20x smaller than the original DALLE transformer
    • Training methodology and configurations not yet released
