Expressive FastSpeech2

*(screenshot of Expressive FastSpeech2)*

A PyTorch implementation of non-autoregressive expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and other languages of your choosing.

Overview:

Expressive-FastSpeech2 is a PyTorch project that provides a foundation for non-autoregressive expressive text-to-speech (TTS), covering both emotional TTS and conversational TTS. It builds on the AIHub Multimodal Video AI datasets for Korean and on IEMOCAP for English. The project documents how to process annotated data and how to train models in different languages, with attention to language-specific features.
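The "non-autoregressive" part of FastSpeech2 hinges on a length regulator: each phoneme-level encoder frame is repeated according to a predicted duration, so the output length is known up front and all mel frames can be decoded in parallel rather than one step at a time. A minimal sketch of that idea (function name and shapes are illustrative, not the repository's actual API):

```python
# Sketch of the FastSpeech2-style length regulator. Each phoneme-level
# feature vector is repeated by its predicted duration, producing a
# frame-level sequence whose total length is fixed before decoding.
# This is an illustration of the concept, not the project's real code.

def length_regulate(phoneme_frames, durations):
    """Expand phoneme-level features to mel-frame level.

    phoneme_frames: list of feature vectors, one per phoneme.
    durations: list of ints, predicted mel frames per phoneme.
    """
    assert len(phoneme_frames) == len(durations)
    mel_frames = []
    for frame, d in zip(phoneme_frames, durations):
        mel_frames.extend([frame] * d)  # repeat each phoneme d times
    return mel_frames

# Example: 3 phonemes with durations 2, 1, 3 expand to 6 frames.
frames = length_regulate([[0.1], [0.2], [0.3]], [2, 1, 3])
```

Because the expanded length is computed from durations in one shot, the decoder can attend to all positions simultaneously, which is what makes this family of models fast at inference time.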

Features:

  • Non-autoregressive Expressive TTS: Provides a base for future research on Emotional and Conversational TTS.
  • Annotated Data Processing: Offers guidance on handling new datasets for successful training of non-autoregressive emotional TTS.
  • English and Korean TTS Support: Covers training for both languages with emphasis on language-specific features.
  • Adapting Other Languages: Provides instructions for training with datasets in languages other than English and Korean.
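When preparing an annotated dataset as described above, one routine first step is mapping categorical emotion labels to integer IDs that an embedding layer can consume. A minimal, hypothetical sketch (the label names and annotation layout below are assumptions for illustration, not the project's actual data format):

```python
# Hypothetical sketch: derive an emotion-label-to-ID mapping from
# per-utterance annotations, as an embedding table would require.
# The label set is illustrative; real datasets (e.g. IEMOCAP) define
# their own categories.

def build_label_map(annotations):
    """annotations: list of (utterance_id, emotion_label) pairs."""
    labels = sorted({emotion for _, emotion in annotations})
    return {emotion: idx for idx, emotion in enumerate(labels)}

annotations = [
    ("utt_001", "neutral"),
    ("utt_002", "happy"),
    ("utt_003", "sad"),
    ("utt_004", "neutral"),
]
label_map = build_label_map(annotations)
ids = [label_map[e] for _, e in annotations]
```

Sorting the label set before assigning IDs keeps the mapping deterministic across preprocessing runs, which matters when a saved checkpoint must agree with the mapping used at training time.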

Summary:

Expressive-FastSpeech2 is a resource for researchers and developers working on non-autoregressive expressive TTS. With support for emotional TTS, conversational TTS, and multiple languages, it offers data-processing guidance, dataset preparation recipes, and a working implementation. Its instructions for adapting the model to new languages and datasets make it a practical starting point for further TTS research.