Advances in Speech and Audio Coding Using Generative AI

Conventional Techniques and Existing Solutions

Traditional speech coding techniques, built on digital signal processing tools such as linear prediction and frequency decomposition, have long been the cornerstone of digital speech communication. These methods aim to reduce the data rate required for transmission while preserving intelligibility and naturalness. Existing solutions, such as the 3GPP EVS standard, to which Fraunhofer IIS has made major contributions, are highly effective. However, they face inherent limitations that prevent them from achieving natural quality at very low bit rates in bandwidth-limited environments.

AI in Speech Coding

Our laboratory is exploring how to overcome these limitations by integrating generative artificial intelligence (AI) into speech coding schemes. We have contributed to advances in speech synthesis with Generative Adversarial Networks (GANs), which deliver high quality in text-to-speech (TTS) and allow speech to be compressed to very low bit rates. Learn more about our work on TTS with GANs.

We are also investigating learned discrete representations of speech, which led us to propose NESC, a state-of-the-art end-to-end speech coding model. Explore NESC here.

In addition, we work on essential tasks in speech communication, such as AI-driven packet loss resilience and joint source-channel coding. Discover our packet loss resilience tools.

Our research also extends to AI-based compression of general audio, including music, and to new generative AI methods for signal processing.

Author: Guillaume Fuchs
