Audio Signal Processing
Audio compression techniques developed over more than 30 years by the Audio and Media Technologies division have been a cornerstone of many digital applications, from speech transmission to audio streaming. However, these techniques face inherent limitations that prevent them from delivering natural, high-quality sound at ultra-low bitrates. By integrating generative AI into our audio coding systems and contributing to advances in learned representation and synthesis of speech and audio, we can overcome these limitations and pave the way for new applications.
For further information, contact Markus Multrus at Fraunhofer IIS.
Publications
- On the Design of Diffusion-based Neural Speech Codecs
- From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
- Benchmarking Neural Speech Codec Intelligibility with SITool
- Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
- UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
- Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron
- Evaluation of Data-Driven Room Geometry Inference Methods Using a Smart Speaker Prototype
- Multi-Purpose Room Impulse Response Dataset Measured on a 3D Spatial Grid
- Analysis of Global and Local Average Room Transfer Functions Based on Measured Room Impulse Responses
- Estimating Frequency-dependent Absorption Coefficients in Small Rooms Using a Diffusion Model
- PAD-VC: A Prosody-Aware Decoder for Any-to-Few Voice Conversion
- Meta Learning Text-to-Speech Synthesis in over 7000 Languages
- Data-driven Joint Detection and Localization of Acoustic Reflectors
- Evaluating the Impact of Prosody Feature Normalization on the Controllability of Pitch in Speech Synthesis
- Analysis by Synthesis Assessment of Speech Emotion Perception in Different Languages
- Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
- Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-based Post-Processing
- The AudioLabs System for the Blizzard Challenge 2023
- Evaluating Speech–Phoneme Alignment and its Impact on Neural Text-To-Speech Synthesis
- Data-Driven Local Average Room Transfer Function Estimation for Multi-Point Equalization
- PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech
- A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain
- A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate
- StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
- A Lightweight Neural TTS System for High-quality German Speech Synthesis 2
- A Lightweight Neural TTS System for High-quality German Speech Synthesis 1
- Enhancement of Coded Speech Using a Mask-Based Post-Filter