Keynote Speakers

Tara N. Sainath, Google

Coming Soon.

Large Language-Audio Models and Applications

Prof. Wenwu Wang

Abstract: Large Language Models (LLMs) are being explored in audio processing to interpret and generate meaningful patterns from complex sound data, such as speech, music, environmental noise, sound effects, and other non-verbal audio. Combined with acoustic models, LLMs offer great potential for addressing a variety of problems in audio processing, such as audio captioning, audio generation, source separation, and audio coding. This talk will cover recent advances in using LLMs to address audio-related challenges. Topics will include language-audio models for mapping and aligning audio with textual data, their applications across various audio tasks, the creation of language-audio datasets, and potential future directions in language-audio learning. We will demonstrate our recent work in this area, including AudioLDM, AudioLDM2, and WavJourney for audio generation and storytelling; AudioSep for audio source separation; ACTUAL for audio captioning; SemantiCodec for audio coding; WavCraft for content creation and editing; APT-LLMs for audio reasoning; and the datasets WavCaps, Sound-VECaps, and AudioSetCaps for training and evaluating large language-audio models.

Biography: Wenwu Wang is a Professor in Signal Processing and Machine Learning at the University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He is a (co-)author or (co-)recipient of more than 15 accolades, including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, the ICAUS 2021 Best Paper Award, the DCASE 2020 and 2023 Judge's Award, the DCASE 2019 and 2020 Reproducible System Award, and the LVA/ICA 2018 Best Student Paper Award. He is an Associate Editor (2020-2025) for IEEE/ACM Transactions on Audio, Speech, and Language Processing and an Associate Editor (2024-2026) for IEEE Transactions on Multimedia. He was a Senior Area Editor (2019-2023) and an Associate Editor (2014-2018) for IEEE Transactions on Signal Processing. He is the elected Chair (2023-2024) of the IEEE Signal Processing Society (SPS) Machine Learning for Signal Processing Technical Committee, a Board Member (2023-2024) of the IEEE SPS Technical Directions Board, the elected Chair (2025-2027) and Vice Chair (2022-2024) of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, and an elected Member (2021-2026) of the IEEE SPS Signal Processing Theory and Methods Technical Committee. He has served on the organising committees of INTERSPEECH 2022, IEEE ICASSP 2019 & 2024, IEEE MLSP 2013 & 2024, and SSP 2009. He is Technical Program Co-Chair of IEEE MLSP 2025. He has been an invited Keynote or Plenary Speaker at more than 20 international conferences and workshops.

Towards Robust Audio Deepfake Detection and Attribution

Prof. Jianhua Tao

Abstract: Audio deepfake detection has attracted increasing attention. Although previous studies have made progress in audio deepfake detection and attribution, the generalization and robustness of existing models remain poor when they are evaluated on mismatched datasets containing multiple unseen attacks, such as those generated by VALL-E or GPT-4o. This talk will provide an overview of recent progress in audio deepfake detection and attribution, with a particular emphasis on how to improve the robustness of these models to make them more reliable in real-world applications. The talk will also offer a more comprehensive understanding of the reasons behind the models' decisions, helping users understand the detection process and build trust in anti-deepfake technologies.

Biography: Prof. Tao is a Professor at Tsinghua University. He was the Deputy Director of the National Laboratory of Pattern Recognition from 2014 to 2022 and the Director of the Sino-European Laboratory of Informatics, Automation and Applied Mathematics (LIAMA) from 2015 to 2022. Prof. Tao is a recognized scholar in the fields of speech and language processing, multimodal human-computer interaction, and affective computing. He was elected Chairperson of the ISCA SIG-CSLP (2019-2020) and was Technical Program Chair of INTERSPEECH 2020. He is a Fellow of the China Computer Federation (CCF). He has published more than 300 papers in IEEE TPAMI, TASLP, TAC, PR, NIPS, ICML, AAAI, ICASSP, etc. His recent awards include the Award for Distinguished Young Scholars of the NSFC (2014), the Award of the National Special Support Program for High-Level Personnel (2018), Best Paper Awards at NCMMSC (2001, 2015, 2017), and Best Paper Awards at CHCI (2011, 2013, 2015, 2016). He has delivered numerous invited and keynote talks, including at Speech Prosody (2012, 2018) and NCMMSC (2017). He was also an elected member of the Executive Committee of the AAAC association (2007-2017) and served on the Steering Committee of IEEE Transactions on Affective Computing (2009-2017). He currently serves as an ISCA Board member, a Subject Editor of Speech Communication, and an Editorial Board Member of the Journal on Multimodal User Interfaces.

Prof. Mark Hasegawa-Johnson, University of Illinois

Coming Soon.