🎤 AI Speech Daily

Friday, February 27, 2026

📰 AI Frontier News 0

No updates

🎤 Frontier Speech Papers 6

Source: arXiv eess.AS, cs.SD (titles kept in the original English)

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

cs.SD 👤 Cheng-Yeh Yang, Chien-Chun Wang, Li-Wei Chen
Low-resource automatic speech recognition (ASR) continues to pose significant challenges, primarily due to the limited availability of transcribed data for numerous languages.
📄 Download PDF

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs

cs.SD 👤 Wenjie Tian, Zhixian Zhao, Jingbin Hu
The evolution of Omni-Modal Large Language Models (Omni-LLMs) has revolutionized human-computer interaction, enabling unified audio-visual perception and speech response.
📄 Download PDF

A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation

eess.AS 👤 Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li
We propose a knowledge-driven, model-based approach to segmenting audio into single-category and mixed-category chunks with applications to source separation.
📄 Download PDF

iMiGUE-Speech: A Spontaneous Speech Dataset for Affective Analysis

eess.AS 👤 Sofoklis Kakouros, Fang Kang, Haoyu Chen
This work presents iMiGUE-Speech, an extension of the iMiGUE dataset that provides a spontaneous affective corpus for studying emotional and affective states.
📄 Download PDF

UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation

cs.SD 👤 Yuxuan Chen, Peize He, Haoyuan Xu
A universal audio representation should capture fine-grained speech cues and high-level semantics for environmental sounds and music in a single encoder.
📄 Download PDF

Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

cs.SD 👤 MD. Sagor Chowdhury, Adiba Fairooz Chowdhury
We describe our end-to-end system for Bengali long-form speech recognition (ASR) and speaker diarization submitted to the DL Sprint 4.0 competition on Kaggle.
📄 Download PDF

👥 Followed Bloggers' Updates 0

No updates