Blindly counting the number of speech sources (talkers) in a meeting room can be a difficult task. This paper, presented at HSCMA 2017 at the Google offices in San Francisco, shows how coherent-to-diffuse ratios (CDRs) could enable real-time source counting.
S. Pasha, J. Donley, C. Ritz, and Y. X. Zou, “Towards Real-Time Source Counting by Estimation of Coherent-to-Diffuse Ratios from Ad-Hoc Microphone Array Recordings,” presented at the 5th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, San Francisco, USA, 2017, pp. 161–165, doi: 10.1109/HSCMA.2017.7895582.
You can download the preprint for free here.
Coherent-to-diffuse ratio (CDR) estimates computed over short time frames are used for source counting with ad-hoc microphone arrays that record speech from multiple participants, for example in a meeting. It is shown that CDR estimates obtained at ad-hoc dual-microphone (two-channel) nodes, placed at unknown locations within an unknown reverberant room, can detect time frames with more than one active source and are informative for source counting. Results show that interfering sources can be detected with accuracies ranging from 69% to 89% for delays ranging from 20 ms to 300 ms, with source counting accuracies ranging from 61% to 81% for two sources over the same range of delays.
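To make the idea of frame-wise CDR estimation at a dual-microphone node more concrete, here is a minimal sketch. It is not the estimator or decision rule from the paper. It inverts the standard two-microphone coherence model, Γx = (CDR·Γs + Γn) / (CDR + 1), to obtain CDR = Re{(Γn − Γx) / (Γx − Γs)}. This assumes a spherically isotropic diffuse field (Γn = sinc) and a target with a *known* TDOA (default 0, i.e. broadside). That is a simplification: the paper's nodes sit at unknown locations, where a DOA-independent estimator would be needed. The helper names (`estimate_cdr`, `flag_low_cdr_frames`), the smoothing factor, the 200–3500 Hz band, and the 0 dB threshold are all hypothetical. The rule "low median CDR suggests more than one active source" is an illustrative heuristic only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT, returned as an array of shape (frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def estimate_cdr(x1, x2, fs, mic_dist, tdoa=0.0, n_fft=512, hop=256,
                 alpha=0.68, cdr_max=1e3, eps=1e-6):
    """Per-frame, per-bin CDR estimate for one dual-microphone node (sketch).

    Hypothetical parameters: tdoa is the assumed delay (s) of the target at
    mic 2 relative to mic 1 (0.0 = broadside), alpha is the PSD smoothing
    factor, and cdr_max caps fully coherent bins (1e3 = 30 dB).
    """
    X1, X2 = stft(x1, n_fft, hop), stft(x2, n_fft, hop)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Recursively smoothed auto- and cross-power spectral densities.
    phi11 = np.zeros(X1.shape[1])
    phi22 = np.zeros(X1.shape[1])
    phi12 = np.zeros(X1.shape[1], dtype=complex)
    gamma_x = np.empty_like(X1)
    for t in range(X1.shape[0]):
        phi11 = alpha * phi11 + (1 - alpha) * np.abs(X1[t]) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * np.abs(X2[t]) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * X1[t] * np.conj(X2[t])
        gamma_x[t] = phi12 / np.sqrt(phi11 * phi22 + 1e-12)

    # Model coherences: spherically isotropic diffuse field, sin(kd)/(kd),
    # and a plane-wave target arriving with the assumed TDOA.
    gamma_n = np.sinc(2 * freqs * mic_dist / SPEED_OF_SOUND)
    gamma_s = np.exp(1j * 2 * np.pi * freqs * tdoa)

    # Invert Gamma_x = (CDR * Gamma_s + Gamma_n) / (CDR + 1) for CDR.
    denom = gamma_x - gamma_s
    tiny = np.abs(denom) < eps  # observed coherence matches the target model
    ratio = np.real((gamma_n - gamma_x) / np.where(tiny, 1.0, denom))
    cdr = np.where(tiny, cdr_max, ratio)
    return np.clip(cdr, 0.0, cdr_max), freqs


def flag_low_cdr_frames(cdr, freqs, threshold_db=0.0, band=(200.0, 3500.0)):
    """Flag frames whose median in-band CDR (dB) falls below threshold_db."""
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    cdr_db = 10 * np.log10(np.maximum(cdr[:, in_band], 1e-3))
    frame_db = np.median(cdr_db, axis=1)
    return frame_db < threshold_db, frame_db
```

As a sanity check, identical signals at both microphones (perfectly coherent) give the capped CDR in every band, so no frame is flagged. Independent noise at the two microphones yields a low CDR, so nearly every frame is flagged. Taking the median over frequency is one possible choice that keeps a few outlier bins from dominating the frame-level decision. Real recordings would need a calibrated threshold and, for ad-hoc nodes, a DOA-independent CDR estimator.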