In this work, we show how to count the number of people talking in a meeting scenario using only two microphones, like those commonly found on mobile devices. The paper was presented at the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) in Kuala Lumpur, Malaysia, and published in its proceedings.
S. Pasha, J. Donley and C. Ritz, “Blind Speaker Counting in Highly Reverberant Environments by Clustering Coherence Features,” presented at the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-4, 2017.
You can download the pre-published copy for free here.
Source code is available from here.
This paper proposes the use of the frequency-domain Magnitude Squared Coherence (MSC) between two ad-hoc recordings of speech as a reliable speaker discrimination feature for source counting applications in highly reverberant environments. The proposed source counting method does not require knowledge of the microphone spacing and does not assume any relative distance between the sources and the microphones. Source counting is based on clustering the frequency-domain MSC of the speech signals derived from short-time segments. Experiments show that the frequency-domain MSC is speaker-dependent and that the method obtains highly accurate source counting results for up to six active speakers across varying levels of reverberation and microphone spacing.
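To make the feature concrete, the following is a minimal sketch of extracting one MSC vector per short-time segment from two recordings. It is not the paper's implementation: the function name `msc_features`, the one-second segment length, and the 256-sample Welch window are illustrative assumptions, and the MSC is estimated with SciPy's `scipy.signal.coherence`, which computes |Pxy|^2 / (Pxx Pyy) from Welch spectral estimates. The clustering step that turns these vectors into a speaker count is left out, since the abstract does not say which clustering algorithm is used.

```python
import numpy as np
from scipy.signal import coherence


def msc_features(x, y, fs, seg_len_s=1.0, nperseg=256):
    """Return one MSC vector per short-time segment of two recordings.

    x, y      : 1-D arrays holding the two microphone signals
    fs        : sampling rate in Hz
    seg_len_s : segment length in seconds (assumed value, not from the paper)
    nperseg   : Welch window length in samples (assumed value)
    """
    seg = int(seg_len_s * fs)
    n_segs = min(len(x), len(y)) // seg
    feats = []
    for i in range(n_segs):
        xs = x[i * seg:(i + 1) * seg]
        ys = y[i * seg:(i + 1) * seg]
        # Welch-based estimate of the MSC, one value per frequency bin
        _, cxy = coherence(xs, ys, fs=fs, nperseg=nperseg)
        feats.append(cxy)
    return np.array(feats)


# Synthetic stand-in for two microphone recordings (3 s at 8 kHz)
rng = np.random.default_rng(0)
fs = 8000
source = rng.standard_normal(3 * fs)
mic1 = source + 0.5 * rng.standard_normal(3 * fs)
mic2 = source + 0.5 * rng.standard_normal(3 * fs)

features = msc_features(mic1, mic2, fs)
print(features.shape)  # (number of segments, number of frequency bins)
```

Each row of `features` is the kind of per-segment vector that would then be passed to a clustering stage, with the number of clusters found serving as the speaker count estimate.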