In this work, the authors present a comprehensive methodology for multizone soundfield reproduction using specially designed speech masking filters. The masking filters are designed to maximise both speech privacy and speech quality. The work shows that trade-offs exist between privacy and quality, and the method provides parameters to control those trade-offs. A precise formulation of the grating lobes caused by spatial aliasing in multizone reproduction scenarios is derived and used to enhance the masking filters. The mathematical formulation and methodology are evaluated using simulations and a real-world implementation of a multizone soundfield reproduction system.
J. Donley, C. Ritz, and W. B. Kleijn, “Multizone Soundfield Reproduction With Privacy- and Quality-Based Speech Masking Filters,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 26, no. 6, pp. 1041-1055, June 2018.
You can download a free preprint copy here.
Reproducing zones of personal sound is a challenging signal processing problem that has garnered considerable research interest in recent years. In this work, we introduce an extended method for multizone soundfield reproduction that overcomes issues with speech privacy and quality. Measures of speech intelligibility contrast (SIC) and speech quality are used as cost functions in an optimization of speech privacy and quality. Novel spatial and (temporal) frequency domain speech masker filter designs are proposed to accompany the optimization process. Spatial masking filters are designed using multizone soundfield algorithms that depend on the multizone reproduction of the target speech. Combinations of estimates of acoustic contrast and long-term average speech spectra are proposed to give the masker equal influence on speech privacy and quality. Spatial aliasing specific to the multizone soundfield reproduction geometry is further accounted for with analytically derived low-pass filters. Simulated and real-world experiments are conducted to verify the performance of the proposed method using semi-circular and linear loudspeaker arrays. Simulated implementations of the proposed method show that significant SIC and speech quality are achievable between zones. Perceptual evaluation of speech quality (PESQ) mean opinion scores indicating good quality are obtained across a range of conditions, while confidential privacy, as indicated by SIC, is maintained at the same time. The simulations also show that the method is robust to variations in the speech, virtual source location, array geometry, and number of loudspeakers. Real-world experiments confirm the practicality of the proposed methods by showing that good quality and confidential privacy are achievable.
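The abstract mentions low-pass filters derived from the spatial aliasing (grating lobe) limits of the loudspeaker array. The paper derives a formulation specific to multizone geometry, which is not reproduced here. As a hedged illustration of the general idea only, the sketch below uses the textbook grating-lobe condition for a uniform linear array with element spacing d, steered θ0 from broadside: grating lobes stay out of the visible region while f < c / (d (1 + |sin θ0|)). It then designs a windowed-sinc FIR low-pass at that frequency. The function names `alias_free_frequency` and `lowpass_fir`, the speed of sound c = 343 m/s, the Hamming window and the tap count are all assumptions for illustration, not the authors' method.

```python
import math

def alias_free_frequency(spacing_m, steer_deg=90.0, c=343.0):
    """Hypothetical helper: highest frequency (Hz) at which a uniform linear
    array with spacing `spacing_m`, steered `steer_deg` from broadside, has no
    grating lobes in the visible region (textbook ULA condition, not the
    paper's multizone formulation). steer_deg=90 gives the worst case c/(2d)."""
    s = abs(math.sin(math.radians(steer_deg)))
    return c / (spacing_m * (1.0 + s))

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hypothetical helper: windowed-sinc (Hamming) FIR low-pass filter,
    normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz              # normalised cutoff in cycles/sample
    mid = (num_taps - 1) / 2            # centre tap for a linear-phase filter
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; its limit at x = 0 is 2*fc
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]    # scale so the taps sum to 1

# Example: 5 cm loudspeaker spacing, worst-case steering, 16 kHz sampling
cutoff = alias_free_frequency(0.05)     # about 3430 Hz
h = lowpass_fir(cutoff, 16000.0)
```

Filtering a masker signal with `h` would confine its energy to the band where this simplified array model predicts no grating lobes; the multizone-specific limits in the paper may differ from this single-array bound.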