There are a few robust algorithms available to objectively measure the intelligibility of speech. They compare what is heard by a listener to the information that was transmitted by the talker, however, only one of these available algorithms works in reverberant rooms and at the time I was looking for an implementation for MATLAB, I couldn’t find any, zilch. So, I decided to code it from scratch myself and will show you how to do it step by step.
The articulation index (AI) is an older measure which has evolved to the speech intelligibility index (SII). The SII uses a band-importance and band-audibility function to account for the lack of mutual information from masking and from varying presentation levels of the clean speech. There is also the short-time objective intelligibility (STOI) measure which is designed for the prediction of time-frequency weighted noisy speech and the speech transmission index (STI) which makes use of the modulation transfer function.
Now, this might all sound a little confusing but in a nutshell only the STI is suitable for measuring intelligibility in reverberant rooms. The STI in essence measures the response of the acoustic channel between the listener and the talker and from this measurement of the channel it estimates how much mutual information could be maintained across that channel. The reason it is reasonably robust to varying reverberation levels is because it analyses spectral band powers using the modulation transfer function (MTF).
Firstly, to measure the STI we will need the impulse response (IR) of the channel. This can be found a few ways for simulations and for real-world recordings. Some of these might be to transmit a valid signal, record the result and then deconvolve the recording with the original signal. Some signals that could be used are maximum length sequence (MLS), white noise, sine sweep, logarithmic sine sweep, popping of a balloon or anything else that goes bang. I will go over some nice methods in a future post.
Once we have a valid IR we will use this as the input to our MATLAB function which will compute the STI. I will be showing how to do this using the older version of the standard (still quite accurate for low presentation levels). I used this as my function definition:
function [STI_val, ALcons, STI_val_approx, ALcons_approx] = STI( ImpulseResponse, fs, OctaveFilters )
which is ready for some extra features which I will explain later.
Now, before we start to calculate the STI we want to set up a second function to find the MTF because to find the STI we must do this many many times. So I added another function definition after the
end keyword for the first function and used this definition:
function MTF = getMTF(ir)
So, to find the MTF for a given IR there are five simple steps which we need to add to the
getMTF function. The first step is to square the IR:
ir_2 = ir .^ 2;
We need to get the total energy of the squared IR at some point so may as well do it now. This can be done by integrating which is a summation of the discrete values of the IR array:
ir_2_int = sum(ir_2);
After this we need to transform the squared IR into the frequency domain. This can be done using a fast Fourier transform which MATLAB conveniently has a function for:
IR_2_fft = fft(ir_2);
Once we have our frequency domain signal (the envelope spectrum) we should normalise it based on the total energy we calculated earlier:
IR_MTF = IR_2_fft / ir_2_int;
Lastly, the MTF we want to return for use in our
STI function should only be the absolute value of the complex envelope spectrum. This can be done like so:
MTF = abs(IR_MTF);
That is the
end of the
Now we can get started on finding the actual STI value, the fun part. For this we need to run the entire IR (the full bandwidth IR) through octave band filters and then find the MTF of each of the filtered signals. To filter the IR with octave band filters we must create a filter for each octave band, then filter the IR and then save the result each time. This can be done with the following code:
BandsPerOctave = 1; N = 6; % Filter Order F0 = 1000; % Center Frequency (Hz) f = fdesign.octave(BandsPerOctave,'Class 1','N,F0',N,F0,fs); F0 = validfrequencies(f); F0(F0<125)=; F0(F0>min([8000,fs/2]))=; % Only keep bands in the range required for the STI calculation Nfc = length(F0); for i=1:Nfc if nargin < 3 f.F0 = F0(i); H = design(f,'butter'); ir_filtered = filter(H, ImpulseResponse); else ir_filtered = filter(OctaveFilters(i), ImpulseResponse); end MTF_octband(:,i) = getMTF(ir_filtered); end
If you read the code above you may have noticed the conditional statement that says that if we have three or more input arguments then we should use the third input argument,
OctaveFilters. I have added this because creating the octave band filters is computationally expensive, so if we end up calling our function many times it would be better to calculate the filters once and reuse them for each function call, I will show how this can be done in a future post. Also note the function call to our function,
Now that we have our octave band filtered signals we should get the amplitude at different modulation frequencies. For the STI there are 14 different modulation frequencies which have 1/3 octave spacing. That is to say, we should get the magnitude at 14 specific frequencies from each MTF. The following code will accomplish this:
modulation_freqs = (0.63) * ((2.0) .^ ([0:13]./3)); freqs = linspace(0, fs/2, size(MTF_octband,1)/2+1); freqs=freqs(1:end-1);%No nyquist frequency for i=1:Nfc for j=1:14 m(i,j) = interp1(freqs,MTF_octband(1:end/2,i),modulation_freqs(j)); end end good_freqs = ~any(isnan(m),2); m(~good_freqs,:)=;
The last couple of lines here are a little bit special, I have added them so that if we have any frequencies which return bad results they will be ignored.
The amplitude values we now have are known as the “m” values and the next step is to convert them into an “apparent” signal-to-noise ratio (SNR). This is how it is done in MATLAB:
SNR_apparent = 10*log10( m ./ (1-m) );
The STI says we should limit the apparent SNR to ±15dB. I did this like so:
SNR_apparent( SNR_apparent > 15 ) = 15; SNR_apparent( SNR_apparent < -15 ) = -15;
Once we have our apparent SNRs all we need to do is find the average for each of the octave bands. With our matrix of apparent SNR values we can do this quite simply in MATLAB:
SNR_avg = mean(SNR_apparent,2);
At this point we still have more than a single STI value, though.
To get these octave band SNR values into a single STI value we need to find a weighted average of them and then turn that weighted average into a value between 0 and 1. The weighted average means we need to know the different weights for each band, these are described in the STI standard but if you’re clever you can work them out from the following code:
W = [0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14]; SNR_Wavg = sum(SNR_avg' .* W(good_freqs)); SNR_Wavg_approx = sum(SNR_avg' .* W(good_freqs)/sum(W(good_freqs)));
I have also added a weighted average which is performed on only the good frequencies which we defined before and called it an approximation. This means that if, for instance, the sampling frequency is too low and we dont have enough octave bands to fulfill the proper STI standard, then we can still approximate what the result might be by ignoring the bands that are too high.
Of course, the rescaling needs to be done too:
STI_val = (SNR_Wavg + 15) / 30; STI_val_approx = (SNR_Wavg_approx + 15) / 30;
That’s it! but of course, I think it is nice to add just a couple more convenient return values before we end our function:
ALcons = 170.5405 * exp(-5.419*STI_val); ALcons_approx = 170.5405 * exp(-5.419*STI_val_approx);
Those calculations are for the articulation loss of consonants (ALcons) which is another useful value.
Ok, now we are done and the function can be ended with the
end keyword. If you have done it all properly you should have code that works like the one you can download from here. An up-to-date copy can also be found here.
How do you use your newly created STI function?
Easy, just pass your IR array and the sampling frequency (
fs) to the function:
y=sinc(-7999:8000); fs=44100; STIval=STI(y,fs)
Want the articulation loss of consonants instead? No problem:
What if the signal doesn’t meet the sampling requirements for the STI standard? Well, let’s approximate the STI and ALcons and compare them with the true standard-abiding values:
If you’d like to pass the octave filters to the function you can read over this code and try write a separate function to create them. I will go over this in a future post.
That’s it, you can now pass the impulse response of any audio channel through this function to find the STI value.