3.2 Real-time DSP tips
In this section, we will motivate several real-time DSP tips and tricks that arise when considering a practical implementation, using our simple alien voice effect as an example.
As can be seen in Chapter 3.1, our effect is really simple: we only need to multiply the input samples with a sinusoid. However, most computers and microcontrollers cannot evaluate the sine/cosine of an arbitrary value directly. Instead, they use a formula such as the Taylor series to approximate the value of a sine/cosine to a high degree of precision.
However, a quick glance at the Taylor series expansion of $\sin(x)$ reveals that this can be an expensive operation, involving several multiplications:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$$
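To get a sense of that cost, here is a small standalone sketch of a four-term Taylor approximation (an illustration only, not how any particular math library actually implements `sin`):

```c
#include <stdio.h>

/* Illustrative four-term Taylor approximation of sin(x) around 0, evaluated
 * with Horner's scheme: even this truncated version costs five
 * multiplications per call. */
static float taylor_sin(float x)
{
    float x2 = x * x;
    /* x - x^3/3! + x^5/5! - x^7/7! */
    return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f - x2 * (1.0f / 5040.0f))));
}

int main(void)
{
    printf("%f\n", taylor_sin(0.5f));   /* ~0.479426, close to sin(0.5) */
    return 0;
}
```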
A computationally cheap alternative is to use a lookup table (LUT). This consists of pre-computing the sinusoid for many evenly distributed values, where the spacing between consecutive values is determined by our sampling frequency. For instance, for the modulation frequency of 400 Hz and sampling frequency of 32 kHz used below, one period of the sinusoid spans $32000 / 400 = 80$ samples, so a table of 80 pre-computed values lets us read the modulation sinusoid off sample by sample.
Lookup tables are extremely useful in DSP and can even be used to replace/approximate computationally expensive filters. However, they come with the cost of having to store them in memory so this tradeoff should always be carefully considered!
Below is a Python script, which you can also find in the repository, to compute a sinusoid lookup table given a particular sampling frequency and modulation frequency. The lookup table is then printed in the console for copy-and-pasting to a C program. The lookup table samples are also plotted.
Below is the plot visualizing the samples from the lookup table for a modulation frequency of 400 Hz and a sampling frequency of 32 kHz.
However, as we explained in the passthrough chapter, we receive (and process) the incoming audio in chunks (called "buffers"), which consist of multiple "frames". Moreover, each frame consists of one sample per channel. We will refer to the number of frames in each buffer as the "buffer length" and the total number of samples (as one frame could have multiple samples) as the "buffer size".
Imagine that we have 128 frames per buffer and two channels, so the "buffer length" (128) is half the "buffer size" (256). For the first buffer we receive, i.e. the first 128 frames, the voice effect computation is straightforward:

$$y[n] = \sin(\omega_{mod} \cdot [n \,\%\, \textrm{LOOKUP\_SIZE}]) \cdot x[n], \quad n \in [0, 127],$$
where $\textrm{LOOKUP\_SIZE}$ is the number of entries in our lookup table. However, the second and subsequent buffers require us to know the current time index. We cannot simply multiply the input signal with $\sin(\omega_{mod} \cdot [n \,\%\, \textrm{LOOKUP\_SIZE}])$, $n \in [0, 127]$, at each buffer, as this would result in multiplying our input signal with a discontinuous estimate of our sinusoid. In the figure below, we can observe how our input signal could be multiplied with a discontinuous estimate of a sinusoid if information is not passed between buffers. Such an operation would lead to glitches in the output audio, which have a very noticeable "clicking" sound.
Figure: Discontinuous sinusoid estimate (blue, right-side up triangles) across consecutive buffers. For the new buffer, the discontinuous estimate simply starts at the beginning of the lookup table rather than continuing along the lookup table (green, upside-down triangles).
One solution would be to keep track of the number of buffers processed thus far so that we could multiply with the appropriate time index as such:

$$y[n] = \sin(\omega_{mod} \cdot [(n + B \cdot 128) \,\%\, \textrm{LOOKUP\_SIZE}]) \cdot x[n], \quad n \in [0, 127],$$

where $B$ is the number of buffers processed so far. Below is the corresponding pseudocode (do not copy this to Eclipse!):
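The pseudocode is given here as a C-style sketch of the idea. The table name `sine_table`, its 80-entry size, the 128-frame buffer and the mono (single-channel) signal are assumptions made for the example:

```c
#define FRAMES_PER_BUFFER 128   /* assumed buffer length */
#define LOOKUP_SIZE        80   /* e.g. one period at 32 kHz / 400 Hz */

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int buffer_count = 0;   /* B: number of buffers processed so far */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        unsigned int idx = (buffer_count * FRAMES_PER_BUFFER + n) % LOOKUP_SIZE;
        buffer_out[n] = buffer_in[n] * sine_table[idx];
    }
    buffer_count++;   /* grows without bound as long as we keep processing */
}
```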
This works, but the range of values $B$ could assume is technically unbounded if we never stop processing buffers! It would be more feasible to rely on a value with a limited range of possible values.

A better solution would therefore be to keep track of our current "location", i.e. index, in the sinusoid lookup table. This way, in between buffers we know the last index in the lookup table we used, so that we can apply the appropriate offset when processing the new buffer. Below is the corresponding pseudocode for this approach (do not copy this to Eclipse!):
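Again as a C-style sketch, under the same assumptions as above (`sine_table`, 80 entries, 128-frame mono buffers):

```c
#define FRAMES_PER_BUFFER 128
#define LOOKUP_SIZE        80

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int sine_pointer = 0;   /* state variable: current position in the table */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        buffer_out[n] = buffer_in[n] * sine_table[sine_pointer];
        sine_pointer++;
        if (sine_pointer >= LOOKUP_SIZE) {
            sine_pointer = 0;   /* wrap around: the state stays bounded */
        }
    }
}
```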
These values that we keep track of in between buffers, such as a pointer into the lookup table, are commonly referred to as state variables. For our applications in C, we recommend using `static` variables for state variables inside the `process` function, as `static` variables keep their values between consecutive invocations of the function.
More about this tradeoff can be read here and here.
Up until now, we assumed that the signal from the microphone is centered around zero, i.e. that silence corresponds to an amplitude of zero. However, this is not always the case! During audio capture, the internal circuitry of the microphone may add an offset, and sometimes different microphones (from the same manufacturer) will have different offsets. We typically call this shift in the waveform a DC offset/noise/bias.
For our alien voice effect, a DC offset would result in a constant sinusoid (at our modulation frequency) being present in the output signal. This is easy to see by adding the DC offset to the signal we saw before:

$$y[n] = (x[n] + n_{DC}) \cdot \sin(\omega_{mod} n) = x[n] \sin(\omega_{mod} n) + n_{DC} \sin(\omega_{mod} n),$$

where $x[n] \sin(\omega_{mod} n)$ are the samples of our desired alien voice effect and $n_{DC}$ is the level of the DC noise.
TASK 1: From your passthrough implementation, determine the value of the offset. Is it significant compared to the range of the microphone?
Hint: put a breakpoint in the process function; then with the debug tool, check the content of the input buffer.
To remove the DC noise, we could simply subtract the average offset of the microphone from every sample. This is a fast solution; however, it assumes that the offset is always the same. This may be the case for a single microphone, but imagine calibrating hundreds, thousands, or even millions of microphones! This solution would certainly not scale well.
To avoid calibration, we will implement a simple high pass filter. This filter will remove any DC component of the signal, i.e. bin number 0 of the Discrete Fourier Transform (DFT). We propose to use a cheap high pass filter of the following form:

$$y[n] = x[n] - x[n-1].$$

Note that a constant offset added to every input sample cancels out in this difference.
This type of filter is typically called a comb filter. Equations like the one above are often referred to as difference equations in DSP. In order to understand the "quality" of a filter, it is often useful to analyze its frequency response, starting by taking the Z-transform of the difference equation:

$$Y(z) = X(z) - z^{-1} X(z),$$
from which we can obtain the transfer function:

$$H(z) = \frac{Y(z)}{X(z)} = 1 - z^{-1} = \frac{z - 1}{z}.$$
From such an expression, we can create the standard pole-zero plot as seen below. With such a plot, we can extract a lot of useful information, such as stability and causality.
Figure: Pole-zero plot of our simple high pass filter. Thanks to this software for the visualization function.
For certain systems, we can also compute the Fourier transform, which may be more intuitive to understand. For our simple high pass filter, the frequency response is shown below.
Figure: Frequency response of our simple high pass filter.
In addition to its simplicity, another good property of this filter is that it has linear phase, which means that each frequency will be delayed by the same amount of time.
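One way to see this is to factor the transfer function evaluated on the unit circle ($z = e^{j\omega}$):

$$H(e^{j\omega}) = 1 - e^{-j\omega} = e^{-j\omega/2}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = 2j \sin\!\left(\frac{\omega}{2}\right) e^{-j\omega/2},$$

so the magnitude is $2\,|\sin(\omega/2)|$, which is zero at DC as desired, and the phase is an affine function of $\omega$, corresponding to a constant delay of half a sample.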
It is actually more common to plot the frequency response with the x-axis (frequency) in log scale, as shown below.
Figure: Frequency response of our simple high pass filter (log scale).
With this perspective, we get a better idea of the filter's slope/roll-off. In this case, we have a roll-off of 18 dB/decade, where a decade is a 10× increase in frequency. In audio, it is sometimes preferred to specify the roll-off in dB/octave, where an octave is a 2× increase in frequency; since $\log_{10}(2) \approx 0.3$, this corresponds to a roll-off of about 5.4 dB/octave for our simple high pass filter. See here for a discussion on audio roll-off values.
From the frequency response in the figures above, we can observe how applying this high pass filter will significantly attenuate the DC offset. However, due to the simplicity of our chosen filter, we will also attenuate frequencies in our range of interest; the human voice roughly occupies the range of 300 Hz to 3400 Hz, see here. A much sharper filter is certainly more desirable, but for the purpose (and simplicity) of our exercise this filter will suffice.
One advantage of our simple alien voice effect is that it does not require us to store past input or output values; the current output $y[n]$ only depends on the current input $x[n]$. Now that we are subtracting the previous sample from the current sample, however, we will need to introduce another state variable so that the first sample of each buffer can be filtered against the last sample of the previous buffer. The resulting code is shown below:
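As a simplified sketch of the idea (not the exact repository code), under the same assumptions as the earlier sketches, with the previous input sample kept as the additional state variable:

```c
#define FRAMES_PER_BUFFER 128
#define LOOKUP_SIZE        80

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int sine_pointer = 0;   /* state: position in the lookup table */
    static float x_prev = 0.0f;             /* state: last input sample of the previous buffer */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        float hp = buffer_in[n] - x_prev;   /* high pass: y[n] = x[n] - x[n-1] */
        x_prev = buffer_in[n];

        buffer_out[n] = hp * sine_table[sine_pointer];
        sine_pointer = (sine_pointer + 1) % LOOKUP_SIZE;
    }
}
```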
It is nice to say that we do real-time processing, but is it really the case? To assess this important question, we will add a tool to our system that measures how long it takes to process one buffer with our application. In this way, we will know whether we are processing too slowly for the chosen 32 kHz sampling frequency and the selected buffer length.

The `HAL` library includes a function `uint32_t HAL_GetTick(void);` which returns the number of ticks (in milliseconds) since the start of the microcontroller. Sadly, we cannot use this tool because a resolution of one millisecond is too coarse for our selected sampling frequency of 32 kHz (a 128-frame buffer lasts only 4 ms).
In order to have a finer timebase, we will use a timer (a more in-depth explanation of STM32 timers can be found here). It is an internal peripheral of the microcontroller and it can be used for many applications (PWM generation, counting internal or external events, etc.).
A timer is always clocked by one of the microcontroller's internal timebases (derived, for example, from a quartz crystal). This timebase can either be taken directly or divided down by a factor called a prescaler. It is important to choose an appropriate prescaler value, as it defines how fast the timer counts. Other important parameters include the width of the timer's counting register and the value at which it resets. For our application, we will use a timer with a large counting capacity (32 bits) and we will set it to increment every microsecond.
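Below is a sketch of how such a timer could be wrapped around the processing routine. The handle name `htim2`, the `main.h` include and the 1 MHz (1 µs) tick are assumptions for this example; the timer is expected to have been configured and started beforehand, e.g. with `HAL_TIM_Base_Start(&htim2)`:

```c
#include "main.h"   /* CubeMX-style project header that pulls in the HAL definitions */

void process(const float *buffer_in, float *buffer_out);   /* the routine we want to time */

extern TIM_HandleTypeDef htim2;   /* assumed: 32-bit timer ticking every microsecond */
volatile uint32_t proc_time_us;   /* inspect this value with the debugger */

void timed_process(const float *buffer_in, float *buffer_out)
{
    uint32_t start = __HAL_TIM_GET_COUNTER(&htim2);

    process(buffer_in, buffer_out);

    proc_time_us = __HAL_TIM_GET_COUNTER(&htim2) - start;
    /* At 32 kHz, a 128-frame buffer lasts 4000 us: proc_time_us must stay
     * below this budget, otherwise we are not keeping up in real time. */
}
```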
One thing that you might have noticed from the passthrough example is that the output signal is not very loud. To correct this, we will add a small gain to the `process` function by simply multiplying the signal with a constant. In order to take advantage of the architecture of the microcontroller's internal multiplier, it is recommended to use gain factors that are powers of 2, as they are faster to compute. In fact, a multiplication by 2 is simply a shift to the left in the "binary world", similar to how a multiplication by 10 in the "decimal world" simply appends a 0.
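For instance, a sketch of a power-of-two gain applied to a 16-bit sample (the gain of 8 is an arbitrary choice for illustration, and the result is clipped so that loud inputs do not wrap around):

```c
#include <stdint.h>

/* Gain of 8 = 2^3: a power-of-two factor that the compiler can implement as a
 * simple left shift. The intermediate result is widened to 32 bits and then
 * clipped back to the int16_t range. */
static inline int16_t apply_gain(int16_t sample)
{
    int32_t amplified = (int32_t)sample * 8;
    if (amplified > INT16_MAX) amplified = INT16_MAX;
    if (amplified < INT16_MIN) amplified = INT16_MIN;
    return (int16_t)amplified;
}
```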
Operations with `float` variables can take significantly more time than the same operations with `int` variables. For our application, we noticed that an implementation with `float` variables can take up to % more processing time! Therefore, we recommend avoiding `float` variables whenever possible!

When using integer values, though, it is not possible, for example, to code a lookup table that goes from −1 to 1 in small increments, or to use filter coefficients with values between −1 and 1. Therefore, to maximize our precision and to minimize the computation cost, we will try to use the full range of our integer variables. For example, in the case of an unsigned 16-bit integer we have 65536 possible values at our disposal, so such fractional values can be scaled up by a large constant before being stored. This scaling factor will need to be incorporated whenever using, e.g., the lookup table or filter coefficients. With an intelligent use of operation priority (for example, multiplying before dividing in order to perform the integer arithmetic without losing precision), it will not impact our precision or processing time.
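As an illustration of this scaling idea (the scale factor of $2^{15} - 1$ and the signed 16-bit storage are one possible convention, chosen here only for the example):

```c
#include <stdint.h>

#define SINE_SCALE 32767   /* example scale: sin() values in [-1, 1] stored as sin() * 32767 */

/* Multiply an input sample with a pre-scaled sine value and scale the result
 * back down. Widening to 32 bits first, and dividing only at the end, keeps
 * the precision of the integer arithmetic. */
static inline int16_t modulate(int16_t sample, int16_t scaled_sine)
{
    return (int16_t)(((int32_t)sample * scaled_sine) / SINE_SCALE);
}
```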