3.2 Real-time DSP tips

In this section, we will motivate several real-time DSP tips and tricks that arise when considering a practical implementation, using our simple alien voice effect as an example.

Lookup table

As can be seen in Chapter 3.1, our effect is really simple: we only need to multiply the input samples with a sinusoid. However, most computers and microcontrollers cannot directly evaluate the sine/cosine of an arbitrary value. Instead, they use a formula such as a truncated Taylor series to approximate the value of a sine/cosine to a high degree of precision, which requires several operations per evaluation.
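For reference, the Taylor series of the sine function around zero is:

\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots

Truncating this series after a few terms already gives a good approximation for small values of x, but it still costs several multiplications and additions for every output sample.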

A computationally cheap alternative is to use a lookup table (LUT). This consists of pre-computing the sinusoid for many evenly spaced values. For our application, we can set the spacing between consecutive values to one sample period, so that the table contains exactly one period of the sinusoid; for a 400 Hz sinusoid sampled at 32 kHz, this amounts to 32000/400 = 80 entries.

Lookup tables are extremely useful in DSP and can even be used to replace/approximate computationally expensive filters. However, they come with the cost of having to store them in memory so this tradeoff should always be carefully considered!

Below is a Python script, which you can also find in the repository, to compute a sinusoid lookup table given a particular sampling frequency and modulation frequency. The lookup table is then printed in the console for copy-and-pasting to a C program. The lookup table samples are also plotted.

from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np

"""
Compute lookup table
"""
data_type = 16    # 16- or 32-bit signed integer
samp_freq = 32000
f_sine = 400

# periods
samp_per = 1./samp_freq
sine_per = 1./f_sine

# compute time instances
t_vals = np.arange(0, sine_per, samp_per)   # time instances over one period
LOOKUP_SIZE = len(t_vals)
n_vals = np.arange(LOOKUP_SIZE)

# compute the sine table
MAX_SINE = 2**(data_type-1)-1   # signed range is [-2**(data_type-1), 2**(data_type-1)-1]
w_mod = 2*np.pi*(f_sine/samp_freq)
sine_table = np.sin(w_mod*n_vals) * MAX_SINE
if data_type == 16:
    sine_table = sine_table.astype(np.int16)
    print_type = np.uint16
    print_format = '0x%04x'
elif data_type == 32:
    sine_table = sine_table.astype(np.int32)
    print_type = np.uint32
    print_format = '0x%08x'
else:
    raise ValueError("Invalid data type!")

"""
Print C code.
"""
print('#define SINE_TABLE_SIZE', str(LOOKUP_SIZE))
print('#define SIN_MAX', (print_format % MAX_SINE))
if data_type == 16:
    print('const int16_t sine_table[SINE_TABLE_SIZE] = {')
elif data_type == 32:
    print('const int32_t sine_table[SINE_TABLE_SIZE] = {')
print(','.join([print_format % i.astype(print_type) for i in sine_table]))
print('};')

"""
Visualize
"""
plt.figure()
plt.stem(n_vals, sine_table)
plt.grid()
plt.autoscale(enable=True, axis='x', tight=True)
plt.xlabel("Index")
plt.title("Sine table for %d Hz at %d Hz sampling rate" % (f_sine, samp_freq))
plt.show()

Below is the plot visualizing the samples from the lookup table for a modulation frequency of 400 Hz and a sampling frequency of 32 kHz.

State variables

As we explained in the passthrough chapter, we receive (and process) the incoming audio in chunks (called "buffers"), which consist of multiple "frames"; each frame contains one sample per channel. We will refer to the number of frames in each buffer as the "buffer length" and to the total number of samples (as one frame could contain multiple samples) as the "buffer size".

Imagine that we have 128 frames per buffer and two channels: the buffer length is then 128 while the buffer size is 256, i.e. the buffer length is half the buffer size. For the first buffer we receive, i.e. the first 128 frames, the voice effect computation is straightforward:

y[n] = x[n] \cdot \sin(\omega_{mod} \cdot (n \bmod \textrm{LOOKUP\_SIZE})), \quad n \in [0, 127],

where \textrm{LOOKUP\_SIZE} is the number of entries in our lookup table. However, for the second and subsequent buffers we need to know the current time index. We cannot simply restart at n = 0 and multiply each buffer with \sin(\omega_{mod} \cdot (n \bmod \textrm{LOOKUP\_SIZE})), n \in [0, 127], as this would amount to multiplying our input signal with a discontinuous estimate of the sinusoid. In the figure below, we can observe the discontinuous estimate that results when no information is passed between buffers. Such an operation leads to glitches in the output audio, which produce a very noticeable "clicking" sound.

Figure: Discontinuous sinusoid estimate (blue, right-side up triangles) across consecutive buffers. For the new buffer, the discontinuous estimate simply starts at the beginning of the lookup table rather than continuing along the lookup table (green, upside-down triangles).

One solution would be to keep track of the number of buffers processed so far, so that we can multiply with the appropriate time index, as follows:

y[128 \cdot B + n] = x[128 \cdot B + n] \cdot \sin(\omega_{mod} \cdot ((128 \cdot B + n) \bmod \textrm{LOOKUP\_SIZE})), \quad n \in [0, 127],

where B is the index of the current buffer. In pseudocode, this would look as follows:

for B in range(n_buffers):
    for n in range(n_frames):
        # x and y are the input and output arrays of buffer number B
        y[n] = x[n] * sine_table[(n_frames*B + n) % LOOKUP_SIZE]

A better solution is to keep track of our current "location", i.e. index, in the sinusoid lookup table. This way, between buffers we know the last index of the lookup table that was used, so we can resume from the appropriate offset when processing the new buffer. Below is the corresponding pseudocode for this approach (do not copy this to Eclipse!):

sine_pointer = 0
while True:    # can go "forever" without worrying about the value of our state variable, as it will wrap around!
    for n in range(n_frames):
        y[n] = x[n] * sine_table[sine_pointer]
        sine_pointer += 1
        sine_pointer %= LOOKUP_SIZE     # limited to the range [0, LOOKUP_SIZE-1]

These values that we keep track of between buffers, such as a pointer to the lookup table, are commonly referred to as state variables. For our applications in C, we recommend declaring state variables as static variables inside the process function, as static variables keep their values between consecutive calls to the function.
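As a rough sketch of what this can look like in C (the function signature and variable names below are placeholders to be adapted to your own passthrough code; we assume a mono 16-bit signal and the sine_table generated above, scaled by SIN_MAX = 2^15 - 1):

// Sketch only: placeholder signature, not the exact code used later on.
void process(int16_t *x, int16_t *y, uint16_t n_frames) {
    // state variable: current index in the lookup table,
    // "static" so that its value is kept between calls, i.e. between buffers
    static uint16_t sine_pointer = 0;

    for (uint16_t n = 0; n < n_frames; n++) {
        // multiply in 32 bits, then scale back down to the 16-bit range
        y[n] = (int16_t)(((int32_t)x[n] * sine_table[sine_pointer]) >> 15);
        sine_pointer = (sine_pointer + 1) % SINE_TABLE_SIZE;
    }
}

For a stereo signal, the pointer should only be advanced once per frame (not once per sample), so that both channels are multiplied by the same sinusoid value.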

Float vs. Int

More about this tradeoff can be read here and here.

Removing DC noise

Up until now, we assumed that the signal from the microphone is centered around zero, i.e. that silence corresponds to an amplitude of zero. However, this is not always the case! During audio capture, the internal circuitry of the microphone may add an offset, and sometimes different microphones (from the same manufacturer) will have different offsets. We typically call this shift in the waveform a DC offset/noise/bias.

For our alien voice effect, a DC offset would result in a constant-amplitude sinusoid (at our modulation frequency) being present in the output signal. This is easy to see by adding a DC offset c to the expression we saw before:

y[n] = (x[n] + c) \cdot \sin(\omega_{mod} \cdot n) = x[n] \cdot \sin(\omega_{mod} \cdot n) + c \cdot \sin(\omega_{mod} \cdot n),

where the second term is a sinusoid at the modulation frequency whose amplitude c does not depend on the input signal.

TASK 1: From your passthrough implementation, determine the value of the offset. Is it significant compared to the range of the microphone?

Hint: put a breakpoint in the process function; then with the debug tool, check the content of the input buffer.

To remove the DC noise, we could simply subtract the average offset of the microphone from every sample. This is a fast solution; however, it assumes that the offset is always the same. This may be the case for a single microphone, but imagine calibrating hundreds, thousands, or even millions of microphones! This solution would certainly not scale well.

To avoid calibration, we will implement a simple high pass filter. This filter will remove any DC component of the signal, i.e. bin number 0 of the Discrete Fourier Transform (DFT). We propose to use a cheap high pass filter of the following form:

y[n] = x[n] - x[n-1].

This type of filter is typically called a comb filter. Equations like the one above are often referred to as difference equations in DSP. In order to understand the "quality" of a filter, it is often useful to analyze the frequency response of a difference equation by taking its Z-transform:

Y(z) = X(z) - z^{-1} X(z) = (1 - z^{-1}) \, X(z),

from which we can obtain the transfer function:

H(z) = \frac{Y(z)}{X(z)} = 1 - z^{-1} = \frac{z - 1}{z}.

From such an expression, we can create the standard pole-zero plot seen below; our filter has a single zero at z = 1 and a pole at z = 0. From such a plot, we can extract a lot of useful information, such as stability and causality.

Figure: Pole-zero plot of our simple high pass filter. Thanks to this software for the visualization function.

For certain systems, we can also compute the Fourier transform, which may be more intuitive to understand. For our simple high pass filter, the frequency response is shown below.

Figure: Frequency response of our simple high pass filter.

In addition to its simplicity, another good property of this filter is that it has linear phase, which means that each frequency will be delayed by the same amount of time.
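Both properties can be verified by evaluating the transfer function on the unit circle, i.e. by setting z = e^{j\omega}:

H(e^{j\omega}) = 1 - e^{-j\omega} = 2j \sin(\omega/2) \, e^{-j\omega/2}.

The magnitude 2\,|\sin(\omega/2)| is exactly zero at \omega = 0 (DC), and the phase is a linear function of \omega, corresponding to a constant delay of half a sample.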

It is actually more common to plot the frequency response with the x-axis (frequency) in log scale, as shown below.

Figure: Frequency response of our simple high pass filter (log scale).

With this perspective, we get a better idea of the filter's slope/roll-off. In this case we have a roll-off of 18 dB/decade, where a decade is a 10x increase in frequency. In audio, it is sometimes preferred to specify the roll-off in dB/octave, where an octave is a 2x increase in frequency. Our simple high pass filter has a roll-off of 5.4 dB/octave. See here for a discussion on audio roll-off values.
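The two numbers are consistent, since a slope per decade converts to a slope per octave through the factor \log_{10} 2:

18 \ \textrm{dB/decade} \times \log_{10}(2) \approx 18 \times 0.301 \approx 5.4 \ \textrm{dB/octave}.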

From the frequency response in the figures above, we can observe how applying this high pass filter will significantly attenuate the DC offset. However, due to the simplicity of our chosen filter we will also attenuate frequencies in our range of interest; the human voice is roughly within the range of 300 Hz to 3400 Hz, see here. A much sharper filter is certainly more desirable but for the purpose (and simplicity) of our exercise this filter will suffice.

Now that we are subtracting the previous sample from the current sample, we will need to introduce another state variable holding the last sample of the previous buffer, so that the first sample of each new buffer can be processed correctly. The resulting code is shown below:

// x_1 is the state variable containing the previous input sample;
// it should be declared static (e.g. static int16_t x_1 = 0;) so that
// it keeps its value between buffers
for (uint16_t i = 0; i < FRAME_PER_BUFFER; i++) {
    y[i] = x[i] - x_1;
    x_1 = x[i];    // update the state variable for the next sample
}

Benchmarking implementation

It is nice to say that we do real-time processing, but is it really the case?

In order to have a finer timebase, we will use a timer (a more in-depth explanation of the STM timer can be found here). It is an internal peripheral of the microcontroller and it can be used for many applications (PWM generation, counting internal or external events, etc.).

A timer is always clocked from one of the microcontroller's internal timebases (derived from a quartz crystal, for example). This timebase can either be used directly or divided down by a factor called a prescaler. It is important to choose an appropriate prescaler value, as it determines how fast the timer counts. Other important parameters include the width of the timer's counting register and the value at which it resets. For our application, we will use a timer with a large counting capacity (32 bits) and configure it to increment every microsecond.
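As a sketch of how such a timer can be used (the timer handle name htim2 and the buffer names are assumptions for illustration, not definitions from this guide; __HAL_TIM_GET_COUNTER is the STM32 HAL macro that reads the current counter value):

// Sketch only: assumes a 32-bit timer (handle htim2) already started and
// configured with a prescaler such that the counter increments every microsecond.
uint32_t start_us = __HAL_TIM_GET_COUNTER(&htim2);
process(input_buffer, output_buffer, FRAME_PER_BUFFER);   // the function we want to benchmark
uint32_t elapsed_us = __HAL_TIM_GET_COUNTER(&htim2) - start_us;
// For real-time operation, elapsed_us must stay below the buffer duration,
// e.g. 128 frames at 32 kHz correspond to 4000 microseconds.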

Gain

One thing that you might have noticed from the passthrough example is that the output signal is not very loud. To correct this, we will add a small gain in the process function by simply multiplying the signal with a constant. In order to take advantage of the architecture of the microcontroller's internal multiplier, it is recommended to use gain factors that are powers of 2, as they are faster to compute. In fact, a multiplication by 2 is simply a shift to the left in the "binary world", similar to how a multiplication by 10 in the "decimal world" simply appends a 0 at the end.
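For example (a sketch, assuming 16-bit samples and an illustrative gain of 2^3 = 8; the constant name is a placeholder):

#define GAIN_SHIFT 3    // gain of 2^3 = 8, value chosen only for illustration

for (uint16_t i = 0; i < FRAME_PER_BUFFER; i++) {
    // a left shift by GAIN_SHIFT multiplies the sample by 2^GAIN_SHIFT;
    // make sure the amplified sample still fits in 16 bits, otherwise it will overflow
    y[i] = x[i] << GAIN_SHIFT;
}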
