3.2 Real-time DSP tips
In this section, we will motivate several real-time DSP tips and tricks that arise when considering a practical implementation, using our simple alien voice effect as an example.
As can be seen in Chapter 3.1, our effect is really simple: we only need to multiply the input samples with a sinusoid. However, most computers and microcontrollers cannot evaluate the sine/cosine of an arbitrary value directly. Instead, they use a formula such as the Taylor series to approximate the value of a sine/cosine to a high degree of precision.
However, a quick glance at the Taylor series expansion of $\sin(x)$ reveals that this can be an expensive operation, involving several multiplications:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$$
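To get a sense of that cost, here is a small standalone sketch of a four-term Taylor approximation (an illustration only, not how any particular math library actually implements `sin`):

```c
#include <stdio.h>

/* Illustrative four-term Taylor approximation of sin(x) around 0, evaluated
 * with Horner's scheme: even this truncated version costs five
 * multiplications per call. */
static float taylor_sin(float x)
{
    float x2 = x * x;
    /* x - x^3/3! + x^5/5! - x^7/7! */
    return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f - x2 * (1.0f / 5040.0f))));
}

int main(void)
{
    printf("%f\n", taylor_sin(0.5f));   /* ~0.479426, close to sin(0.5) */
    return 0;
}
```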
A computationally cheap alternative is to use a lookup table (LUT). This consists of pre-computing the sinusoid for many evenly distributed values, where the spacing between consecutive values is determined by our sampling frequency. For instance, for the modulation frequency of 400 Hz and sampling frequency of 32 kHz used below, one period of the sinusoid spans $32000 / 400 = 80$ samples, so a table of 80 pre-computed values lets us read the modulation sinusoid off sample by sample.
Lookup tables are extremely useful in DSP and can even be used to replace/approximate computationally expensive filters. However, they come with the cost of having to store them in memory so this tradeoff should always be carefully considered!
Below is a Python script, which you can also find in the repository, to compute a sinusoid lookup table given a particular sampling frequency and modulation frequency. The lookup table is then printed in the console for copy-and-pasting to a C program. The lookup table samples are also plotted.
Below is the plot visualizing the samples from the lookup table for a modulation frequency of 400 Hz and a sampling frequency of 32 kHz.
However, as we explained in the passthrough chapter, we receive (and process) the incoming audio in chunks (called "buffers"), which consist of multiple "frames". Moreover, each frame consists of one sample per channel. We will refer to the number of frames in each buffer as the "buffer length" and the total number of samples (as one frame could have multiple samples) as the "buffer size".
Imagine that we have 128 frames per buffer and two channels, so the "buffer length" (128) is half the "buffer size" (256). For the first buffer we receive, i.e. the first 128 frames, the voice effect computation is straightforward:

$$y[n] = \sin(\omega_{mod} \cdot [n \,\%\, \textrm{LOOKUP\_SIZE}]) \cdot x[n], \quad n \in [0, 127],$$
where $\textrm{LOOKUP\_SIZE}$ is the number of entries in our lookup table. However, the second and subsequent buffers require us to know the current time index. We cannot simply multiply the input signal with $\sin(\omega_{mod} \cdot [n \,\%\, \textrm{LOOKUP\_SIZE}])$, $n \in [0, 127]$, at each buffer, as this would result in multiplying our input signal with a discontinuous estimate of our sinusoid. In the figure below, we can observe how our input signal could be multiplied with a discontinuous estimate of a sinusoid if information is not passed between buffers. Such an operation would lead to glitches in the output audio, which have a very noticeable "clicking" sound.
Figure: Discontinuous sinusoid estimate (blue, right-side up triangles) across consecutive buffers. For the new buffer, the discontinuous estimate simply starts at the beginning of the lookup table rather than continuing along the lookup table (green, upside-down triangles).
One solution would be to keep track of the number of buffers processed thus far so that we could multiply with the appropriate time index as such:

$$y[n] = \sin(\omega_{mod} \cdot [(n + B \cdot 128) \,\%\, \textrm{LOOKUP\_SIZE}]) \cdot x[n], \quad n \in [0, 127],$$

where $B$ is the number of buffers processed so far. Below is the corresponding pseudocode (do not copy this to Eclipse!):
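The pseudocode is given here as a C-style sketch of the idea. The table name `sine_table`, its 80-entry size, the 128-frame buffer and the mono (single-channel) signal are assumptions made for the example:

```c
#define FRAMES_PER_BUFFER 128   /* assumed buffer length */
#define LOOKUP_SIZE        80   /* e.g. one period at 32 kHz / 400 Hz */

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int buffer_count = 0;   /* B: number of buffers processed so far */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        unsigned int idx = (buffer_count * FRAMES_PER_BUFFER + n) % LOOKUP_SIZE;
        buffer_out[n] = buffer_in[n] * sine_table[idx];
    }
    buffer_count++;   /* grows without bound as long as we keep processing */
}
```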
This works, but the range of values $B$ could assume is technically unbounded if we never stop processing buffers! It would be more feasible to rely on a value with a limited range of possible values.

A better solution would therefore be to keep track of our current "location", i.e. index, in the sinusoid lookup table. This way, in between buffers we know the last index in the lookup table we used, so that we can apply the appropriate offset when processing the new buffer. Below is the corresponding pseudocode for this approach (do not copy this to Eclipse!):
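Again as a C-style sketch, under the same assumptions as above (`sine_table`, 80 entries, 128-frame mono buffers):

```c
#define FRAMES_PER_BUFFER 128
#define LOOKUP_SIZE        80

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int sine_pointer = 0;   /* state variable: current position in the table */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        buffer_out[n] = buffer_in[n] * sine_table[sine_pointer];
        sine_pointer++;
        if (sine_pointer >= LOOKUP_SIZE) {
            sine_pointer = 0;   /* wrap around: the state stays bounded */
        }
    }
}
```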
These values that we keep track of in between buffers, such as a pointer into the lookup table, are commonly referred to as state variables. For our applications in C, we recommend using `static` variables for state variables inside the `process` function, as `static` variables keep their values between consecutive invocations of the function.
More about this tradeoff can be read here and here.
Up until now, we assumed that the signal from the microphone is centered around zero, i.e. that silence corresponds to an amplitude of zero. However, this is not always the case! During audio capture, the internal circuitry of the microphone may add an offset, and sometimes different microphones (from the same manufacturer) will have different offsets. We typically call this shift in the waveform a DC offset/noise/bias.
For our alien voice effect, a DC offset would result in a constant sinusoid (at our modulation frequency) being present in the output signal. This is easy to see by adding the DC offset to the signal we saw before:

$$y[n] = (x[n] + n_{DC}) \cdot \sin(\omega_{mod} n) = x[n] \sin(\omega_{mod} n) + n_{DC} \sin(\omega_{mod} n),$$

where $x[n] \sin(\omega_{mod} n)$ are the samples of our desired alien voice effect and $n_{DC}$ is the level of the DC noise.
TASK 1: From your passthrough implementation, determine the value of the offset. Is it significant compared to the range of the microphone?
Hint: put a breakpoint in the process function; then with the debug tool, check the content of the input buffer.
To remove the DC noise, we could simply subtract the average offset of the microphone from every sample. This is a fast solution; however, it assumes that the offset is always the same. This may be the case for a single microphone, but imagine calibrating hundreds, thousands, or even millions of microphones! This solution would certainly not scale well.
To avoid calibration, we will implement a simple high pass filter. This filter will remove any DC component of the signal, i.e. bin number 0 of the Discrete Fourier Transform (DFT). We propose to use a cheap high pass filter of the following form:

$$y[n] = x[n] - x[n-1].$$

Note that a constant offset added to every input sample cancels out in this difference.
This type of filter is typically called a comb filter. Equations like the one above are often referred to as difference equations in DSP. In order to understand the "quality" of a filter, it is often useful to analyze its frequency response, starting by taking the Z-transform of the difference equation:

$$Y(z) = X(z) - z^{-1} X(z),$$
from which we can obtain the transfer function:

$$H(z) = \frac{Y(z)}{X(z)} = 1 - z^{-1} = \frac{z - 1}{z}.$$
From such an expression, we can create the standard pole-zero plot as seen below. With such a plot, we can extract a lot of useful information, such as stability and causality.
Figure: Pole-zero plot of our simple high pass filter. Thanks to this software for the visualization function.
For certain systems, we can also compute the Fourier transform, which may be more intuitive to understand. For our simple high pass filter, the frequency response is shown below.
Figure: Frequency response of our simple high pass filter.
In addition to its simplicity, another good property of this filter is that it has linear phase, which means that each frequency will be delayed by the same amount of time.
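One way to see this is to factor the transfer function evaluated on the unit circle ($z = e^{j\omega}$):

$$H(e^{j\omega}) = 1 - e^{-j\omega} = e^{-j\omega/2}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = 2j \sin\!\left(\frac{\omega}{2}\right) e^{-j\omega/2},$$

so the magnitude is $2\,|\sin(\omega/2)|$, which is zero at DC as desired, and the phase is an affine function of $\omega$, corresponding to a constant delay of half a sample.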
It is actually more common to plot the frequency response with the x-axis (frequency) in log scale, as shown below.
Figure: Frequency response of our simple high pass filter (log scale).
With this perspective, we get a better idea of the filter's slope/roll-off. In this case, we have a roll-off of 18 dB/decade, where a decade is a 10× increase in frequency. In audio, it is sometimes preferred to specify the roll-off in dB/octave, where an octave is a 2× increase in frequency; since $\log_{10}(2) \approx 0.3$, this corresponds to a roll-off of about 5.4 dB/octave for our simple high pass filter. See here for a discussion on audio roll-off values.
From the frequency response in the figures above, we can observe how applying this high pass filter will significantly attenuate the DC offset. However, due to the simplicity of our chosen filter, we will also attenuate frequencies in our range of interest; the human voice roughly occupies the range of 300 Hz to 3400 Hz, see here. A much sharper filter is certainly more desirable, but for the purpose (and simplicity) of our exercise this filter will suffice.
One advantage of our simple alien voice effect is that it does not require us to store past input or output values; the current output $y[n]$ only depends on the current input $x[n]$. Now that we are subtracting the previous sample from the current sample, however, we will need to introduce another state variable so that the first sample of each buffer can be filtered against the last sample of the previous buffer. The resulting code is shown below:
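As a simplified sketch of the idea (not the exact repository code), under the same assumptions as the earlier sketches, with the previous input sample kept as the additional state variable:

```c
#define FRAMES_PER_BUFFER 128
#define LOOKUP_SIZE        80

/* values pasted from the output of the Python script */
static const float sine_table[LOOKUP_SIZE] = { 0.0f /* , ... */ };

void process(const float *buffer_in, float *buffer_out)
{
    static unsigned int sine_pointer = 0;   /* state: position in the lookup table */
    static float x_prev = 0.0f;             /* state: last input sample of the previous buffer */

    for (unsigned int n = 0; n < FRAMES_PER_BUFFER; n++) {
        float hp = buffer_in[n] - x_prev;   /* high pass: y[n] = x[n] - x[n-1] */
        x_prev = buffer_in[n];

        buffer_out[n] = hp * sine_table[sine_pointer];
        sine_pointer = (sine_pointer + 1) % LOOKUP_SIZE;
    }
}
```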
It is nice to say that we do real-time processing, but is it really the case? To assess this important question, we will add a tool to our system that measures how long it takes to process one buffer with our application. In this way, we will know whether we are processing too slowly for the chosen 32 kHz sampling frequency and the selected buffer length.

The `HAL` library includes a function `uint32_t HAL_GetTick(void);` which returns the number of ticks (in milliseconds) since the start of the microcontroller. Sadly, we cannot use this tool because a resolution of one millisecond is too coarse for our selected sampling frequency of 32 kHz (a 128-frame buffer lasts only 4 ms).
In order to have a finer timebase, we will use a timer (a more in-depth explanation of STM32 timers can be found here). It is an internal peripheral of the microcontroller and it can be used for many applications (PWM generation, counting internal or external events, etc.).
A timer is always clocked by one of the microcontroller's internal timebases (derived, for example, from a quartz crystal). This timebase can either be taken directly or divided down by a factor called a prescaler. It is important to choose an appropriate prescaler value, as it defines how fast the timer counts. Other important parameters include the width of the timer's counting register and the value at which it resets. For our application, we will use a timer with a large counting capacity (32 bits) and we will set it to increment every microsecond.
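Below is a sketch of how such a timer could be wrapped around the processing routine. The handle name `htim2`, the `main.h` include and the 1 MHz (1 µs) tick are assumptions for this example; the timer is expected to have been configured and started beforehand, e.g. with `HAL_TIM_Base_Start(&htim2)`:

```c
#include "main.h"   /* CubeMX-style project header that pulls in the HAL definitions */

void process(const float *buffer_in, float *buffer_out);   /* the routine we want to time */

extern TIM_HandleTypeDef htim2;   /* assumed: 32-bit timer ticking every microsecond */
volatile uint32_t proc_time_us;   /* inspect this value with the debugger */

void timed_process(const float *buffer_in, float *buffer_out)
{
    uint32_t start = __HAL_TIM_GET_COUNTER(&htim2);

    process(buffer_in, buffer_out);

    proc_time_us = __HAL_TIM_GET_COUNTER(&htim2) - start;
    /* At 32 kHz, a 128-frame buffer lasts 4000 us: proc_time_us must stay
     * below this budget, otherwise we are not keeping up in real time. */
}
```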
One thing that you might have noticed from the passthrough example is that the output signal is not very loud. To correct this, we will add a small gain to the `process` function by simply multiplying the signal with a constant. In order to take advantage of the architecture of the microcontroller's internal multiplier, it is recommended to use gain factors that are powers of 2, as they are faster to compute. In fact, a multiplication by 2 is simply a shift to the left in the "binary world", similar to how a multiplication by 10 in the "decimal world" simply appends a 0.
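For instance, a sketch of a power-of-two gain applied to a 16-bit sample (the gain of 8 is an arbitrary choice for illustration, and the result is clipped so that loud inputs do not wrap around):

```c
#include <stdint.h>

/* Gain of 8 = 2^3: a power-of-two factor that the compiler can implement as a
 * simple left shift. The intermediate result is widened to 32 bits and then
 * clipped back to the int16_t range. */
static inline int16_t apply_gain(int16_t sample)
{
    int32_t amplified = (int32_t)sample * 8;
    if (amplified > INT16_MAX) amplified = INT16_MAX;
    if (amplified < INT16_MIN) amplified = INT16_MIN;
    return (int16_t)amplified;
}
```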
Operations with `float` variables can take significantly more time than the same operations with `int` variables. For our application, we noticed that an implementation with `float` variables can take up to % more processing time! Therefore, we recommend avoiding `float` variables whenever possible!

When using integer values, though, it is not possible, for example, to code a lookup table that goes from −1 to 1 in small increments, or to use filter coefficients with values between −1 and 1. Therefore, to maximize our precision and to minimize the computation cost, we will try to use the full range of our integer variables. For example, in the case of an unsigned 16-bit integer we have 65536 possible values at our disposal, so such fractional values can be scaled up by a large constant before being stored. This scaling factor will need to be incorporated whenever using, e.g., the lookup table or filter coefficients. With an intelligent use of operation priority (for example, multiplying before dividing in order to perform the integer arithmetic without losing precision), it will not impact our precision or processing time.
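As an illustration of this scaling idea (the scale factor of $2^{15} - 1$ and the signed 16-bit storage are one possible convention, chosen here only for the example):

```c
#include <stdint.h>

#define SINE_SCALE 32767   /* example scale: sin() values in [-1, 1] stored as sin() * 32767 */

/* Multiply an input sample with a pre-scaled sine value and scale the result
 * back down. Widening to 32 bits first, and dividing only at the end, keeps
 * the precision of the integer arithmetic. */
static inline int16_t modulate(int16_t sample, int16_t scaled_sine)
{
    return (int16_t)(((int32_t)sample * scaled_sine) / SINE_SCALE);
}
```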