7.2 Python implementation
In the IPython notebook (Section 4), we already have code that implements pitch shifting:
def DFT_rescale(x, f):
"""
Utility function that pitch shift a short segment `x`.
"""
X = np.fft.fft(x)
# separate even and odd lengths
parity = (len(X) % 2 == 0)
N = len(X) / 2 + 1 if parity else (len(X) + 1) / 2
Y = np.zeros(N, dtype=np.complex)
# work only in the first half of the DFT vector since input is real
for n in xrange(0, N):
# accumulate original frequency bins into rescaled bins
ix = int(n * f)
if ix < N:
Y[ix] += X[n]
# now rebuild a Hermitian-symmetric DFT
Y = np.r_[Y, np.conj(Y[-2:0:-1])] if parity else np.r_[Y, np.conj(Y[-1:0:-1])]
return np.real(np.fft.ifft(Y))
def DFT_pshift(x, f, G, overlap=0):
"""
Function for pitch shifting an input signal by applying the above utility function on overlapping segments.
"""
N = len(x)
y = np.zeros(N)
win, stride = win_taper(G, overlap)
for n in xrange(0, len(x) - G, stride):
w = DFT_rescale(x[n:n+G] * win, f)
y[n:n+G] += w * win
return y
Unlike the granular synthesis chapter, the above function DFT_pshift
is more suitable for a real-time implementation as the input x[n:n+G]
and output y[n:n+G]
buffers have the same length. Recall that this was not the case for the granular synthesis implementation in the IPython notebook (cell 70 under Section 3).
However, in order to make the task of porting to C much easier, we would like to implement the above effect without using array operations and using some of the tips and tricks we saw in the alien voice chapter, namely:
Lookup tables.
State variables.
Integer data types.
There exist C libraries for performing the FFT (e.g. FFTW) so we will still use numpy
's FFT library in our Python implementation.
def build_dft_rescale_lookup(n_bins, shift_factor):
"""
Build lookup table from DFT bins to rescaled bins.
:param n_bins: Number of bins in positive half of DFT.
:param shift_factor: Shift factor for voice effect.
:return shift_idx: Mapping from bin to rescaled bin.
:return max_bin: Maximum bin until rescaled is less than `n_bins`.
"""
shift_idx = np.zeros(n_bins, dtype=np.int16)
max_bin = n_bins
for k in range(n_bins):
"""
TODO
- compute the mapping from a bin `k` to the rescaled bin `ix`
- if: smaller than `n_bins` store in lookup table
- else: update value of `max_bin` and break from loop
"""
shift_idx[k] = k
return shift_idx, max_bin
def dft_rescale(x, n_bins, shift_idx, max_bin):
"""
Rescale spectrum using the lookup table.
:param x: Input segment in time domain
:param n_bins: Number of bins in positive half of DFT.
:param shift_idx: Mapping from bin to rescaled bin.
:param max_bin: Maximum bin until rescaled is less than `n_bins`.
:return: Pitch-shifted audio segment in time domain.
"""
X = np.fft.rfft(x)
Y = np.zeros(n_bins, dtype=np.complex)
for k in range(max_bin):
# TODO: use lookup table to shift original bins.
Y[k] += X[k]
# take inverse RFFT
return np.fft.irfft(Y)
Notice that we use the functions np.fft.rfft
and np.fft.irfft
so that we only work with the positive half of the spectrum, as we have a real-valued input signal.
With our DFT rescaling functions now complete, we can procede to applying the effect on a fixed WAV file.
At the top of the script, we have a new code snippet:
is_even = (GRAIN_LEN_SAMP % 2 == 0)
if is_even:
NFFT = GRAIN_LEN_SAMP // 2 + 1
else:
NFFT = (GRAIN_LEN_SAMP + 1) // 2
This code defines a new global variable N_BINS
for the number of frequency bins in the positive half of the spectrum after performing a DFT of length GRAIN_LEN_SAMP
(grain length in samples).
Now for the fun part! That is, implementing the pitch shifting and trying it out.
If we look at the following code snippet from the function DFT_pshift
of the IPython notebook:
w = DFT_rescale(x[n:n+G] * win, f)
y[n:n+G] += w * win
We can observe three steps:
Apply the tapering window to the input grain:
x[n:n+G] * win
.Apply the DFT rescaling function.
Apply the tapering window to the DFT-rescaled grain:
w * win
.
The last step is already performed as part of our granular synthesis code, so we have to implement the first two steps. However, we will also have additional steps of casting our input grain to floating point, as the FFT works with this variable type, and casting back to integer.
Apply tapering window to the (integer) input grain.
Cast input grain to floating point.
Apply the DFT rescaling function.
Cast rescaled grain to integer type.
You may notice that in the provided script there is a new global array called input_concat_float
, which you are suggested to use.
With this additional code, your DFT-based pitch shifter should be complete!
Real-time implementation
Congrats on implementing the DFT-based pitch shifter! You may notice some unwanted artifacts when applying this effect. More advanced versions (such as commercial "auto-tune" applications) take great care to minimize such artifacts by doing a more sophisticated analysis of each speech segment. We saw a similar type of analysis when computing the LPC coefficients in the previous chapter.
The code here is essentially ready for porting to C. However, we still need to introduce you to a library for computing the DFT. This will be covered in a (yet to come) section.
For now, enjoy your various voice effects!
Last updated
Was this helpful?