Frequency-domain processing
FFTs
- class tiliqua.fft.FFT(*args, src_loc_at=0, **kwargs)
Fixed-point Fast Fourier Transform.
Overview
This core computes the DFT of complex, fixed-point samples (specifically, power-of-two chunks, represented by
Block
ofCQ
). It can be run in ‘forward’ or ‘inverse’ mode, and is generally designed to match the behaviour ofscipy.fft(norm="forward")
.Input and output samples are clocked in FIFO order. Block processing only begins once
sz
elements (an entire block) has been clocked atself.i
. Some time later, the core clocks the transformed block outself.o
, before clocking in the next block. See the ‘Design’ section for further details on throughput.Switching modes
When the core is idle (nothing is clocked into the core and
i.ready
is strobed), theifft
signal may be set to 1 to put the core into ‘inverse FFT’ mode. Otherwise, it is in ‘forward FFT’ mode.In forward mode, this FFT core normalizes the outputs by 1/N, and the twiddle factors are not conjugated.
In ‘inverse FFT’ mode, there is no normalization, and the twiddle factors are conjugated as required.
Design
There are many tradeoffs an FFT implementation can make. This one aims for low area and DSP tile usage. It is an iterative implementation that only performs 2 multiplies at a time. The core FFT loop takes 6 cycles, where 2 of these are arithmetic and the rest are memory operations. The algorithm implemented here is ‘Cooley-Tukey, Decimation-in-Time, Radix-2’.
Latency, resource usage
As this core only performs 2 multiplies at a time, it only requires 2 DSP tiles, assuming the
shape
specified is small enough. For 16-bit samples at an FFT size of 1024, the extra bits needed to account for overflow requires 4x 18-bit multipliers on an ECP5.Outputs take
sz*log2(sz)*6
system clocks to be computed after all the inputs have been clocked in. That is, an FFT of size 1024 would take aboutlog2(1024)*6 == 60
system clocks per sample, for a maximum throughput around 1Msample/sec if we had a 60MHz master clock (in reality a bit less accounting for the input and output clocking times).This core instantiates 3 memories, which all scale with
sz
. A ROM for the twiddle factor (coefficient) memory, and 2 RAMs which are shared for input and output sample storage as well as intermediate calculations. An interesting addition to this core in the future may be to support external memory for huge FFT sizes.Internals
Note
Reading this is not necessary to use the core. I advise reading up a bit on how FFT algorithms work before diving deeper into this. It’s just a few notes to anyone interested in understanding this core more deeply.
This core has 3 main ‘phases’ in its overall state machine - an input phase (
LOAD1
andLOAD2
), calculation/loop phases (FFTLOOP
toWRITE-D
) and an output phase (READ-OUTPUT
).The loop phase has an inner counter
idx
, for the index of the current sample compared to the total block size, and an outer counterstage
which goes from 0 tolog2(sz)
. Oncestage
reacheslog2(sz)
, the calculations are complete. The input and output phases are simpler to understand, so we’ll focus on this loop phase.On every loop iteration, we read all needed values from memory and calculate the butterfly sum/difference outputs. Muxes are shown to represent the A / B ping-ponging between which memory they fetch from in even and odd stages (Stage 0
X->A, Y->B
, Stage 1X->B, Y->A
and so on up to Stagelog2(sz)
).┌─────────┐ ┌─────────┐ ┌─────────┐ │ X RAM │-\ /-│ Y RAM │ │ W ROM │ └────┬────┘ X └────┬────┘ └────┬────┘ ▼ / \ ▼ │ MUX<──── ────>MUX │ │ │ │ A B │ │ │ │ │ ┌─────┴─────┐ │ └────────>┤ Butterfly ├<────────┘ │ S=A+B×W │ │ D=A-B×W │ └─────┬─────┘ │S,D
Once the calculation is complete, the S, D values are written back in a similar ping-pong way to either the X or Y RAMs, depending on which ‘stage’ we are in. There are 6 states inside the FFT loop required for each iteration of this calculation. Some of them do arithmetic and memory accesses in the same state - in general the timing of each state is optimized to hit around 80MHz on a slowest speed-grade ECP5.
- Members:
i (
In(stream.Signature(Block(CQ(self.shape))))
) – Incoming stream of blocks of complex samples.o (
Out(stream.Signature(Block(CQ(self.shape))))
) – Outgoing stream of blocks of complex samples.ifft (
In(1, init=default_ifft)
) –0
for forward FFT with1/N
normalization.1
for inverse FFT. May only be changed when the core is idle (‘ready’ and not midway through a block!).
- __init__(shape=fixed.SQ(1, 15), sz=1024, default_ifft=False)
- shapefixed.SQ
Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in
Block
(CQ
(shape)).- szint
Size of the FFT/IFFT blocks. Must be a power of 2.
- default_ifftbool
Default state of the
self.ifft
signal, can be used to create an IFFT instead of an FFT by default, without needing to explicitly connect the signal.
Windowing
- class tiliqua.fft.Window(*args, src_loc_at=0, **kwargs)
Pointwise window function.
Apply a real window function of size
sz
to blocks ofshape
. The window function is synchronized to the blockpayload.first
.Design
This core is iterative and uses a single multiplier. It does not store any samples, the windowed samples are available at the output 2 clocks after they are provided at the input.
- Members:
i (
In(stream.Signature(Block(self.shape)))
) – Incoming stream of blocks of real samples. at intervals matching thesz
of this component.o (
Out(stream.Signature(Block(self.shape)))
) – Outgoing stream of blocks of real samples. at intervals matching thesz
of this component.
- class Function(value, names=_not_given, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Enumeration representing different window functions for
Window
.- HANN(sz)
- SQRT_HANN(sz)
- RECT(sz)
- __init__(shape, sz, window_function=Function.SQRT_HANN)
- shapefixed.SQ
Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in
Block
(shape).- szint
Size of the window function.
- window_functionFunction
Function used to calculate window function constants (type of window).
Overlap / Add
- class tiliqua.fft.ComputeOverlappingBlocks(*args, src_loc_at=0, **kwargs)
Real sample stream to (overlapping)
Block
stream conversion.This core is a building block for the
STFTProcessor
family of components. It takes a continuous (non-delineated) stream of samples, and sends delineated blocks of samples, which may be overlapping, for use by further processing.Here is a quick example to illustrate. From the input perspective, each sample would correspond to the following output samples (taking sz=8, n_overlap=4):
Input: 0 1 2 3 4 5 6 7 8 9 A B C D E F ... Block 1: 0 1 2 3 4 5 6 7 Block 2: 4 5 6 7 8 9 A B Block 3: 8 9 A B C D E F
Looking at this from the output perspective:
payload.sample: 0 1 2 3 4 5 6 7 4 5 6 7 8 9 A B 8 9 A B C D E F payload.first: 1 1 1
To avoid backpressure on the input stream, the incoming sample rate should be much slower than the system clock, and rate of processing the output blocks (which is easily achievable with audio signals).
Design
This core uses a single
SyncFIFO
for storage of sizen_overlap
, which is continuously filled and drained between blocks to create overlapping outputs.- Members:
i (
In(stream.Signature(self.shape))
) – Incoming (continuous) stream of real samples.o (
Out(stream.Signature(Block(self.shape)))
) – Outgoing stream of blocks of real samples. at intervals matching thesz
of this component.
- __init__(shape, sz, n_overlap)
- shapefixed.SQ
Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in
Block
(shape).- szint
Size of the output blocks.
- n_overlapint
Number of elements shared between adjacent output blocks.
- class tiliqua.fft.OverlapAddBlocks(*args, src_loc_at=0, **kwargs)
Convert
Block
stream to continuous samples by ‘Overlap-Add’.This core is a building block for the
STFTProcessor
family of components. It takes overlapping, delineated blocks of samples, adds up the overlapping elements and sends a stream of (non-delineated) continuous samples.Again, an example for (sz=8, n_overlap=4). From the input perspective, say we have:
payload.sample: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J ... payload.first: 1 1 1
This core would send out the following stream:
Block 1: 0 1 2 3 4 5 6 7 Block 2: 8 9 A B C D E F Block 3: G H I J ... Out: 0 1 2 3 4 5 6 7 C D E J + + + + + + + + + + + + X X X X 8 9 A B G H I F ... (X == zero padding)
From above, it should be clear that each output sample is formed by summing the ‘overlapping’ parts of each input block, and that there are less output samples than input samples as a result.
Note that connecting a
ComputeOverlappingBlocks
straight to anOverlapAddBlocks
will not reconstruct the original signal, unless the samples are windowed (tapered) for perfect reconstruction. For an example, see theSTFTProcessor
family of components.Design
This core uses two
SyncFIFO
for storage of sizesz
, which are both continuously filled and drained as necessary to add up the correct overlapping parts.- Members:
i (
In(stream.Signature(Block(self.shape)))
) – Incoming stream of blocks of overlapping real samples.o (
Out(stream.Signature(self.shape))
) – Outgoing stream of continuous real samples.
- __init__(shape, sz, n_overlap)
- shapefixed.SQ
Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in
Block
(shape).- szint
Size of the input blocks.
- n_overlapint
Number of elements shared between adjacent input blocks.
STFTs
- class tiliqua.fft.STFTProcessor(*args, src_loc_at=0, **kwargs)
Short-Time Fourier Transform (‘Overlap-Add’) with shared FFT core.
This core performs both analysis and re-synthesis of time-domain samples by passing them through the frequency domain for spectral processing by user logic.
The general flow of operations:
i
(samples in) ->ComputeOverlappingBlocks
->Window
->FFT
->o_freq
i_freq
->FFT
(inverse) ->Window
->OverlapAddBlocks
->o
(samples out)
If
o_freq
andi_freq
frequency-domain streams are directly connected together with no processing logic, this core will perfectly resynthesize the original signal.Window.Function.SQRT_HANN
windowing is used in both the analysis and synthesis stages.Design
Unlike
STFTProcessorPipelined
, this implementation saves area by time-multiplexingFFT
andWindow
cores between analysis and synthesis operations, at the cost of reduced throughput. For audio purposes, this core is easily still fast enough for real-time processing.At a high level, the components are connected in a different sequence depending on which state the core is in. The core contains a single SyncFIFO used to buffer all the outputs of user (frequency domain) processing logic. This is required to avoid backpressure on the user logic while it is disconnected from the IFFT stage.
- Members:
i (
In(stream.Signature(shape))
) – Continuous time-domain input streamo (
Out(stream.Signature(shape))
) – Continuous time-domain output stream (reconstructed)o_freq (
Out(stream.Signature(Block(CQ(shape))))
) – Frequency-domain output blocks for processing by user logici_freq (
In(stream.Signature(Block(CQ(shape))))
) – Frequency-domain input blocks after processing by user logic
- __init__(shape, sz)
- shapefixed.SQ
Shape of the fixed-point samples used for inputs and outputs.
- szint
Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
o_freq
andi_freq
are delineated by blocks with apayload.first
strobe everysz
elements.
- class tiliqua.fft.STFTAnalyzer(*args, src_loc_at=0, **kwargs)
Short-Time Fourier Transform for spectral analysis.
This core allows frequency-domain analysis of time-domain samples by passing them through the following signal flow:
i
(samples in) ->ComputeOverlappingBlocks
->Window
->FFT
->o
For a more resource-efficient STFT that performs both analysis and synthesis, see
STFTProcessor
.- Members:
i (
In(stream.Signature(shape))
) – Continuous time-domain input streamo (
Out(stream.Signature(Block(CQ(shape))))
) – Frequency-domain output blocks for processing by user logic
- __init__(shape, sz, window_function=Window.Function.HANN)
- shapefixed.SQ
Shape of the fixed-point samples used for inputs and outputs.
- szint
Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
- windowWindow.Function
Window function applied to time-domain blocks the FFT.
- class tiliqua.fft.STFTSynthesizer(*args, src_loc_at=0, **kwargs)
Short-Time Fourier Transform for spectral synthesis.
This core allows frequency-domain synthesis of time-domain samples by passing them through the following signal flow:
i_freq
->FFT
(inverse) ->Window
->OverlapAddBlocks
->o
(samples out)
For a more resource-efficient STFT that performs both analysis and synthesis, see
STFTProcessor
.- Members:
i (
In(stream.Signature(Block(CQ(shape))))
) – Frequency-domain input blocks used for synthesis.o (
Out(stream.Signature(shape))
) – Continuous time-domain output stream
- __init__(shape, sz, window_function=Window.Function.HANN)
- shapefixed.SQ
Shape of the fixed-point samples used for inputs and outputs.
- szint
Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
- windowWindow.Function
Window function applied to time-domain blocks after the IFFT, but before overlap-add.
- class tiliqua.fft.STFTProcessorPipelined(*args, src_loc_at=0, **kwargs)
Short-Time Fourier Transform (‘Overlap-Add’) with separate FFT/IFFT cores.
This core performs both analysis and re-synthesis of time-domain samples by passing them through the frequency domain for spectral processing by user logic.
This is a simpler-to-understand version of
STFTProcessor
as it does not share theWindow
orFFT
cores, making it higher throughput, but also higher resource usage, because it can perform the FFT and IFFT at the same time.This core is mostly used for testing the
STFTAnalyzer
andSTFTSynthesizer
components in isolation, and doesn’t make much sense to use in real projects compared to just using theSTFTProcessor
above, which is much more resource efficient for audio applications. Unless you have super high sample rates.- Members:
i (
In(stream.Signature(shape))
) – Continuous time-domain input streamo (
Out(stream.Signature(shape))
) – Continuous time-domain output stream (reconstructed)o_freq (
Out(stream.Signature(Block(CQ(shape))))
) – Frequency-domain output blocks for processing by user logici_freq (
In(stream.Signature(Block(CQ(shape))))
) – Frequency-domain input blocks after processing by user logic
- __init__(shape, sz)
- shapefixed.SQ
Shape of the fixed-point samples used for inputs and outputs.
- szint
Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
Spectral utilities
Spectral processing components.
- class tiliqua.spectral.BlockLPF(*args, src_loc_at=0, **kwargs)
Block-based (point-wise) one-pole low-pass filter array.
This is an array (size sz) of one-pole low-pass filters with a single adjustable smoothing constant beta. That is, an independent filter is tracked for every element in each block. The
Block
payload.first
flags are used to track which filter memory should be addressed for each sample, without requiring separate streams for each channel.For each element, we compute:
self.y[n] = self.y[n]*self.beta + self.x[n]*(1-self.beta)
This is useful for morphing between blocks of real spectral envelopes, but could also be used for other purposes.
- Members:
i (
In(stream.Signature(Block(self.shape)))
) – Incoming stream of blocks of real samples.o (
Out(stream.Signature(Block(self.shape)))
) – Outgoing stream of blocks of real samples.
- __init__(shape, sz, macp=None)
- shapeShape
Shape of fixed-point number to use for block streams.
- szint
Number of independent filters, must exactly match the size of each block.
- macpbool
A
mac.MAC
provider, for multiplies.
- class tiliqua.spectral.SpectralEnvelope(*args, src_loc_at=0, **kwargs)
Compute a smoothed, real spectral envelope.
Given a block of complex frequency-domain spectra, extract the magnitude from each point and filter each magnitude in the block independently with a one-pole smoother, emitting a corresponding block representing the evolving (smoothed) spectral envelope.
The rect-to-polar CORDIC is run without magnitude correction, which saves another multiplier at the cost of everything being multiplied by a constant factor (may or may not matter depending on the use).
- Members:
i (
In(stream.Signature(Block(CQ(self.shape))))
) – Incoming stream of blocks of complex spectra.o (
Out(stream.Signature(Block(self.shape)))
) – Outgoing stream of blocks of real (smoothed) magnitude spectra.
- __init__(shape, sz)
- shapeShape
Shape of fixed-point number to use for streams.
- szint
The size of each input block and outgoing spectral envelope blocks.
- class tiliqua.spectral.SpectralCrossSynthesis(*args, src_loc_at=0, **kwargs)
Apply the spectral envelope of ‘modulator’ on a ‘carrier’.
Consume 2 sets of frequency-domain spectra (blocks) representing a ‘carrier’ and ‘modulator’. The (real) envelope of the ‘modulator’ spectra is multiplied by the (complex) ‘carrier’ spectra, creating a classic vocoder effect where the timbre of the ‘carrier’ dominates, but is filtered spectrally by the ‘modulator’
This core computes the spectral envelope of the modulator by filtering the magnitude of each frequency band, see
SpectralEnvelope
for details. Put simply, this core computes:out.real = carrier.real * modulator.envelope_magnitude out.imag = carrier.imag * modulator.envelope_magnitude
- Members:
i_carrier (
In(stream.Signature(Block(CQ(self.shape))))
) – Incoming stream of blocks of complex spectra of the ‘carrier’.i_modulator (
In(stream.Signature(Block(CQ(self.shape))))
) – Incoming stream of blocks of complex spectra of the ‘modulator’.o (
Out(stream.Signature(Block(CQ(self.shape))))
) – Outgoing stream of cross-synthesized ‘carrier’ and ‘modulator’.
- __init__(shape, sz)
- shapeShape
Shape of fixed-point number to use for block streams.
- szint
Size of each block of complex spectra.