Frequency-domain processing

FFTs

class tiliqua.fft.FFT(*args, src_loc_at=0, **kwargs)

Fixed-point Fast Fourier Transform.

Overview

This core computes the DFT of complex, fixed-point samples (specifically, power-of-two chunks, represented by Block of CQ). It can be run in ‘forward’ or ‘inverse’ mode, and is generally designed to match the behaviour of scipy.fft(norm="forward").

Input and output samples are clocked in FIFO order. Block processing only begins once sz elements (an entire block) has been clocked at self.i. Some time later, the core clocks the transformed block out self.o, before clocking in the next block. See the ‘Design’ section for further details on throughput.

Switching modes

When the core is idle (nothing is clocked into the core and i.ready is strobed), the ifft signal may be set to 1 to put the core into ‘inverse FFT’ mode. Otherwise, it is in ‘forward FFT’ mode.

In forward mode, this FFT core normalizes the outputs by 1/N, and the twiddle factors are not conjugated.

In ‘inverse FFT’ mode, there is no normalization, and the twiddle factors are conjugated as required.

Design

There are many tradeoffs an FFT implementation can make. This one aims for low area and DSP tile usage. It is an iterative implementation that only performs 2 multiplies at a time. The core FFT loop takes 6 cycles, where 2 of these are arithmetic and the rest are memory operations. The algorithm implemented here is ‘Cooley-Tukey, Decimation-in-Time, Radix-2’.

Latency, resource usage

As this core only performs 2 multiplies at a time, it only requires 2 DSP tiles, assuming the shape specified is small enough. For 16-bit samples at an FFT size of 1024, the extra bits needed to account for overflow requires 4x 18-bit multipliers on an ECP5.

Outputs take sz*log2(sz)*6 system clocks to be computed after all the inputs have been clocked in. That is, an FFT of size 1024 would take about log2(1024)*6 == 60 system clocks per sample, for a maximum throughput around 1Msample/sec if we had a 60MHz master clock (in reality a bit less accounting for the input and output clocking times).

This core instantiates 3 memories, which all scale with sz. A ROM for the twiddle factor (coefficient) memory, and 2 RAMs which are shared for input and output sample storage as well as intermediate calculations. An interesting addition to this core in the future may be to support external memory for huge FFT sizes.

Internals

Note

Reading this is not necessary to use the core. I advise reading up a bit on how FFT algorithms work before diving deeper into this. It’s just a few notes to anyone interested in understanding this core more deeply.

This core has 3 main ‘phases’ in its overall state machine - an input phase (LOAD1 and LOAD2), calculation/loop phases (FFTLOOP to WRITE-D) and an output phase (READ-OUTPUT).

The loop phase has an inner counter idx, for the index of the current sample compared to the total block size, and an outer counter stage which goes from 0 to log2(sz). Once stage reaches log2(sz), the calculations are complete. The input and output phases are simpler to understand, so we’ll focus on this loop phase.

On every loop iteration, we read all needed values from memory and calculate the butterfly sum/difference outputs. Muxes are shown to represent the A / B ping-ponging between which memory they fetch from in even and odd stages (Stage 0 X->A, Y->B, Stage 1 X->B, Y->A and so on up to Stage log2(sz)).

┌─────────┐     ┌─────────┐     ┌─────────┐
│  X RAM  │-\ /-│  Y RAM  │     │  W ROM  │
└────┬────┘  X  └────┬────┘     └────┬────┘
     ▼      / \      ▼               │
    MUX<────   ────>MUX              │
     │               │               │
     A               B               │
     │               │               │
     │         ┌─────┴─────┐         │
     └────────>┤ Butterfly ├<────────┘
               │  S=A+B×W  │
               │  D=A-B×W  │
               └─────┬─────┘
                     │S,D

Once the calculation is complete, the S, D values are written back in a similar ping-pong way to either the X or Y RAMs, depending on which ‘stage’ we are in. There are 6 states inside the FFT loop required for each iteration of this calculation. Some of them do arithmetic and memory accesses in the same state - in general the timing of each state is optimized to hit around 80MHz on a slowest speed-grade ECP5.

Members:

i (In(stream.Signature(Block(CQ(self.shape))))) – Incoming stream of blocks of complex samples.
o (Out(stream.Signature(Block(CQ(self.shape))))) – Outgoing stream of blocks of complex samples.
ifft (In(1, init=default_ifft)) – 0 for forward FFT with 1/N normalization. 1 for inverse FFT. May only be changed when the core is idle (‘ready’ and not midway through a block!).

__init__(shape=fixed.SQ(1, 15), sz=1024, default_ifft=False)

shapefixed.SQ: Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in Block(CQ(shape)).
szint: Size of the FFT/IFFT blocks. Must be a power of 2.
default_ifftbool: Default state of the self.ifft signal, can be used to create an IFFT instead of an FFT by default, without needing to explicitly connect the signal.

Windowing

class tiliqua.fft.Window(*args, src_loc_at=0, **kwargs)

Pointwise window function.

Apply a real window function of size sz to blocks of shape. The window function is synchronized to the block payload.first.

Design

This core is iterative and uses a single multiplier. It does not store any samples, the windowed samples are available at the output 2 clocks after they are provided at the input.

Members:

i (In(stream.Signature(Block(self.shape)))) – Incoming stream of blocks of real samples. at intervals matching the sz of this component.
o (Out(stream.Signature(Block(self.shape)))) – Outgoing stream of blocks of real samples. at intervals matching the sz of this component.

class Function(value, names=_not_given, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Enumeration representing different window functions for Window.

HANN(sz)

SQRT_HANN(sz)

RECT(sz)

__init__(shape, sz, window_function=Function.SQRT_HANN)

shapefixed.SQ: Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in Block(shape).
szint: Size of the window function.
window_functionFunction: Function used to calculate window function constants (type of window).

Overlap / Add

class tiliqua.fft.ComputeOverlappingBlocks(*args, src_loc_at=0, **kwargs)

Real sample stream to (overlapping) Block stream conversion.

This core is a building block for the STFTProcessor family of components. It takes a continuous (non-delineated) stream of samples, and sends delineated blocks of samples, which may be overlapping, for use by further processing.

Here is a quick example to illustrate. From the input perspective, each sample would correspond to the following output samples (taking sz=8, n_overlap=4):

Input:    0 1 2 3 4 5 6 7 8 9 A B C D E F ...
Block 1:  0 1 2 3 4 5 6 7
Block 2:          4 5 6 7 8 9 A B
Block 3:                  8 9 A B C D E F

Looking at this from the output perspective:

payload.sample:  0 1 2 3 4 5 6 7 4 5 6 7 8 9 A B 8 9 A B C D E F
payload.first:   1               1               1

To avoid backpressure on the input stream, the incoming sample rate should be much slower than the system clock, and rate of processing the output blocks (which is easily achievable with audio signals).

Design

This core uses a single SyncFIFO for storage of size n_overlap, which is continuously filled and drained between blocks to create overlapping outputs.

Members:

i (In(stream.Signature(self.shape))) – Incoming (continuous) stream of real samples.
o (Out(stream.Signature(Block(self.shape)))) – Outgoing stream of blocks of real samples. at intervals matching the sz of this component.

__init__(shape, sz, n_overlap)

shapefixed.SQ: Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in Block(shape).
szint: Size of the output blocks.
n_overlapint: Number of elements shared between adjacent output blocks.

class tiliqua.fft.OverlapAddBlocks(*args, src_loc_at=0, **kwargs)

Convert Block stream to continuous samples by ‘Overlap-Add’.

This core is a building block for the STFTProcessor family of components. It takes overlapping, delineated blocks of samples, adds up the overlapping elements and sends a stream of (non-delineated) continuous samples.

Again, an example for (sz=8, n_overlap=4). From the input perspective, say we have:

payload.sample:  0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J ...
payload.first:   1               1               1

This core would send out the following stream:

Block 1:  0 1 2 3 4 5 6 7
Block 2:          8 9 A B C D E F
Block 3:                  G H I J ...

Out:      0 1 2 3 4 5 6 7 C D E J
          + + + + + + + + + + + +
          X X X X 8 9 A B G H I F ... (X == zero padding)

From above, it should be clear that each output sample is formed by summing the ‘overlapping’ parts of each input block, and that there are less output samples than input samples as a result.

Note that connecting a ComputeOverlappingBlocks straight to an OverlapAddBlocks will not reconstruct the original signal, unless the samples are windowed (tapered) for perfect reconstruction. For an example, see the STFTProcessor family of components.

Design

This core uses two SyncFIFO for storage of size sz, which are both continuously filled and drained as necessary to add up the correct overlapping parts.

Members:

i (In(stream.Signature(Block(self.shape)))) – Incoming stream of blocks of overlapping real samples.
o (Out(stream.Signature(self.shape))) – Outgoing stream of continuous real samples.

__init__(shape, sz, n_overlap)

shapefixed.SQ: Shape of the fixed-point types used for inputs and outputs. This is the shape of a single number, this core wraps it in Block(shape).
szint: Size of the input blocks.
n_overlapint: Number of elements shared between adjacent input blocks.

STFTs

class tiliqua.fft.STFTProcessor(*args, src_loc_at=0, **kwargs)

Short-Time Fourier Transform (‘Overlap-Add’) with shared FFT core.

This core performs both analysis and re-synthesis of time-domain samples by passing them through the frequency domain for spectral processing by user logic.

The general flow of operations:

i (samples in) -> ComputeOverlappingBlocks -> Window -> FFT -> o_freq

i_freq -> FFT (inverse) -> Window -> OverlapAddBlocks -> o (samples out)

If o_freq and i_freq frequency-domain streams are directly connected together with no processing logic, this core will perfectly resynthesize the original signal.

Window.Function.SQRT_HANN windowing is used in both the analysis and synthesis stages.

Design

Unlike STFTProcessorPipelined, this implementation saves area by time-multiplexing FFT and Window cores between analysis and synthesis operations, at the cost of reduced throughput. For audio purposes, this core is easily still fast enough for real-time processing.

At a high level, the components are connected in a different sequence depending on which state the core is in. The core contains a single SyncFIFO used to buffer all the outputs of user (frequency domain) processing logic. This is required to avoid backpressure on the user logic while it is disconnected from the IFFT stage.

Members:

i (In(stream.Signature(shape))) – Continuous time-domain input stream
o (Out(stream.Signature(shape))) – Continuous time-domain output stream (reconstructed)
o_freq (Out(stream.Signature(Block(CQ(shape))))) – Frequency-domain output blocks for processing by user logic
i_freq (In(stream.Signature(Block(CQ(shape))))) – Frequency-domain input blocks after processing by user logic

__init__(shape, sz)

shapefixed.SQ: Shape of the fixed-point samples used for inputs and outputs.
szint: Size of the frequency-domain blocks used internally and exposed for frequency domain processing. o_freq and i_freq are delineated by blocks with a payload.first strobe every sz elements.

class tiliqua.fft.STFTAnalyzer(*args, src_loc_at=0, **kwargs)

Short-Time Fourier Transform for spectral analysis.

This core allows frequency-domain analysis of time-domain samples by passing them through the following signal flow:

i (samples in) -> ComputeOverlappingBlocks -> Window -> FFT -> o

For a more resource-efficient STFT that performs both analysis and synthesis, see STFTProcessor.

Members:

i (In(stream.Signature(shape))) – Continuous time-domain input stream
o (Out(stream.Signature(Block(CQ(shape))))) – Frequency-domain output blocks for processing by user logic

__init__(shape, sz, window_function=Window.Function.HANN)

shapefixed.SQ: Shape of the fixed-point samples used for inputs and outputs.
szint: Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
windowWindow.Function: Window function applied to time-domain blocks the FFT.

class tiliqua.fft.STFTSynthesizer(*args, src_loc_at=0, **kwargs)

Short-Time Fourier Transform for spectral synthesis.

This core allows frequency-domain synthesis of time-domain samples by passing them through the following signal flow:

i_freq -> FFT (inverse) -> Window -> OverlapAddBlocks -> o (samples out)

For a more resource-efficient STFT that performs both analysis and synthesis, see STFTProcessor.

Members:

i (In(stream.Signature(Block(CQ(shape))))) – Frequency-domain input blocks used for synthesis.
o (Out(stream.Signature(shape))) – Continuous time-domain output stream

__init__(shape, sz, window_function=Window.Function.HANN)

shapefixed.SQ: Shape of the fixed-point samples used for inputs and outputs.
szint: Size of the frequency-domain blocks used internally and exposed for frequency domain processing.
windowWindow.Function: Window function applied to time-domain blocks after the IFFT, but before overlap-add.

class tiliqua.fft.STFTProcessorPipelined(*args, src_loc_at=0, **kwargs)

Short-Time Fourier Transform (‘Overlap-Add’) with separate FFT/IFFT cores.

This core performs both analysis and re-synthesis of time-domain samples by passing them through the frequency domain for spectral processing by user logic.

This is a simpler-to-understand version of STFTProcessor as it does not share the Window or FFT cores, making it higher throughput, but also higher resource usage, because it can perform the FFT and IFFT at the same time.

This core is mostly used for testing the STFTAnalyzer and STFTSynthesizer components in isolation, and doesn’t make much sense to use in real projects compared to just using the STFTProcessor above, which is much more resource efficient for audio applications. Unless you have super high sample rates.

Members:

i (In(stream.Signature(shape))) – Continuous time-domain input stream
o (Out(stream.Signature(shape))) – Continuous time-domain output stream (reconstructed)
o_freq (Out(stream.Signature(Block(CQ(shape))))) – Frequency-domain output blocks for processing by user logic
i_freq (In(stream.Signature(Block(CQ(shape))))) – Frequency-domain input blocks after processing by user logic

__init__(shape, sz)

shapefixed.SQ: Shape of the fixed-point samples used for inputs and outputs.
szint: Size of the frequency-domain blocks used internally and exposed for frequency domain processing.

Spectral utilities

Spectral processing components.

class tiliqua.spectral.BlockLPF(*args, src_loc_at=0, **kwargs)

Block-based (point-wise) one-pole low-pass filter array.

This is an array (size sz) of one-pole low-pass filters with a single adjustable smoothing constant beta. That is, an independent filter is tracked for every element in each block. The Block payload.first flags are used to track which filter memory should be addressed for each sample, without requiring separate streams for each channel.

For each element, we compute:

self.y[n] = self.y[n]*self.beta + self.x[n]*(1-self.beta)

This is useful for morphing between blocks of real spectral envelopes, but could also be used for other purposes.

Members:

i (In(stream.Signature(Block(self.shape)))) – Incoming stream of blocks of real samples.
o (Out(stream.Signature(Block(self.shape)))) – Outgoing stream of blocks of real samples.

__init__(shape, sz, macp=None)

shapeShape: Shape of fixed-point number to use for block streams.
szint: Number of independent filters, must exactly match the size of each block.
macpbool: A mac.MAC provider, for multiplies.

class tiliqua.spectral.SpectralEnvelope(*args, src_loc_at=0, **kwargs)

Compute a smoothed, real spectral envelope.

Given a block of complex frequency-domain spectra, extract the magnitude from each point and filter each magnitude in the block independently with a one-pole smoother, emitting a corresponding block representing the evolving (smoothed) spectral envelope.

The rect-to-polar CORDIC is run without magnitude correction, which saves another multiplier at the cost of everything being multiplied by a constant factor (may or may not matter depending on the use).

Members:

i (In(stream.Signature(Block(CQ(self.shape))))) – Incoming stream of blocks of complex spectra.
o (Out(stream.Signature(Block(self.shape)))) – Outgoing stream of blocks of real (smoothed) magnitude spectra.

__init__(shape, sz)

shapeShape: Shape of fixed-point number to use for streams.
szint: The size of each input block and outgoing spectral envelope blocks.

class tiliqua.spectral.SpectralCrossSynthesis(*args, src_loc_at=0, **kwargs)

Apply the spectral envelope of ‘modulator’ on a ‘carrier’.

Consume 2 sets of frequency-domain spectra (blocks) representing a ‘carrier’ and ‘modulator’. The (real) envelope of the ‘modulator’ spectra is multiplied by the (complex) ‘carrier’ spectra, creating a classic vocoder effect where the timbre of the ‘carrier’ dominates, but is filtered spectrally by the ‘modulator’

This core computes the spectral envelope of the modulator by filtering the magnitude of each frequency band, see SpectralEnvelope for details. Put simply, this core computes:

out.real = carrier.real * modulator.envelope_magnitude
out.imag = carrier.imag * modulator.envelope_magnitude

Members:

i_carrier (In(stream.Signature(Block(CQ(self.shape))))) – Incoming stream of blocks of complex spectra of the ‘carrier’.
i_modulator (In(stream.Signature(Block(CQ(self.shape))))) – Incoming stream of blocks of complex spectra of the ‘modulator’.
o (Out(stream.Signature(Block(CQ(self.shape))))) – Outgoing stream of cross-synthesized ‘carrier’ and ‘modulator’.

__init__(shape, sz)

shapeShape: Shape of fixed-point number to use for block streams.
szint: Size of each block of complex spectra.