Multiplier sharing (dsp.mac)

dsp.mac: DSP tile sharing

For audio-rate signals, where sample rates are low and the desired number of separate functional blocks is high, sharing DSP tiles is essential. Without sharing, multipliers are often the first FPGA resource to be exhausted.

This file provides mechanisms for sharing DSP tiles (multipliers) amongst multiple components, using two different strategies:

  1. MuxMAC: a single DSP tile is time-multiplexed. Latency is relatively low; however, sharing more than ~3 MACs quickly blows up resource usage.

  2. RingMAC: message-ring sharing. Multiple components are connected in a message ring (essentially a large circular shift register). On each ring, there is a single DSP tile processing multiplies. DSP tile throughput of near 100% is still achievable; however, latency is higher.

class tiliqua.dsp.mac.MAC(*args, src_loc_at=0, **kwargs)

Base class for MAC strategies. Subclasses provide the concrete strategy.

Users of this component perform multiplications using MAC.Multiply(m, ...), which may have different latency depending on the concrete strategy.

Multiply(m, **operands)

Compute z = a*b, returning a context object which is active on the same clock that the answer is available on self.result.z.

Ensure operands do NOT change until the operation completes.

For example:

s_a = fixed.Const(0.5, shape=mac.SQNative)
s_b = fixed.Const(0.25, shape=mac.SQNative)
s_z = Signal(ASQ)

with m.FSM() as fsm:
    # ... some states ...
    with m.State('MAC'):
        # Set up the multiplication (``mp`` is a MAC instance,
        # e.g. a MuxMAC or a RingMAC client).
        # Read the ``with`` block as: ``m.If(result_available)``
        with mp.Multiply(m, a=s_a, b=s_b):
            m.d.sync += s_z.eq(mp.result.z)
            m.next = 'DONE'
    # ... some more states ...
default()

Default MAC provider for DSP components if None is specified.

class tiliqua.dsp.mac.MuxMAC(*args, src_loc_at=0, **kwargs)

A multiplexing multiplication provider.

Instantiates a single multiplier, shared between users of this MuxMAC using time division multiplexing, as follows:

a₁*b₁ ───►│                ├───► result₁
a₂*b₂ ───►├──►[DSP Tile]──►┼───► result₂
a₃*b₃ ───►│                ├───► result₃
         mux             demux

When shared amongst many cores, the required multiplexer can quickly become unusably large.
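The scheduling trade-off can be sketched as a behavioral model in plain Python (this is an illustration of time-division multiplexing, not the actual Amaranth implementation; the function name and 1-clock multiplier latency are assumptions):

```python
def simulate_muxmac(requests):
    """Model N requesters sharing one multiplier via fixed TDM.

    requests[i] is the (a, b) operand pair for requester i; all
    requesters present operands at clock 0. Returns a dict mapping
    requester index to (product, clock the result is available),
    assuming the shared multiplier produces one product per clock.
    """
    results = {}
    for slot, (a, b) in enumerate(requests):
        # Requester `slot` is granted the multiplier on clock `slot`;
        # its product is registered and available one clock later.
        results[slot] = (a * b, slot + 1)
    return results

# Worst-case latency grows linearly with the number of sharers, and
# the mux/demux width (hence LUT cost) also grows with N.
print(simulate_muxmac([(2, 3), (4, 5), (6, 7)]))
# → {0: (6, 1), 1: (20, 2), 2: (42, 3)}
```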

class tiliqua.dsp.mac.RingMAC(*args, src_loc_at=0, **kwargs)

A message-ring-backed multiplication provider, where DSP tiles are shared between many components.

The common pattern here is that each functional block tends to use a single RingMAC, even if it has multiple MAC steps. That is, the RingMAC itself is mux’d within a core; however, all requests land on the same shared bus, which is a message ring connecting the different cores. This keeps multiplexers local to each core:

                  ┌─────────────────────────────┐
                  │                             │
┌─────────────────┼─────────────────────┐       │
│                 │                     │       │
│a₁b₁ ───►│       │        ├───► result₁│       │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│       │
│a₃b₃ ───►│       ▲        ├───► result₃│       │
│        mux      │      demux          │       │
└─────────────────┼─────────────────────┘       │
Component1        │                             │
                  │                             │
┌─────────────────┼─────────────────────┐       ▼
│                 │                     │   ┌──────────┐
│a₁b₁ ───►│       │        ├───► result₁│   │          │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│   │[DSP Tile]│
│a₃b₃ ───►│       ▲        ├───► result₃│   │          │
│        mux      │      demux          │   └──────────┘
└─────────────────┼─────────────────────┘   RingMACServer
Component2        │                             │
                  │                             │
┌─────────────────┼─────────────────────┐       │
│                 │                     │       │
│a₁b₁ ───►│       │        ├───► result₁│       │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│       │
│a₃b₃ ───►│       ▲        ├───► result₃│       │
│        mux      │      demux          │       │
└─────────────────┼─────────────────────┘       │
Component3        │                             │
                  └─────────────────────────────┘

This provides near-optimal scheduling for message rings composed of components that have the same state machines.

Normally, these should only be created from an existing server using RingMACServer.new_client(). This automatically hooks up the ring and tag attributes, but does NOT add the client as a submodule for elaboration (you must do this yourself).

A RingMAC contains no multiplier of its own: ring must be hooked up to a message ring on which a RingMACServer can be found, and tag MUST uniquely identify the underlying ringnoc.Client instantiated inside this RingMAC. If you are careful to only use RingMACServer.new_client() to create these, all of these assumptions hold.

tiliqua.dsp.mac.RingMACServer(max_clients=16, mtype=SQNative)

Factory for creating a MAC message ring.

Prior to elaboration, Server.new_client() may be used to add additional client nodes to this ring. During elaboration, all clients (and this server) are connected in a ring, and a single shared DSP tile is instantiated to serve requests.

Returns:

ringnoc.Server configured for DSP tile sharing of operands.a * operands.b.
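Putting these pieces together, the documented flow is sketched below as Amaranth-style pseudocode (only new_client() and Multiply() are documented API here; variable names, the elaboration context, and the signals s_a/s_b/s_z are illustrative assumptions):

```
# Pseudocode sketch; assumes an elaborate() context with module `m`.
server = RingMACServer()        # one shared DSP tile per ring
mac_a  = server.new_client()    # hooks up `ring` and `tag` for us
mac_b  = server.new_client()

m.submodules.server = server
m.submodules.mac_a  = mac_a     # clients are NOT added automatically;
m.submodules.mac_b  = mac_b     # forgetting this is an easy mistake

# Each client is then used like any other MAC provider:
with mac_a.Multiply(m, a=s_a, b=s_b):
    m.d.sync += s_z.eq(mac_a.result.z)
```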

ringnoc: Message Ring implementation

‘Message Ring’ Network-on-Chip (NoC) implementation.

Components for connecting N ‘clients’ and 1 ‘server’ in a circular shift register topology. This enables efficient resource sharing (e.g., DSP tiles) across many components without needing huge muxes. For example:

  tag=0      tag=1      tag=2      tag=3
┌───────┐  ┌───────┐  ┌───────┐  ┌───────┐
│client0┼──►client1┼──►client2┼──►client3│
└───▲───┘  └───────┘  └───────┘  └───┬───┘
    │                                │
    │            ┌──────┐            │
    └────────────┼server│◄───────────┘
                 └──────┘

On the message ring, messages are shifted in a large circular shift register by one node every clock. Each message is of layout Config.msg_layout.

A client may only send a message if there is an INVALID message being shifted into it. This keeps latency bounded and removes the need for extra storage. A server may ‘respond’ to a client message by shifting in the payload and shifting out the response. Each message contains a tag unique to each client, so servers know where a request came from, and clients know whether an incoming message is destined for them.

Assuming all N clients send a request message on the same clock, and the server processes one request per clock, the results will arrive at all clients N+1 clocks later, with the server busy for N of those N+1 clocks.
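This latency bound can be checked with a small behavioral model of the ring in plain Python (a sketch, not the Amaranth implementation; the Msg slot layout and function names are illustrative stand-ins for Config.msg_layout and the real Client/Server logic):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Msg:
    """One slot of the circular shift register."""
    valid: bool = False
    tag: int = 0
    response: bool = False
    payload: Any = None

def simulate_ring(n_clients, requests, max_clocks=64):
    """Behavioral model: n_clients clients plus one server in a ring.

    requests: {tag: (a, b)} multiplies, each injected as soon as an
    invalid slot shifts into that client. Returns {tag: (product,
    clock the result arrived back at the client)}.
    """
    regs = [Msg() for _ in range(n_clients + 1)]  # one register per node
    pending = dict(requests)
    results = {}
    for clk in range(1, max_clocks + 1):
        new_regs = []
        for i in range(n_clients + 1):
            msg = regs[i - 1]  # input is the previous node's register
            if i < n_clients:  # client node `i`
                if msg.valid and msg.response and msg.tag == i:
                    results[i] = (msg.payload, clk)  # our answer came back
                    msg = Msg()                      # free the slot
                if not msg.valid and i in pending:
                    msg = Msg(valid=True, tag=i, payload=pending.pop(i))
            else:              # server node: answer one request per clock
                if msg.valid and not msg.response:
                    a, b = msg.payload
                    msg = Msg(valid=True, tag=msg.tag,
                              response=True, payload=a * b)
            new_regs.append(msg)
        regs = new_regs
    return results

# With N=3 clients all requesting on clock 1, every result arrives on
# clock 1 + (N+1) = 5, and the server is busy on clocks 2..4 (N of N+1).
print(simulate_ring(3, {0: (2, 3), 1: (4, 5), 2: (6, 7)}))
# → {0: (6, 5), 1: (20, 5), 2: (42, 5)}
```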

To use this, you will want to create your components building on ringnoc.Client and ringnoc.Server. An example of this (sharing DSP tiles) is found in this repository as mac.RingMAC (client) and mac.RingMACServer.

class tiliqua.ringnoc.Config(tag_bits, payload_type_client, payload_type_server)

Configuration (message layout) of a message ring.

class tiliqua.ringnoc.NodeSignature(cfg)

Both Client and Server nodes must have these connections in order to participate in the message ring.

Messages shift in on i and out on o. Server is currently responsible for connecting these as appropriate after all Client instances are created.

class tiliqua.ringnoc.Client(*args, src_loc_at=0, **kwargs)

Client node. Nominally transparent (shifting incoming messages to outgoing messages unmodified).

To issue a request, set i and assert strobe until valid is asserted. On the same clock that valid is asserted, o contains the server’s answer to our request.

Under the hood, Client will take care of not sending our request until the bus is free, and not asserting valid until an appropriate response has arrived.

class tiliqua.ringnoc.Server(*args, src_loc_at=0, **kwargs)

Process client requests. This component also manages the ring topology by creating clients and wiring them into a ring. When a valid client message arrives, process_request is used to compute a response.

client_class can be any class that exposes a NodeSignature, as long as it has a constructor that can take tag: int and cfg: Config. A new instance of this is created whenever the user calls new_client.

new_client()

Create and add a new client to the ring.