Multiplier sharing (dsp.mac)
dsp.mac: DSP tile sharing
For audio-rate signals, where sample rates are low and the desired number of separate functional blocks is high, sharing DSP tiles is essential. Without sharing, multipliers are often the first FPGA resource to be exhausted.
This file provides mechanisms for sharing DSP tiles (multipliers) among multiple components, using two different strategies:
MuxMAC: a single DSP tile is time-multiplexed. Latency is relatively low; however, sharing more than about 3 MACs quickly blows up resource usage.
RingMAC: message ring sharing. Multiple components are connected in a message ring (essentially a large circular shift register). On each ring, a single DSP tile processes multiplies. Near-100% DSP tile throughput is still achievable; however, latency is higher.
- class tiliqua.dsp.mac.MAC(*args, src_loc_at=0, **kwargs)
Base class for MAC strategies. Subclasses provide the concrete strategy.
Users of this component perform multiplications using
MAC.Multiply(m, ...), which may have different latency depending on the concrete strategy.
- Multiply(m, **operands)
Compute z = a*b, returning a context object which is active in the same clock that the answer is available on self.result.z. Ensure operands do NOT change until the operation completes.
For example:
s_a = fixed.Const(0.5, shape=mac.SQNative)
s_b = fixed.Const(0.25, shape=mac.SQNative)
s_z = Signal(ASQ)
with m.FSM() as fsm:
    # ... some states ...
    with m.State('MAC'):
        # Set up multiplication
        # Read as: ``m.If(result_available)``
        with mp.Multiply(m, a=s_a, b=s_b):
            m.d.sync += s_z.eq(mp.result.z)
            m.next = 'DONE'
    # ... some more states ...
- default()
Default MAC provider for DSP components if None is specified.
- class tiliqua.dsp.mac.MuxMAC(*args, src_loc_at=0, **kwargs)
A multiplexing multiplication provider.
Instantiates a single multiplier, shared between users of this MuxMAC using time division multiplexing, as follows:
a₁*b₁ ───►│                ├───► result₁
a₂*b₂ ───►├──►[DSP Tile]──►┼───► result₂
a₃*b₃ ───►│                ├───► result₃
         mux              demux

When shared among many cores, the required multiplexer size quickly becomes unusably large.
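The time-division multiplexing above can be illustrated with a small behavioral model in plain Python. This is a sketch only, not the actual Amaranth implementation: the names MuxModel, request, and step are hypothetical, and it models one granted requester per clock.

```python
# Behavioral sketch (plain Python, illustrative only) of time-division
# multiplexing a single multiplier among several requesters.
from collections import deque

class MuxModel:
    def __init__(self):
        self.pending = deque()   # (requester id, a, b) waiting for the DSP tile
        self.results = {}        # requester id -> product

    def request(self, rid, a, b):
        self.pending.append((rid, a, b))

    def step(self):
        # One 'clock': the mux grants the single DSP tile to one requester,
        # and the demux routes the product back to that requester.
        if self.pending:
            rid, a, b = self.pending.popleft()
            self.results[rid] = a * b

mux = MuxModel()
for rid, (a, b) in enumerate([(0.5, 0.25), (2, 3), (4, 5)]):
    mux.request(rid, a, b)

cycles = 0
while mux.pending:
    mux.step()
    cycles += 1

# With 3 requesters sharing one multiplier, draining all requests takes
# 3 cycles: each requester waits for its turn at the shared tile.
print(cycles, mux.results)
```

The model also hints at why the real multiplexer grows with the number of users: every additional requester adds an input to the mux and an output to the demux.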
- class tiliqua.dsp.mac.RingMAC(*args, src_loc_at=0, **kwargs)
A message-ring-backed multiplication provider, where DSP tiles are shared between many components.
The common pattern here is that each functional block tends to use a single
RingMAC, even if it has multiple MAC steps. That is, the RingMAC itself is mux'd within a core; however, all requests land on the same shared bus, which is a message ring connecting the different cores. This keeps multiplexers local to each core:

                  ┌─────────────────────────────┐
                  │                             │
┌─────────────────┼─────────────────────┐       │
│                 │                     │       │
│a₁b₁ ───►│       │        ├───► result₁│       │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│       │
│a₃b₃ ───►│       ▲        ├───► result₃│       │
│      mux       │      demux           │       │
└─────────────────┼─────────────────────┘       │
   Component1     │                             │
                  │                             │
┌─────────────────┼─────────────────────┐       ▼
│                 │                     │  ┌──────────┐
│a₁b₁ ───►│       │        ├───► result₁│  │          │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│  │[DSP Tile]│
│a₃b₃ ───►│       ▲        ├───► result₃│  │          │
│      mux       │      demux           │  └──────────┘
└─────────────────┼─────────────────────┘ RingMACServer
   Component2     │                             │
                  │                             │
┌─────────────────┼─────────────────────┐       │
│                 │                     │       │
│a₁b₁ ───►│       │        ├───► result₁│       │
│a₂b₂ ───►├──[RingClient]─►┼───► result₂│       │
│a₃b₃ ───►│       ▲        ├───► result₃│       │
│      mux       │      demux           │       │
└─────────────────┼─────────────────────┘       │
   Component3     │                             │
                  └─────────────────────────────┘

This provides near-optimal scheduling for message rings composed of components that have the same state machines.
Normally these should only be created from an existing server using
RingMACServer.new_client(). This automatically hooks up the ring and tag attributes, but does NOT add it as a submodule for elaboration (you must do this). This class contains no multiplier: ring must be hooked up to a message ring on which a RingMACServer can be found, and tag MUST uniquely identify the underlying ringnoc.Client instantiated inside this RingMAC. If you are careful to only use RingMACServer.new_client() to create these, all of these assumptions will hold.
- tiliqua.dsp.mac.RingMACServer(max_clients=16, mtype=SQNative)
Factory for creating a MAC message ring.
Prior to elaboration,
Server.new_client() may be used to add additional client nodes to this ring. During elaboration, all clients (and this server) are connected in a ring, and a single shared DSP tile is instantiated to serve requests.
- Returns:
a ringnoc.Server configured for DSP tile sharing of operands.a * operands.b.
ringnoc: Message Ring implementation
‘Message Ring’ Network-on-Chip (NoC) implementation.
Components for connecting N ‘clients’ and 1 ‘server’ in a circular shift register topology. This enables efficient resource sharing (e.g., DSP tiles) across many components without needing huge muxes. For example:
  tag=0        tag=1        tag=2        tag=3
┌───────┐    ┌───────┐    ┌───────┐    ┌───────┐
│client0┼───►│client1┼───►│client2┼───►│client3│
└───▲───┘    └───────┘    └───────┘    └───┬───┘
    │                                      │
    │             ┌──────┐                 │
    └─────────────┼server│◄────────────────┘
                  └──────┘
On the message ring, messages are shifted in a large circular shift register
by one node every clock. Each message is of layout Config.msg_layout.
A client may only send a message if there is an INVALID message being shifted
into it. This keeps latency bounded and removes the need for extra storage.
A server may ‘respond’ to a client message by shifting in the payload and
shifting out the response. Each message contains a unique tag per client,
so servers know where a request came from, and each client knows whether an incoming
message is destined for it.
Assuming all N clients send a request message in the same clock, and the server processes one request per clock, the results arrive at all clients N+1 clocks later, with the server busy for N of those N+1 clocks.
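The scheduling claim above can be checked with a small cycle-level model in plain Python. This is a sketch only; Msg, simulate, and the per-node behaviors below are illustrative assumptions, not the actual ringnoc implementation:

```python
# Cycle-level model (plain Python, illustrative only) of the message ring:
# N clients and 1 server in a circular shift register, one message slot per
# node. Clients inject a request only onto an INVALID incoming slot; the
# server replaces a request with a tagged response (the product a*b).
from dataclasses import dataclass

@dataclass
class Msg:
    tag: int                  # unique per client
    a: int
    b: int
    is_response: bool = False
    z: int = 0                # server-computed product

def simulate(n_clients):
    regs = [None] * (n_clients + 1)   # one slot per node; None = INVALID
    pending = [True] * n_clients      # every client requests in clock 1
    results = {}                      # tag -> (clock result arrived, product)
    busy_clocks = 0
    clock = 0
    while len(results) < n_clients:
        clock += 1
        nxt = list(regs)
        for i in range(n_clients):    # client nodes
            incoming = regs[i - 1]    # node i shifts in from its predecessor
            if incoming is None and pending[i]:
                nxt[i] = Msg(tag=i, a=i + 1, b=2)  # inject on a free slot
                pending[i] = False
            elif incoming is not None and incoming.is_response and incoming.tag == i:
                results[i] = (clock, incoming.z)   # our answer came back
                nxt[i] = None                      # free the slot
            else:
                nxt[i] = incoming                  # transparent shift
        incoming = regs[n_clients - 1]             # server node
        if incoming is not None and not incoming.is_response:
            busy_clocks += 1
            nxt[n_clients] = Msg(incoming.tag, incoming.a, incoming.b,
                                 is_response=True, z=incoming.a * incoming.b)
        else:
            nxt[n_clients] = incoming
        regs = nxt
    return results, busy_clocks

results, busy = simulate(3)
# All requests are injected in clock 1; with N=3, every result arrives in
# clock 5, i.e. N+1 = 4 clocks later, with the server busy N = 3 clocks.
print(results, busy)
```

Note how bounded latency follows from the injection rule: a client never overwrites a valid message, so no storage beyond the single slot per node is needed.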
To use this, you will want to create your components building on ringnoc.Client
and ringnoc.Server. An example of this (sharing DSP tiles) is found in
this repository as mac.RingMAC (client) and mac.RingMACServer.
- class tiliqua.ringnoc.Config(tag_bits, payload_type_client, payload_type_server)
Configuration (message layout) of a message ring.
- class tiliqua.ringnoc.NodeSignature(cfg)
Both Client and Server nodes must have these connections in order to participate in the message ring.
Messages shift in on
i and out on o. Server is currently responsible for connecting these as appropriate after all Client instances are created.
- class tiliqua.ringnoc.Client(*args, src_loc_at=0, **kwargs)
Client node. Nominally transparent (shifting incoming messages to outgoing messages unmodified).
To issue a request,
i should be set, and strobe asserted, until valid is asserted. On the same clock that valid is asserted, o contains the answer from the server to our request. Under the hood, Client will take care of not sending our request until the bus is free, and of not asserting valid until an appropriate response has arrived.
- class tiliqua.ringnoc.Server(*args, src_loc_at=0, **kwargs)
Process client requests. This component also manages the ring topology by creating clients and wiring them into a ring. When a valid client message arrives,
process_request is used to compute a response. client_class can be any class that exposes a NodeSignature, as long as it has a constructor that takes tag: int and cfg: Config. A new instance of this is created whenever the user calls new_client.
- new_client()
Create and add a new client to the ring.