distributed structures for both switch and virtual channel allocation that scale well to high port
[Figure 6.3 labels: VC requests (1 bit); (log2 k bits); v:1 arbiters; Global Output Arbiter]
Figure 6.3: Scalable switch allocator architecture. The input arbiters are localized, but the output
arbiters are distributed across the router to limit wiring complexity. A detailed view of the output
arbiter corresponding to output k is shown to the right.
We address the scalability of the switch allocator by using a distributed separable allocator
design, as shown in Figure 6.3. The allocation takes place in three stages: input arbitration, local
output arbitration, and global output arbitration. During the first stage, all ready virtual channels in
each input controller request access to the crossbar switch. The winning virtual channel in each input
controller then forwards its request to the appropriate local output arbiter by driving the binary code
for the requested output onto a per-input set of horizontal request lines.
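
The first stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the router's actual logic: the function names, the round-robin search order, and the MSB-first bit ordering are all hypothetical choices made for the example. It shows one input controller choosing among its ready virtual channels and encoding the requested output as a binary code about log2 k bits wide.

```python
def input_arbitrate(vc_ready, pointer=0):
    """Pick one ready VC, searching round-robin from `pointer` (assumed policy).

    Returns the winning VC index, or None if no VC is ready.
    """
    v = len(vc_ready)
    for offset in range(v):
        vc = (pointer + offset) % v
        if vc_ready[vc]:
            return vc
    return None


def encode_output(port, k):
    """Binary code for `port` as a list of bits, MSB first (assumed ordering).

    The width is ceil(log2 k) bits, computed without floating point.
    """
    width = max(1, (k - 1).bit_length())
    return [(port >> i) & 1 for i in reversed(range(width))]


def drive_request_lines(vc_ready, vc_output, k, pointer=0):
    """Stage one for a single input controller.

    vc_ready[i]  -- True if VC i has a flit ready for the switch
    vc_output[i] -- output port requested by VC i
    Returns (winning VC, bits driven on the request lines), or None.
    """
    winner = input_arbitrate(vc_ready, pointer)
    if winner is None:
        return None  # no request is driven this cycle
    return winner, encode_output(vc_output[winner], k)
```

For example, with k = 64 (an illustrative value), `drive_request_lines([False, True, True, False], [5, 9, 63, 0], k=64)` selects VC 1 and drives the 6-bit code for output 9.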
At each output arbiter, the input requests are decoded and, during stage two, each local output
arbiter selects a request (if any) for its switch output from among a local group of m input
requests (m = 8 in Figure 6.3) and forwards this request to the global output arbiter. Finally, the
global output arbiter selects a request (if any) from among the k/m local output arbiters to be granted
access to its switch output. For very high-radix routers, the two-stage output arbiter can be extended
to a larger number of stages.
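
The two-stage output arbitration for a single switch output can be sketched as follows. This is a minimal behavioral model, not the hardware design: the class names are hypothetical, k is assumed to be a multiple of m, and the round-robin pointer update (advance past the winner, and only in arbiters on the granted path) is an assumed policy chosen for illustration.

```python
class RoundRobinArbiter:
    """n-input arbiter that searches from a rotating priority pointer."""

    def __init__(self, n):
        self.n = n
        self.pointer = 0

    def pick(self, requests):
        """Return the first requesting index at or after the pointer, or None."""
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if requests[i]:
                return i
        return None

    def update(self, winner):
        """Assumed policy: the input after the winner gets top priority next."""
        self.pointer = (winner + 1) % self.n


class OutputArbiter:
    """Local (m:1) plus global (k/m:1) arbitration for one switch output."""

    def __init__(self, k, m):
        assert k % m == 0, "sketch assumes k is a multiple of m"
        self.k, self.m = k, m
        self.local = [RoundRobinArbiter(m) for _ in range(k // m)]
        self.glob = RoundRobinArbiter(k // m)

    def arbitrate(self, requests):
        """requests[i] is True if input i requests this output.

        Returns the granted input index, or None if there are no requests.
        """
        m = self.m
        # Stage two: each local arbiter picks among its m co-located inputs.
        local_winners = [
            arb.pick(requests[g * m:(g + 1) * m])
            for g, arb in enumerate(self.local)
        ]
        # Stage three: the global arbiter picks among the k/m local winners.
        g = self.glob.pick([w is not None for w in local_winners])
        if g is None:
            return None
        # Rotate priority only in the arbiters on the granted path.
        self.local[g].update(local_winners[g])
        self.glob.update(g)
        return g * m + local_winners[g]
```

With k = 16 and m = 4 (illustrative values), repeated calls with inputs 2, 5, and 13 requesting grant them in turn, since the global pointer rotates across the local groups.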
At each stage of the distributed arbiter, the arbitration decision is made over a relatively small
number of inputs (typically 16 or fewer) so that each stage can fit in a clock cycle. For the first
two stages, the arbitration is also local, selecting among requests that are physically co-located.
For the final stage, the distributed request signals are collected via global wiring to allow the actual
arbitration to be performed locally. Once the winning requester for an output is known, a grant
signal is propagated back through the arbiter stages to the requesting input virtual channel. To ensure fairness, the
arbiter at each stage maintains a priority pointer which rotates in a round-robin manner based on