To enable the high-radix topologies described in earlier chapters, a switch microarchitecture is
needed that scales to a high port count. Conventional routers for low-radix topologies had a limited
number of ports (typically 6 to 8), so centralized arbitration could be used. However, the complexity
of the arbitration logic grows as $O(k^2)$, where $k$ is the router radix (the number of input and
output ports). In this chapter, we first describe a baseline router design similar to those used in
low-radix routers [49, 55]. This design scales poorly to high radix because of the complexity of the
allocators and the wiring needed to connect them to the input and output ports. To overcome this
limitation while still providing high performance, we then describe a hierarchical switch organization
that uses intermediate buffering to decouple the allocation between inputs and outputs while reducing
the amount of intermediate buffering required.
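As a rough illustration of the $O(k^2)$ growth, the sketch below counts the input-output request pairs a flat, centralized allocator must consider, assuming every one of the $k$ inputs may request every one of the $k$ outputs. It is a hypothetical cost model that captures only this request count, not gate delay or wire length, and the function name is made up for illustration.

```python
def allocator_request_lines(k: int) -> int:
    """Requests a flat k x k allocator must consider (hypothetical cost model):
    each of the k input ports may request each of the k output ports."""
    if k < 1:
        raise ValueError("radix must be at least 1")
    return k * k


if __name__ == "__main__":
    # Compare typical low-radix port counts with a high-radix router.
    for k in (6, 8, 16, 64):
        print(f"radix {k:2d}: {allocator_request_lines(k):5d} input-output request pairs")
```

Under this model, going from radix 8 to radix 64 multiplies the count by 64, which suggests why a flat allocator that is practical for a low-radix router becomes unwieldy at high radix.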
ROUTER MICROARCHITECTURE BASICS
A block diagram of the baseline router architecture is shown in Figure 6.1. Arriving data is stored in
the input buffers. These input buffers are typically separated into several parallel virtual channels that
can be used to prevent deadlock, implement priority classes, and increase throughput by allowing
blocked packets to be passed. The input buffers and other router resources are allocated in fixed-size
units called flits, and each packet is broken into one or more flits as shown in Figure 6.2(a).
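A minimal sketch of breaking a packet into fixed-size flits follows. The flit size, the zero padding of the last flit, and the head/body/tail labels (with a single-flit packet marked as both head and tail) are assumptions for illustration; the `packetize` helper and the `Flit` type are hypothetical, and no header fields such as the destination are modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body", "tail", or "head_tail" for a one-flit packet
    payload: bytes  # exactly flit_bytes long


def packetize(packet: bytes, flit_bytes: int) -> List[Flit]:
    """Break a packet into fixed-size flits (hypothetical helper).
    The final flit is zero-padded up to flit_bytes."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    if not packet:
        raise ValueError("packet must be non-empty")
    chunks = [packet[i:i + flit_bytes] for i in range(0, len(packet), flit_bytes)]
    chunks[-1] = chunks[-1].ljust(flit_bytes, b"\x00")  # pad the last flit
    if len(chunks) == 1:
        return [Flit("head_tail", chunks[0])]
    kinds = ["head"] + ["body"] * (len(chunks) - 2) + ["tail"]
    return [Flit(kind, chunk) for kind, chunk in zip(kinds, chunks)]
```

For example, a 20-byte packet with 8-byte flits becomes a head, a body, and a tail flit, with the last four bytes of the tail flit being padding.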
The progression of a packet through this router can be separated into per-packet and per-flit
steps. The per-packet actions are initiated as soon as the header flit, the first flit of a packet, arrives:
1. Route computation (RC) - based on information stored in the header, the output port of the
packet is selected.
2. Virtual-channel allocation (VA) - a packet must gain exclusive access to a downstream virtual
channel associated with the output port selected by route computation. Once these per-packet steps
are completed, per-flit scheduling of the packet can begin.
3. Switch allocation (SA) - if there is a free buffer in its output virtual channel, a flit can vie for
access to the crossbar.
4. Switch traversal (ST) - once a flit gains access to the crossbar, it can be transferred from its
input buffers to its output and on to the downstream router.
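The four steps can be sketched for a single input virtual channel as follows. This is a simplified model under several assumptions: `route_computation` is a hypothetical placeholder routing function, the free downstream buffers checked in SA are tracked as per-VC credit counts, VA picks the lowest-numbered free output VC, the tail flit releases that VC, and contention with other inputs is not modelled, so SA reduces to a credit check. There is also no notion of time, so a blocked flit simply ends the trace instead of retrying in a later cycle.

```python
from typing import Dict, List, Optional, Set, Tuple


def route_computation(dest: int, num_ports: int) -> int:
    # Hypothetical routing function, for illustration only.
    return dest % num_ports


def traverse_packet(flit_kinds: List[str], dest: int, num_ports: int,
                    free_vcs: Dict[int, Set[int]],
                    credits: Dict[Tuple[int, int], int]) -> List[Tuple[str, List[str]]]:
    """Return, for each flit, the pipeline stages it completes.

    free_vcs maps output port -> unallocated downstream VCs (mutated).
    credits maps (output port, VC) -> free downstream buffer slots (mutated).
    Assumes the first flit is a head flit. If a resource is unavailable,
    the blocked flit is recorded with a 'blocked' marker and processing stops.
    """
    trace: List[Tuple[str, List[str]]] = []
    out_port: Optional[int] = None
    out_vc: Optional[int] = None
    for kind in flit_kinds:
        stages: List[str] = []
        if kind in ("head", "head_tail"):
            # Per-packet steps: performed once, when the header flit arrives.
            out_port = route_computation(dest, num_ports)
            stages.append("RC")
            if not free_vcs[out_port]:
                trace.append((kind, stages + ["blocked in VA"]))
                return trace
            out_vc = min(free_vcs[out_port])  # lowest-numbered free VC
            free_vcs[out_port].discard(out_vc)  # exclusive access to this VC
            stages.append("VA")
        # Per-flit steps: every flit, including the head, needs SA and ST.
        if credits[(out_port, out_vc)] == 0:
            trace.append((kind, stages + ["blocked in SA"]))
            return trace
        credits[(out_port, out_vc)] -= 1  # occupy one downstream buffer slot
        stages += ["SA", "ST"]
        trace.append((kind, stages))
        if kind in ("tail", "head_tail"):
            free_vcs[out_port].add(out_vc)  # tail flit releases the output VC
    return trace
```

Only the head flit performs RC and VA; body and tail flits reuse the head's output port and virtual channel and go straight to SA and ST, mirroring the per-packet and per-flit split described above.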