Networking Reference
In-Depth Information
Figure 6.4: Block diagram of (a) a baseline crossbar switch and (b) a fully buffered crossbar switch, each with inputs 1 through k.
Virtual channel allocation (VA) poses an even more difficult problem than switch allocation
because the number of resources to be allocated is multiplied by the number of virtual channels v.
In contrast to switch allocation, where the availability of free downstream buffers is tracked with a
credit count, with virtual channel allocation, the availability of downstream VCs is unknown. An
ideal VC allocator would allow all input VCs to monitor the status of all output VCs they are waiting
on. Such an allocator would be prohibitively expensive, with $v^2 k^2$ wiring complexity.
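To make the growth concrete, the short sketch below counts the monitoring paths an ideal allocator would need if each of the k·v input VCs watched each of the k·v output VCs. The function name and the sample radix and VC counts are illustrative assumptions, not values from the text.

```python
def ideal_va_wiring(k: int, v: int) -> int:
    """Paths needed when every input VC monitors every output VC.

    k: router radix (number of ports); v: virtual channels per port.
    There are k*v input VCs and k*v output VCs, so the count is (k*v)^2,
    which equals v^2 * k^2.
    """
    return (k * v) ** 2


if __name__ == "__main__":
    # Illustrative radices with an assumed v = 4 virtual channels per port.
    for k in (8, 16, 64):
        print(f"k={k:3d}, v=4 -> {ideal_va_wiring(k, 4)} monitoring paths")
```

Under these assumed parameters, going from radix 8 to radix 64 multiplies the path count by 64, which is why the text looks for an allocator that keeps state local to each crosspoint instead.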
Building on the ideas developed for switch allocation, a scalable virtual channel allocator
architecture can be built. The state of the output virtual channels is maintained at each crosspoint,
and allocation is also performed at the crosspoints. However, VA involves speculation: switch
allocation proceeds before virtual channel allocation is complete in order to reduce latency. Simple virtual
channel speculation was proposed in [52], where switch allocation and VC allocation occur
in parallel to reduce the critical path through the router. With a deeper pipeline in a high-radix
router, VC allocation is resolved later in the pipeline, which leads to more aggressive speculation.
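The cost of speculation can be illustrated with a toy single-cycle model: switch allocation picks a winner per output without knowing whether that output has a free VC, and a grant is wasted when VC allocation then fails. This is only a sketch under simplifying assumptions (one head flit per input, fixed-priority switch arbitration, a per-output count of free VCs); the names `HeadFlit` and `speculative_allocate` are hypothetical and do not come from the text or from [52].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadFlit:
    input_port: int
    output_port: int


def speculative_allocate(flits, free_output_vcs):
    """Model one cycle where switch allocation runs in parallel with VA.

    flits: head flits, at most one per input port (assumed).
    free_output_vcs: dict mapping output port -> number of free VCs.
    Returns (sent, wasted): flits that proceed, and flits whose switch
    grant went unused because no output VC was available.
    """
    free = dict(free_output_vcs)

    # Switch allocation: the first requester of each output wins
    # (fixed priority, chosen only to keep the sketch short).
    sa_winner = {}
    for flit in flits:
        sa_winner.setdefault(flit.output_port, flit)

    sent, wasted = [], []
    for out, flit in sa_winner.items():
        # VC allocation resolves at the same time; it succeeds only if
        # the requested output still has a free virtual channel.
        if free.get(out, 0) > 0:
            free[out] -= 1
            sent.append(flit)
        else:
            wasted.append(flit)  # switch slot was granted but not used
    return sent, wasted
```

In this model, every entry in `wasted` is a switch slot that another, non-speculative flit could have used, which is the bandwidth loss that grows as speculation becomes more aggressive.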
6.3 FULLY BUFFERED CROSSBAR
Adding buffering at the crosspoints of the switch (Figure 6.4(b)) decouples input and output virtual
channel and switch allocation. This decoupling simplifies allocation, reduces the need for
speculation, and overcomes the performance problems of the baseline architecture with distributed,
speculative allocators. Since input and output switch allocation are completely decoupled, a flit whose
request wins the input arbitration is immediately forwarded to the crosspoint buffer corresponding
to its output. At the crosspoint, local and global output arbitration are performed as in the unbuffered
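The decoupling described in this section can be sketched as a small behavioral model: an input stage that moves each winning flit straight into the crosspoint buffer for its output, and an output stage in which every output independently drains one buffer from its column. The class name, the buffer depth parameter, and the single round-robin arbiter per output are assumptions made for illustration; in particular, the model collapses the local and global output arbitration mentioned in the text into one step.

```python
from collections import deque


class BufferedCrossbar:
    """Toy k x k crossbar with a FIFO at every crosspoint (hypothetical model)."""

    def __init__(self, k: int, depth: int):
        self.k = k
        self.depth = depth
        # xp[i][o] is the crosspoint buffer joining input i to output o.
        self.xp = [[deque() for _ in range(k)] for _ in range(k)]
        self.rr = [0] * k  # round-robin pointer for each output

    def input_stage(self, requests):
        """requests: dict input -> (output, flit), one input winner each.

        A flit is forwarded to its crosspoint buffer if there is room,
        with no knowledge of what other inputs are doing.
        Returns the list of inputs that had to stall.
        """
        stalled = []
        for i, (o, flit) in requests.items():
            if len(self.xp[i][o]) < self.depth:
                self.xp[i][o].append(flit)
            else:
                stalled.append(i)
        return stalled

    def output_stage(self):
        """Each output picks one non-empty buffer in its column, round-robin."""
        delivered = {}
        for o in range(self.k):
            for step in range(self.k):
                i = (self.rr[o] + step) % self.k
                if self.xp[i][o]:
                    delivered[o] = self.xp[i][o].popleft()
                    self.rr[o] = (i + 1) % self.k
                    break
        return delivered
```

For example, if inputs 0 and 1 both send to output 2 in the same cycle, both flits are accepted by `input_stage`, and `output_stage` delivers them on successive calls. The conflict is absorbed by the crosspoint buffers rather than resolved by a combined input and output allocation.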