Networking Reference
In-Depth Information
[Figure: 8 x 8 array of tiles, Tile(0,0) through Tile(7,7), with inputs IN0 to IN63 and outputs OUT0 to OUT63. Labeled elements: SerDes, input buffers, Route, row bus, row buffers, switch (labeled N x N or 8 x 8), column channel, column buffers.]
Figure 6.7: (a) Block diagram of the Cray YARC router and (b) die photo (courtesy Cray Inc.).
and an output port but an 8× speedup is provided at both the input and the output ports. Both the
hierarchical organization (Section 6.4) and the YARC router provide an input speedup [20] since
each input port is connected to all subswitches in its row. However, the YARC router exploits the
abundant wire resources available on-chip to provide output speedup from the subswitches as well:
the outputs of each subswitch are fully connected to all the outputs in its column. In comparison,
a global bus was assumed for each output port in the hierarchical organization in Section 6.4. With
the large number of ports in a high-radix router, the output arbitration needs to be broken into
multiple stages, and the YARC router performs output arbitration in two stages. The first stage
arbitrates for the outputs of the subswitches and the second stage arbitrates for the output ports
among the subswitches' outputs in each column. However, by providing output speedup, the output
arbitration is simplified because the arbiter is local to the output port rather than being a
central, shared resource.
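
The path through the tiles and the two arbitration stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the YARC implementation: the mapping of port p to Tile(p // 8, p % 8) is read off the labels in Figure 6.7, the names tile_of, path and arbitrate are hypothetical, and a fixed-priority choice (lowest index wins) stands in for whatever arbiters the real router uses.

```python
from collections import defaultdict

RADIX = 8  # tiles per row and per column, 64 ports in total


def tile_of(port):
    """Tile (row, col) of a port, ASSUMING port p sits in Tile(p // 8, p % 8)."""
    return divmod(port, RADIX)


def path(in_port, out_port):
    """Tiles visited: source tile, subswitch tile, destination tile."""
    src_row, _ = tile_of(in_port)
    dst_row, dst_col = tile_of(out_port)
    # The row bus carries the packet to the subswitch in the source row and
    # destination column; a column channel then reaches the output tile.
    return [tile_of(in_port), (src_row, dst_col), (dst_row, dst_col)]


def arbitrate(requests):
    """requests: list of (in_port, out_port). Returns {out_port: winning in_port}.

    Fixed priority (lowest index wins) is an assumption for illustration.
    """
    # Stage 1: inside each subswitch, pick one request per subswitch output.
    stage1 = defaultdict(list)  # (subswitch tile, out_port) -> [in_port, ...]
    for in_port, out_port in requests:
        subswitch = path(in_port, out_port)[1]
        stage1[(subswitch, out_port)].append(in_port)

    # Stage 2: local to each output port, pick among the subswitches
    # in that port's column.
    stage2 = defaultdict(list)  # out_port -> [(subswitch row, in_port), ...]
    for (subswitch, out_port), in_ports in stage1.items():
        stage2[out_port].append((subswitch[0], min(in_ports)))

    return {out_port: min(cands)[1] for out_port, cands in stage2.items()}
```

For example, path(9, 63) visits Tile(1,1), then the subswitch in Tile(1,7), then Tile(7,7). Note that the second stage only ever looks at the eight subswitch outputs feeding one output port, which is the sense in which the arbiter is local rather than shared.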
Although wire resources are abundant on-chip, the buffering available on-chip to implement the
YARC router microarchitecture is limited. Thus, the intermediate buffers (the row buffers and the
column buffers) are area-constrained and the number of entries in these buffers is limited. As a
result, although virtual cut-through flow control is implemented across YARC routers in the
network, wormhole flow control is implemented within the YARC router, across the row buffers and
column buffers.
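
The difference between the two flow control schemes comes down to the granularity of buffer allocation, which is why small intermediate buffers favor wormhole. The sketch below illustrates the general definitions only; the function name can_forward and its parameters are hypothetical and do not describe YARC's actual credit logic.

```python
def can_forward(mode, free_slots, packet_flits):
    """Return True if a packet's head flit may advance into the next buffer.

    free_slots:   free flit entries in the downstream buffer
    packet_flits: length of the packet in flits
    """
    if mode == "virtual_cut_through":
        # Buffers are allocated per packet: the whole packet must fit.
        return free_slots >= packet_flits
    if mode == "wormhole":
        # Buffers are allocated per flit: one free entry is enough.
        return free_slots >= 1
    raise ValueError(f"unknown flow control mode: {mode}")
```

With a 2-entry buffer and a 10-flit packet, can_forward("wormhole", 2, 10) is True while can_forward("virtual_cut_through", 2, 10) is False, so a shallow row or column buffer can still make progress under wormhole flow control.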