8. CASE STUDIES
The serializer/deserializer (SerDes) implements the physical layer of the communication stack. YARC
instantiates a high-speed SerDes in which each lane consists of two complementary signals forming
a balanced differential pair.
The SerDes is organized as a macro which replicates multiple lanes. For full duplex operation,
we must instantiate an 8-lane receiver macro as well as an 8-lane transmitter macro. YARC instantiates 48
8-lane SerDes macros (24 transmit and 24 receive), consuming 91.32 mm² of the 289 mm² die area,
which is almost 1/3 of the available silicon (Figure 6.7).
The SerDes supports two full-speed data rates: 5 Gbps and 6.25 Gbps. Each SerDes macro is
capable of supporting full, half, and quarter data rates using clock dividers in the PLL module. This
yields the following supported data rates: 6.25, 5.0, 3.125, 2.5, 1.5625, and 1.25 Gbps. We expect
to be able to drive a 6-meter, 26-gauge cable at the full data rate of 6.25 Gbps, allowing for adequate
PCB foil at both ends.
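
The list of supported rates follows directly from the two full-speed rates and the full, half, and quarter clock dividers. The short sketch below reproduces that arithmetic; the names `FULL_RATES_GBPS`, `DIVIDERS`, and `supported_rates_gbps` are ours, chosen only for illustration, and do not correspond to anything in the YARC design.

```python
from fractions import Fraction

# Values taken from the text: two full-speed rates, and full/half/quarter dividers.
FULL_RATES_GBPS = (Fraction("6.25"), Fraction("5"))
DIVIDERS = (1, 2, 4)  # full, half, and quarter rate


def supported_rates_gbps():
    """Return every full rate divided by every divider, fastest first."""
    rates = {float(full / d) for full in FULL_RATES_GBPS for d in DIVIDERS}
    return sorted(rates, reverse=True)


print(supported_rates_gbps())
# [6.25, 5.0, 3.125, 2.5, 1.5625, 1.25]
```

Exact fractions are used for the division so that the six values match the list in the text without any floating-point rounding.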
Each port on YARC is three bits wide, for a total of 384 low-voltage differential signals coming
off each router, 192 transmit and 192 receive. Since the SerDes macro is 8 lanes wide and each YARC
port is only 3 lanes wide, a naive assignment of tiles to SerDes would have 2 and 2/3 ports (8 lanes)
for each SerDes macro. Consequently, we must aggregate three SerDes macros (24 lanes) to share
across eight YARC tiles (also 24 lanes). This grouping of eight tiles is called an octant and imposes
the constraint that each octant must operate at the same data rate.
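
The octant size is simply the least common multiple of the macro width (8 lanes) and the port width (3 lanes). The sketch below works through that arithmetic and, as an assumption for illustration only, packs port lanes into macros in consecutive order; the text does not specify the actual lane assignment used on YARC, and the function names are hypothetical.

```python
from math import gcd

MACRO_LANES = 8            # lanes per SerDes macro (from the text)
PORT_LANES = 3             # lanes per YARC port (from the text)
MACROS_PER_DIRECTION = 24  # transmit (or receive) macros (from the text)


def octant_shape():
    """Smallest grouping in which whole macros and whole ports line up."""
    lanes = MACRO_LANES * PORT_LANES // gcd(MACRO_LANES, PORT_LANES)
    return lanes // MACRO_LANES, lanes // PORT_LANES  # (macros, ports)


def lane_to_macro(port_in_octant, lane_in_port):
    """Assumed consecutive packing: returns (macro index, lane within macro)."""
    flat = port_in_octant * PORT_LANES + lane_in_port
    return divmod(flat, MACRO_LANES)


def straddling_ports():
    """Ports whose three lanes land in more than one macro under this packing."""
    _, ports = octant_shape()
    return [p for p in range(ports)
            if len({lane_to_macro(p, l)[0] for l in range(PORT_LANES)}) > 1]


macros, ports = octant_shape()
total_ports = MACROS_PER_DIRECTION * MACRO_LANES // PORT_LANES
print(macros, ports)                     # macros and ports per octant
print(total_ports, total_ports // ports) # ports per direction, number of octants
print(straddling_ports())                # ports split across two macros
```

Under this assumed packing, some ports necessarily span two macros, which is one way to see why all three macros in an octant, and hence all eight tiles, must run at the same data rate.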
The SerDes has a 16/20-bit parallel interface which is managed by the link control block
(LCB). The positive and negative components of each differential signal pair can be arbitrarily
swapped between the transmit/receive pair. In addition, each of the 3 lanes which comprise the
LCB port can be permuted or “swizzled.” During channel initialization, the LCB determines the polarity
of each differential pair, as well as which lanes are “swizzled.” This degree of
freedom simplifies the board-level river routing of the channels and reduces the number of metal
layers on a PCB for the router module.
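
To make the idea concrete, the sketch below models one way a receiver could discover lane order and polarity during initialization. It is not the LCB's actual training procedure, which the text does not describe. We assume, purely for illustration, that each transmit lane sends a distinct 16-bit training word, that a swapped differential pair appears as a bitwise inversion of the received word, and that no training word is the complement of another. All names (`TRAINING`, `channel`, `train`, `correct`) are hypothetical.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1

# Hypothetical training words, one per transmit lane. No word is the
# bitwise complement of another, so polarity can be told apart from lane order.
TRAINING = (0x00F1, 0x00F2, 0x00F4)


def channel(words, perm, inverted):
    """Model the board: rx lane i carries tx lane perm[i], inverted if inverted[i]."""
    return [(words[perm[i]] ^ MASK) if inverted[i] else words[perm[i]]
            for i in range(len(perm))]


def train(received):
    """Recover (perm, inverted) from the training words seen on each rx lane."""
    perm, inverted = [], []
    for word in received:
        if word in TRAINING:
            perm.append(TRAINING.index(word))
            inverted.append(False)
        elif (word ^ MASK) in TRAINING:
            perm.append(TRAINING.index(word ^ MASK))
            inverted.append(True)
        else:
            raise ValueError(f"unrecognized training word {word:#06x}")
    return perm, inverted


def correct(received, perm, inverted):
    """Undo polarity and lane order so out[t] is the word sent on tx lane t."""
    out = [None] * len(received)
    for rx, word in enumerate(received):
        out[perm[rx]] = (word ^ MASK) if inverted[rx] else word
    return out


# A board that swizzles the three lanes and swaps polarity on two of them.
board_perm, board_inv = [2, 0, 1], [True, False, True]
perm, inv = train(channel(TRAINING, board_perm, board_inv))
data = [0x1234, 0xABCD, 0x0F0F]
print(correct(channel(data, board_perm, board_inv), perm, inv) == data)
```

Once the permutation and polarity flags are known, they can be applied to every subsequent word, so the board designer is free to route each pair and lane in whichever order is most convenient.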
CRAY XT MULTIPROCESSOR
The Cray XT4 system scales up to 32K nodes using a bidirectional three-dimensional torus
interconnection network. Each node in the system consists of an AMD64 superscalar processor connected to
a Cray Seastar chip [13] (Figure 8.5) which provides the processor-network interface, and a 6-ported
router for interconnecting the nodes. The system supports an efficient distributed memory
message passing programming model. The underlying message transport is handled by the Portals [11]
The Cray XT interconnection network has several key features that set it apart from other
networks:
• scales up to 32K network endpoints,
• high injection bandwidth using HyperTransport (HT) links directly to the network interface,