maximum 32k node network. To make the routing mechanism more space-efficient, the 15-bit node
identifier is partitioned to allow a two-level hierarchical lookup: a small 8-entry table identifies a
region, and a second table precisely identifies the node within the region. The region table is indexed by
the upper 3 bits of the destination field of the packet, and the low-order 12 bits identify the node
within a 4k-entry table. Each network port has a dedicated routing table and is capable of routing
a packet each cycle. This provides the necessary lookup bandwidth to route a new packet every
cycle. However, if each input port used a 32k-entry lookup table, it would be sparsely populated for
modest-sized systems, and use an extravagant amount of silicon area.
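The two-level lookup described in this section can be sketched as follows. This is a minimal illustration, not the SeaStar hardware logic: the function names, the representation of the tables as Python lists, and the integer port numbers are all hypothetical, and the use of the 4k-entry table when the destination region matches the router's own region is inferred from the text.

```python
REGION_BITS = 3   # upper bits of the 15-bit destination select one of 8 regions
LOCAL_BITS = 12   # low-order bits select one of 4k nodes within a region


def split_destination(dest):
    """Return (region, local_index) for a 15-bit destination identifier."""
    if not 0 <= dest < (1 << (REGION_BITS + LOCAL_BITS)):
        raise ValueError("destination must fit in 15 bits")
    region = dest >> LOCAL_BITS
    local_index = dest & ((1 << LOCAL_BITS) - 1)
    return region, local_index


def lookup_egress(dest, node_id, global_table, local_table):
    """Pick an egress port (hypothetical integer port numbers).

    If the destination lies in another region, the 8-entry global table
    decides; otherwise the 4k-entry local table decides.
    """
    dest_region, dest_local = split_destination(dest)
    my_region, _ = split_destination(node_id)
    if dest_region != my_region:
        return global_table[dest_region]
    return local_table[dest_local]


# Table entries per input port: one flat table versus the two-level scheme.
FLAT_ENTRIES = 1 << (REGION_BITS + LOCAL_BITS)
TWO_LEVEL_ENTRIES = (1 << REGION_BITS) + (1 << LOCAL_BITS)
```

Under these assumptions a flat table needs 32,768 entries per input port, while the two-level scheme needs 8 + 4,096 = 4,104, roughly an eightfold reduction.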
… up to 8 data flits (64 bytes) of payload …
Figure 8.7: Seastar packet format.
A two-level hierarchical routing scheme is used to efficiently look up the egress port at each
router. Each router is assigned a unique node identifier, corresponding to its destination address.
Upon arrival at the input port, the packet destination field is compared to the node identifier. If
the upper three bits of the destination address match the upper three bits of the node identifier,
then the packet is in the correct global partition. Otherwise, the upper three bits are used to index
into the 8-entry global lookup table (GLUT) to determine the egress port. Conceptually, the 32k
possible destinations are split into eight 4k partitions denoted by bits destination[11:0] of the
The SeaStar router has six full-duplex network ports and one processor port that interfaces
with the Tx/Rx DMA engine (Figure 8.6). The network channels operate at 3.2 Gb/s × 12 lanes over
electrical wires, providing a peak of 4.8 GB/s per direction of network bandwidth. The link control
block (LCB) implements a sliding window go-back-N link-layer protocol that provides reliable
chip-to-chip communication over the network links. The router switch is both input-queued and
output-queued. Each input port has four (one for each virtual channel) 96-entry buffers, with each
entry storing one flit. The input buffer is sized to cover the round-trip latency across the network
link at 3.2 Gb/s signal rates. There are 24 staging buffers in front of each output port, one for each
input source (five network ports, and one processor port), each with four VCs. The staging buffers
are only 16 entries deep and are sized to cover the crossbar arbitration round-trip latency. Virtual
cut-through [37] flow control into the output staging buffers requires them to be at least 9 entries
deep to cover the maximum packet size.
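The figures quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only numbers from the text; the constant names are hypothetical, and treating the ninth flit of a maximum-size packet as a single header flit is an inference from the 8-data-flit payload and the 9-entry minimum depth.

```python
LANE_RATE_MBPS = 3200     # 3.2 Gb/s per lane, kept in Mb/s for exact integer math
LANES = 12
NETWORK_PORTS = 6
PROCESSOR_PORTS = 1
VIRTUAL_CHANNELS = 4
MAX_DATA_FLITS = 8        # up to 8 data flits (64 bytes) of payload
MAX_PAYLOAD_BYTES = 64
HEADER_FLITS = 1          # assumption: one header flit per packet
STAGING_DEPTH = 16

# Peak bandwidth per direction: 12 lanes at 3.2 Gb/s, 8 bits per byte.
peak_mbytes_per_s = LANE_RATE_MBPS * LANES // 8

# Each output port is fed by the other five network ports plus the
# processor port, with one staging buffer per virtual channel per source.
input_sources = (NETWORK_PORTS - 1) + PROCESSOR_PORTS
staging_buffers_per_output = input_sources * VIRTUAL_CHANNELS

# Virtual cut-through needs room for a whole maximum-size packet.
max_packet_flits = HEADER_FLITS + MAX_DATA_FLITS
flit_bytes = MAX_PAYLOAD_BYTES // MAX_DATA_FLITS

print(peak_mbytes_per_s, "MB/s peak per direction")
print(staging_buffers_per_output, "staging buffers per output port")
print(max_packet_flits, "flits in a maximum-size packet,", flit_bytes, "bytes per flit")
print("16-entry staging buffer holds a full packet:", STAGING_DEPTH >= max_packet_flits)
```

The results agree with the text: 4,800 MB/s (4.8 GB/s) per direction, 24 staging buffers per output port, and a 9-flit maximum packet that fits within the 16-entry staging buffers.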