Networking Reference
In-Depth Information
the LCB sideband. These VC acks are used to increment the per-VC credit counters in the output
port logic. The ok field in the EOP phit indicates whether the packet is healthy, encountered a transmission
error on the current link (transmit_error), or was corrupted prior to transmission (soft_error). The
YARC internal datapath uses the CRC to detect soft errors in the pipeline data paths and static
memories used for storage. Before transmitting a tail phit onto the network link, the LCB will check
the current CRC against the packet contents to determine if a soft error has corrupted the packet.
If the packet is corrupted, it is marked as soft_error, and a good CRC is generated so that it is not
detected by the receiver as a transmission error. The packet will continue to flow through the network
marked as a bad packet with a soft error and eventually be discarded by the network interface at the
destination processor.
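
The soft-error check described above can be sketched roughly as follows. This is a minimal illustration, not the YARC implementation: the `Packet` class, the `Status` names, and `prepare_tail_phit` are hypothetical, and `zlib.crc32` merely stands in for the real CRC, whose polynomial the text does not give.

```python
import zlib
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    TRANSMIT_ERROR = "transmit_error"
    SOFT_ERROR = "soft_error"


@dataclass
class Packet:
    payload: bytes            # header and data phits, flattened to bytes
    crc: int                  # CRC carried along with the packet
    status: Status = Status.OK


def crc_of(payload: bytes) -> int:
    # Stand-in checksum (assumption): the actual CRC is not specified.
    return zlib.crc32(payload)


def prepare_tail_phit(pkt: Packet) -> Packet:
    """Check for a soft error just before the tail phit goes on the link."""
    if crc_of(pkt.payload) != pkt.crc:
        # Corrupted inside the router: flag it, then regenerate a good CRC
        # so the next receiver does not mistake it for a transmission error.
        pkt.status = Status.SOFT_ERROR
        pkt.crc = crc_of(pkt.payload)
    return pkt
```

Because the CRC is regenerated, downstream links see a well-formed packet and only the status field records that the contents are bad.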
The narrow links of a high-radix router increase serialization latency, since more cycles are
needed to move a packet across a link. For example, a 32B cache-line write results in a packet with 19 phits (6 header,
12 data, and 1 EOP). Consequently, the LCB passes phits up to the higher-level logic speculatively,
prior to verifying the packet CRC, which avoids store-and-forward serialization latency at each hop.
However, this early forwarding complicates error handling: the receiver must still correctly handle a
packet with a transmission error and reclaim its space in the input queue.
Because a packet with a transmission error is speculatively passed up to the router core and
may have already flowed to the next router by the time the tail phit is processed, the LCB and
input queue must prevent corrupting the router state. The LCB detects packet CRC errors and
marks the packet as transmit_error with a corrected CRC before handing the end-of-packet (EOP)
phit up to the router core. The LCB also monitors the packet length of the received data stream
and clips any packets that exceed the maximum packet length, which is programmed into an LCB
configuration register. When a packet is clipped, an EOP phit is appended to the truncated packet
and it is marked as transmit_error. On either error, the LCB will enter error recovery mode and await
the retransmission.
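
The receive-side behavior above can be modeled loosely with a generator that hands phits upward as they arrive and only decides the packet status at the end. The function name `lcb_receive`, the tuple format, and the use of `zlib.crc32` are assumptions made for illustration; error recovery mode and retransmission are not modeled.

```python
import zlib


def lcb_receive(body_phits, received_crc, max_body_phits):
    """Yield phits speculatively, then a final ("eop", status) tuple."""
    running_crc = 0
    count = 0
    status = "ok"
    for phit in body_phits:
        if count == max_body_phits:
            # Packet exceeds the configured maximum length: clip it here
            # and close it with an EOP marked as a transmission error.
            status = "transmit_error"
            break
        running_crc = zlib.crc32(phit, running_crc)
        count += 1
        yield ("phit", phit)  # passed up before the CRC has been verified
    if status == "ok" and running_crc != received_crc:
        status = "transmit_error"
    yield ("eop", status)
```

The consumer of this generator sees data before it learns whether the packet was good, which is exactly why the input queue needs a way to undo the damage afterwards.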
The input queue in the router must be protected against overflow. If it receives more phits than can be
stored, the input queue logic will adjust the tail pointer to excise the bad packet and discard further
phits from the LCB until the EOP phit is received. If a packet marked transmit_error is received at
the input buffer, we want to drop the packet and avoid sending any virtual channel acknowledgments.
The sender will eventually time out and retransmit the packet. If the bad packet has not yet flowed
out of the input buffer, it can simply be removed by setting the tail pointer of the queue to the tail
of the previous packet. Otherwise, if the packet has flowed out of the input buffer, we let the packet
go and decrement the number of virtual channel acknowledgments to send by the size of the bad
packet. The transmit-side router core does not need to know anything about recovering from bad
packets. All effects of the error are contained within the LCB and YARC input queueing logic.
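
A simplified software model of this input-queue recovery is sketched below. The `InputQueue` class and its fields are hypothetical, a Python list stands in for the hardware buffer and tail pointer, and one acknowledgment per forwarded phit is assumed. For a partially forwarded bad packet the sketch removes what is still buffered and subtracts only the phits that already left, which is a generalization of the two cases in the text.

```python
class InputQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = []             # buffered phits, oldest first
        self.pkt_start = 0        # index where the in-flight packet begins
        self.drained = 0          # in-flight phits already forwarded
        self.pending_acks = 0     # VC acks still to be sent to the sender
        self.discarding = False   # True after an overflow, until EOP

    def push(self, phit):
        if self.discarding:
            return                        # drop phits until the EOP arrives
        if len(self.buf) >= self.capacity:
            del self.buf[self.pkt_start:]  # excise the overflowing packet
            self.discarding = True
            return
        self.buf.append(phit)

    def pop(self):
        phit = self.buf.pop(0)            # forward the oldest phit
        self.pending_acks += 1            # its buffer slot can be acked
        if self.pkt_start > 0:
            self.pkt_start -= 1           # phit was from a completed packet
        else:
            self.drained += 1             # phit was from the in-flight packet
        return phit

    def end_of_packet(self, status):
        if self.discarding or status == "transmit_error":
            # Roll the tail back to the end of the previous packet and
            # withhold acks for any phits that already flowed out.
            del self.buf[self.pkt_start:]
            self.pending_acks -= self.drained
        self.pkt_start = len(self.buf)
        self.drained = 0
        self.discarding = False
```

Since no acknowledgment is ever sent for the bad packet, the sender's timeout is what triggers the retransmission; nothing in this model requires the sender to be told about the error.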