packet arrives at the output port, the sender checks the available credit counter. For wormhole flow
control [20] across the link, the sender needs only one or more credits available. For virtual
cut-through (VCT) [20, 22] flow control across the link, the sender's available credit must cover the
entire packet. In practice, the switch hardware does not have to track the size of each packet in
order to support VCT flow control; the sender can simply check that the available credit count is at
least the maximum packet size.
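The credit checks above can be sketched in a few lines of Python. This is an illustrative model, not actual switch hardware logic; the class name, the 16-flit maximum packet size, and the 8-flit buffer are all assumptions made for the example.

```python
MAX_PACKET_FLITS = 16  # assumed maximum packet size, in flits

class OutputPort:
    """Illustrative model of per-output-port credit tracking."""

    def __init__(self, buffer_flits):
        # Credits count free flit slots in the downstream input buffer.
        self.credits = buffer_flits

    def can_send_wormhole(self):
        # Wormhole: one free flit slot is enough to forward the next flit.
        return self.credits >= 1

    def can_send_vct(self):
        # VCT: conservatively require room for a whole maximum-size packet,
        # so the switch need not track the actual size of each packet.
        return self.credits >= MAX_PACKET_FLITS

    def send_flit(self):
        assert self.credits >= 1
        self.credits -= 1  # consume one credit per flit sent

    def credit_return(self):
        self.credits += 1  # downstream freed a buffer slot

port = OutputPort(buffer_flits=8)
print(port.can_send_wormhole())  # True: at least one credit is available
print(port.can_send_vct())       # False: 8 < 16, no room for a full packet
```

Note the asymmetry: a wormhole sender can start forwarding with a single credit and stall mid-packet, whereas the conservative VCT check guarantees the whole packet can drain into the downstream buffer.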
It may be an extreme example to compare a typical datacenter server to a state-of-the-art supercomputer
node, but the fact remains that Ethernet is gaining a significant foothold in the high-performance
computing space, with nearly 50% of the systems on the TOP500 list [62] using Gigabit
Ethernet, as shown in Figure 1.5 (b). InfiniBand (including SDR, DDR, and QDR) accounts
for 41% of the interconnects, leaving very little room for proprietary networks. The landscape was
very different in 2002, as shown in Figure 1.5 (a), where Myrinet accounted for about one third of
the system interconnects. The IBM SP2 interconnect accounted for about 18%, and the remaining
50% of the system interconnects were split among about nine different manufacturers. In 2002, only
about 8% of the TOP500 systems used Gigabit Ethernet, compared with the nearly 50% on the current list.
No doubt “cloud computing” benefited from this rapid growth and acceptance in the HPC community,
which drove prices down and made parts more reliable. Moving forward we may see even further
consolidation as 40 Gigabit Ethernet converges with some of the InfiniBand semantics via RDMA
over Ethernet (ROE). However, a warehouse-scale computer (WSC) [9] and a supercomputer have
different usage models. For example, most supercomputer applications expect to run on the machine
in a dedicated mode, not competing for compute, network, or I/O resources with any other application.
Supercomputing applications commonly checkpoint their datasets, since the MTBF of a
large system is usually measured in tens of hours. They also typically run on a dedicated system,
so QoS demands are not a major concern. A datacenter, on the other hand, runs
a wide variety of applications, some user-facing like Internet email, and others behind the
scenes. The workloads vary drastically, and programmers must learn that hardware can, and does,
fail; the application must be fault-aware and deal with failure gracefully. Furthermore, clusters in the
datacenter are often shared across dozens of applications, so performance isolation and fault isolation
are key to scaling applications to large processor counts.
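To see why an MTBF measured in tens of hours forces frequent checkpoints, a standard back-of-the-envelope estimate is Young's approximation for the optimal checkpoint interval. The sketch below is not from the text; the 20-hour MTBF and 5-minute checkpoint cost are illustrative assumptions.

```python
import math

def optimal_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    # Young's approximation: tau ~= sqrt(2 * C * MTBF),
    # valid when the checkpoint cost C is much smaller than the MTBF.
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

mtbf = 20 * 3600   # assumed system MTBF of 20 hours, in seconds
cost = 5 * 60      # assumed 5 minutes to write one checkpoint
tau = optimal_checkpoint_interval(cost, mtbf)
print(f"checkpoint every {tau / 3600:.1f} hours")  # roughly every 1.8 hours
```

Under these assumptions the application should checkpoint every couple of hours; as the machine grows and the MTBF shrinks, the interval shrinks with it, and checkpoint I/O becomes a first-order design constraint.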
Choosing the “right” topology is important to overall system performance. We must take
into account flow control, QoS requirements, fault tolerance, and resilience, as well as workloads,
to better understand the latency and bandwidth characteristics of the entire system. For example,