Closing Remarks
Interconnection networks are the glue that binds the otherwise loosely coupled distributed-memory
cluster systems that are common in datacenter networks and the high-performance computing
(HPC) community. The system scale, that is, the number of processor sockets capable of being
housed in a single system, is impacted dramatically by the network. With exascale parallel computers
being designed with hundreds of thousands and even millions of processing cores, the cost of power
and its associated delivery and cooling are becoming significant factors in the total expenditures of
large-scale datacenters. Barroso and Hölzle recently showed a mismatch between common server
workload profiles and server energy efficiency [8]. In particular, they show that a typical Google
cluster spends most of its time within the 10-50% CPU utilization range, but that servers are
inefficient at these levels. They therefore make the call for energy-proportional computing systems
that ideally consume almost no power when idle and gradually consume more power as the activity
level increases. As of the June 2010 Top500 [62] list, the Cray XT5-HE, with 224,162 processing
cores, achieved 1.759 petaflops while drawing nearly 7 megawatts on the LINPACK benchmark.¹
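The arithmetic behind these two observations can be sketched in a few lines of Python. The first function simply divides the quoted LINPACK rate by the quoted power draw. The second and third use a linear power model that is an assumption made here for illustration, not a model taken from the text: `idle_fraction` is a hypothetical parameter giving the share of peak power a server draws when idle, and the value 0.5 used below is an assumed figure, not a measurement.

```python
def linpack_efficiency(flops: float, watts: float) -> float:
    """Return sustained floating-point operations per second per watt."""
    return flops / watts


def server_power(utilization: float, peak_watts: float, idle_fraction: float) -> float:
    """Assumed linear power model: idle power plus a part that scales with load.

    idle_fraction is the share of peak power drawn at zero utilization
    (a hypothetical parameter; 0.0 models an ideal energy-proportional server).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    idle_watts = idle_fraction * peak_watts
    return idle_watts + (peak_watts - idle_watts) * utilization


def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work done per watt, normalized to the value at 100% utilization."""
    if utilization == 0.0:
        return 0.0
    # peak_watts cancels in the ratio, so 1.0 is used as a placeholder.
    return utilization / server_power(utilization, 1.0, idle_fraction)


if __name__ == "__main__":
    # Figures quoted in the text: 1.759 petaflops at nearly 7 megawatts.
    mflops_per_watt = linpack_efficiency(1.759e15, 7.0e6) / 1e6
    print(f"Cray XT5-HE: about {mflops_per_watt:.0f} MFLOPS/W")

    for u in (0.1, 0.3, 0.5, 1.0):
        conventional = relative_efficiency(u, idle_fraction=0.5)  # assumed value
        proportional = relative_efficiency(u, idle_fraction=0.0)  # ideal case
        print(f"utilization {u:4.0%}: conventional {conventional:.2f}, "
              f"proportional {proportional:.2f}")
```

With the quoted figures, the machine delivers roughly 251 MFLOPS per watt. Under the assumed 50% idle draw, a server at 10% utilization does only about 0.18 of the work per watt that it does at full load, whereas the ideal energy-proportional server stays at 1.0 across the whole range.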
Warehouse-scale Computers (WSC) [9] such as those shown in Figure 1.1 fuel the Internet
applications of today and tomorrow. WSC and HPC machines differ in their programming models:
datacenter clusters are dominated by TCP socket-based models, whereas distributed-memory HPC
systems commonly use message-passing interfaces such as MPI, or hierarchical programming models
that exploit shared memory (ccNUMA) within the node using an OpenMP interface and distributed
memory between nodes with MPI. These differences result in O(1 μs) end-to-end message latency in
HPC systems, compared to O(100 μs) of latency between datacenter servers. In large part, the
software transport plays a critical role in latency: TCP transport and multiple copies between kernel
and user space confound low-latency messaging. Efficient user-level messaging has been demonstrated
with large-scale global communications on the order of 1 μs in the HPC community, where efficient
fine-grain communication and low-latency synchronization are hallmarks of scalable machines
[7, 15, 35, 55].
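To see why the fixed latency term matters so much for fine-grain communication, the following minimal sketch estimates a one-way transfer time as a fixed latency plus the time to serialize the message onto the link. This two-term model and the 10 Gb/s link rate are assumptions made here for illustration; only the 1 μs and 100 μs latency figures come from the text.

```python
def message_time_us(size_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimate one-way transfer time as fixed latency plus serialization time."""
    bits = size_bytes * 8
    # 1 Gb/s equals 1e3 bits per microsecond.
    serialization_us = bits / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us


if __name__ == "__main__":
    link_gbps = 10.0  # assumed link rate, not taken from the text
    for size in (64, 4096, 1_000_000):
        hpc = message_time_us(size, latency_us=1.0, bandwidth_gbps=link_gbps)
        tcp = message_time_us(size, latency_us=100.0, bandwidth_gbps=link_gbps)
        print(f"{size:>9} bytes: HPC-style {hpc:10.2f} us, TCP-style {tcp:10.2f} us")
```

For a 64-byte message the serialization time on the assumed link is about 0.05 μs, so the total is almost entirely latency and the two cases differ by nearly two orders of magnitude. For megabyte-sized transfers the serialization term dominates and the gap narrows.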
Supercomputers often take the design approach of building the entire machine from the most
efficient packaging, chip technology, and signaling. As a result, they typically don't have a high-
1 It is worth emphasizing that the Top500 list is simply a measure of how well a parallel computer solves dense systems of
linear equations, and suitability to other tasks may vary.