Distributed Systems Communications Dependencies

The performance and relative capability of a distributed systems is often limited by the communications network that links the various computers in the system. The study of how these specialized networks work is a field unto itself. Below are links to various papers within this field. The URLs are broken into several topics: hardware, TCP/IP, RDMA, software, and latency and bandwidth effects.

Hardware

URLDescriptionDate SubmittedReviewer
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics (PDF, Link updated!) A comprehensive performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. 9/13/05
Clusters, interconnect architecture, and applications Compares different clustering network interconnects. 10/01/05 Sebastian Niezgoda
Initial End-to-End Performance Evaluation of 10-Gigabit Ethernet (PDF) Performance Evaluation of 10-gigabit Ethernet 10/24/05 Sebastian Niezgoda
Message-Passing for the 21st Century: Integrating User-Level Networks with SMT (PDF) Describes the use of an I/O processor to improve the performance of device I/O and interprocessor communication. 11/13/05
DQSA and Universal Networking Describes Distributed Queue Switch Architecture which is a networking technology that eliminates the needs for switches and routers. 11/21/05

TCP/IP

URLDescriptionDate SubmittedReviewer
TOE: TCP/IP Offload Engine relieves CPU burden (PDF) Discusses how the addition of a Network Accelerator Card (NAC) with an integrated TCP/IP Offload Engine (TOE) can relieve the burden on the CPU. 9/17/05 Anthony Kim
TCP Onloading for Data Center Servers (PDF) To meet the increasing networking needs of server workloads, servers are starting to offload packet processing to peripheral devices to achieve TCP/IP acceleration. Researchers at Intel Labs are experimenting with alternative solutions that improve the server's ability to process TCP/IP packets efficiently and at very high rates. 10/10/05 Karen Cosgrove
Enabling High Performance Data Transfers The objectives of this page are to summarize all of the end system network tuning issues, provide easy configuration checks for non-experts, and maintain a repository of operating system specific advice and information about getting the best possible network performance on these platforms. 10/10/05 Abdo Achkar
PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks (PDF) With the emergence of high-speed wide area networks various improvements have been applied to TCP to reduce latency and achieve improved bandwidth. The improvement is achieved by having system administrators tune the network and can take a considerable amount of time. This paper introduces PSockets (Parallel Sockets), a library that achieves an equivalent performance without manual tuning. 10/10/05
Protocol Service Decomposition for High-Performance Networking (Multiple formats available on upper right of page including PDF) In this paper the authors describe a new approach to implementing network protocols that enables them to have high performance and high flexibility, while retaining complete conformity to existing application programming interfaces. 10/24/05
A Comparison of TCP Automatic Tuning Techniques for Distributed Computing (PDF) Rather than painful, manual, static, per-connection optimization of TCP buffer sizes simply to achieve acceptable performance for distributed applications, many researchers have proposed techniques to perform this tuning automatically. This paper first discusses the relative merits of the various approaches in theory, and then provides substantial experimental data concerning two competing implementations - the buffer autotuning already present in Linux 2.4.x and "Dynamic Right-Sizing". 10/24/05

RDMA

URLDescriptionDate SubmittedReviewer
RAMS: A RDMA-enabled I/O Cache Architecture for Clustered network (Multiple formats available on upper right of page including PDF) Previous studies show that intra-cluster communication easily becomes a major performance bottleneck for a wide range of small write-sharing workloads especially read-only workloads in modern clustered network servers. A Remote Direct Memory Access (RDMA) technique has been recommended by many researchers to address the problem but how to well utilize RDMA is still in its infancy. This paper proposed a novel solution to boost intra-cluster communication performance. 10/30/05
Evaluating the Impact of RDMA on Storage I/O over InfiniBand (Multiple formats available on upper right of page including PDF) Presents an extension of iSCSI to support RDMA operations over InfiniBand and measures the effects on performance. 10/30/05
Sockets vs RDMA Interface over 10-Gigabit Networks: An In-depth analysis of the Memory Traffic Bottleneck (PDF) Compares TCP/IP based communication to RDMA communication with special attention to memory bandwidth and CPU utilization. 11/13/05

Software

URLDescriptionDate SubmittedReviewer
Communication Benchmarking and Performance Modelling of MPI Programs on Cluster Computers (PDF) Gives an overview of two related tools that have been developed to provide more accurate measurement and modelling of the performance of message passing programs and communications on distributed memory parallel computers. 10/01/05
Supporting MPI-2 One Sided Communication on Multi-Rail InfiniBand Clusters: Design Challenges and Performance Benefits (PDF) Designs one-sided MPI communication system to run over multi-rail (multiple ports per compute node) InfiniBand clusters. 11/22/05

Latency and Bandwidth Effects

URLDescriptionDate SubmittedReviewer
Sensitivity of Parallel Applications to Large Differences in Bandwidth and Latency in Two-Layer Interconnects (PDF) Studies application performance on systems with strongly non-uniform remote memory access. 9/25/05 David Vaglia
The Effect of Latency on User Performance in Warcraft III (Multiple formats available including PDF) In this work, the authors design and conduct user studies that measure the impact of latency on user performance in Warcraft III, a popular RTS game. 11/05/05 David Vaglia
Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture (Availble in Postscript) This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. 11/05/05
Modeling Network Latency and Parallel Processing in Distributed Database Design This paper designs a parallel database system that takes the effects of latency into account. 12/03/05



By David Acker
Back to Distributed Systems Page

Adobe Reader Get Adobe Reader to view PDF files
Get Ghostscript and GSView to view Postscript files