Pip: Detecting the Unexpected in Distributed Systems

Pip finds structural and performance bugs in distributed systems by comparing actual system behavior to expected system behavior. Actual behavior is captured through program annotations, interposition, network sniffing, or some combination of these. Expected behavior is described in an external specification of the system's call graph and communication protocols.

Pip automatically checks actual behavior against expected behavior and helps programmers visualize the result to discover the causes of any unexpected behavior.
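The comparison at the heart of Pip can be illustrated with a toy model. The sketch below is a minimal, hypothetical example: the `Event` and `Expectation` classes and the `matches` and `classify` functions are assumptions made for illustration, not Pip's expectation language or API. It checks one causal path against expectations that fix an exact event order plus an optional end-to-end latency bound, and labels any path that satisfies none of them as unexpected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """One event on a causal path (hypothetical model, not Pip's trace format)."""
    host: str
    name: str            # e.g. "recv:read_req", "task:lookup"
    timestamp_ms: float


@dataclass
class Expectation:
    """Hypothetical expectation: an ordered event sequence plus a latency bound."""
    label: str
    events: List[str]                      # expected event names, in causal order
    max_latency_ms: Optional[float] = None


def matches(path: List[Event], exp: Expectation) -> bool:
    """True if the path satisfies both the structural and the performance check."""
    # Structural check: event names must appear exactly in the expected order.
    if [e.name for e in path] != exp.events:
        return False
    # Performance check: end-to-end latency must stay within the bound, if one is set.
    if exp.max_latency_ms is not None and path:
        latency = path[-1].timestamp_ms - path[0].timestamp_ms
        if latency > exp.max_latency_ms:
            return False
    return True


def classify(path: List[Event], expectations: List[Expectation]) -> str:
    """Return the label of the first matching expectation, or 'unexpected'."""
    for exp in expectations:
        if matches(path, exp):
            return exp.label
    return "unexpected"


# Example: the same read path, once fast and once too slow.
read_ok = Expectation("read", ["recv:read_req", "task:lookup", "send:read_reply"],
                      max_latency_ms=50.0)
fast = [Event("server1", "recv:read_req", 0.0), Event("server1", "task:lookup", 3.0),
        Event("server1", "send:read_reply", 12.0)]
slow = [Event("server1", "recv:read_req", 0.0), Event("server1", "task:lookup", 3.0),
        Event("server1", "send:read_reply", 80.0)]
print(classify(fast, [read_ok]))  # read
print(classify(slow, [read_ok]))  # unexpected
```

Pip's own specifications are considerably more expressive than an exact-sequence check; this sketch only shows the structural-plus-performance comparison in its simplest form.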

Keywords: debugging, distributed systems, expectations, causal paths

Pip includes a GUI, Pathview, that visualizes the traced behavior of applications, highlights classes of behavior (e.g., unexpected behavior), and plots a range of resource metrics. Here are two screenshots from Pathview:

[Pathview DAG view]

DAG view: Each causal path in a system can be shown as a directed acyclic graph (DAG), highlighting behavior on a given host, in a given event handler, or in a given individual task.
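To make the DAG view concrete, here is a minimal sketch of how a causal path might be assembled from trace events. It assumes a hypothetical event model in which each event is identified by (host, thread, sequence number) and each message contributes a send-to-receive edge; `build_dag`, `topo_order`, and `on_host` are illustrative names, not part of Pip or Pathview. Program order within a thread and message delivery together give the happens-before edges, a topological order lays the path out causally, and filtering by host mirrors highlighting behavior on a given host.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical event identifier: (host, thread, sequence number within that thread).
EventId = Tuple[str, str, int]


def build_dag(events: List[EventId],
              messages: List[Tuple[EventId, EventId]]) -> Dict[EventId, Set[EventId]]:
    """Build happens-before edges: program order within each thread, plus send -> receive."""
    edges: Dict[EventId, Set[EventId]] = {ev: set() for ev in events}
    threads: Dict[Tuple[str, str], List[EventId]] = {}
    for ev in events:
        threads.setdefault((ev[0], ev[1]), []).append(ev)
    for thread_events in threads.values():
        thread_events.sort(key=lambda ev: ev[2])
        # Consecutive events in one thread are causally ordered.
        for a, b in zip(thread_events, thread_events[1:]):
            edges[a].add(b)
    for send, recv in messages:
        # A message send happens before its receipt, possibly on another host.
        edges.setdefault(send, set()).add(recv)
        edges.setdefault(recv, set())
    return edges


def topo_order(edges: Dict[EventId, Set[EventId]]) -> List[EventId]:
    """Kahn's algorithm; raises if the edges contain a cycle (i.e. not a DAG)."""
    indegree = {node: 0 for node in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[EventId] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for s in sorted(edges[node]):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    if len(order) != len(edges):
        raise ValueError("cycle detected: events do not form a causal DAG")
    return order


def on_host(order: List[EventId], host: str) -> List[EventId]:
    """Keep only the events that occurred on one host, preserving causal order."""
    return [ev for ev in order if ev[0] == host]


# Example: a client request/reply exchange with one server.
c_send, c_recv = ("client", "t1", 0), ("client", "t1", 1)
s_recv, s_send = ("server", "t1", 0), ("server", "t1", 1)
dag = build_dag([c_send, c_recv, s_recv, s_send], [(c_send, s_recv), (s_send, c_recv)])
order = topo_order(dag)
print(order)                    # client send, server recv, server send, client recv
print(on_host(order, "server"))
```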

[Pathview graph view]

Graph view: Pathview can plot 15 different resource metrics for individual tasks or for sets of paths. Shown here is the end-to-end latency, in milliseconds, of three different varieties of read operations in FAB (the Federated Array of Bricks distributed storage system), plotted as a cumulative distribution function (CDF).
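A latency CDF like the one in the graph view can be computed from per-path end-to-end latencies. The sketch below assumes latencies have already been extracted from traced paths as plain lists of milliseconds; `empirical_cdf` and `fraction_at_most` are hypothetical helpers, not Pathview functions, and the sample numbers are invented for illustration, not FAB measurements.

```python
from bisect import bisect_right
from typing import List, Tuple


def empirical_cdf(latencies_ms: List[float]) -> List[Tuple[float, float]]:
    """Return (latency, fraction of paths with latency <= it) at each distinct value."""
    xs = sorted(latencies_ms)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # Emit one point per distinct value, at its last occurrence, so ties form a single step.
        if i + 1 == n or xs[i + 1] != x:
            points.append((x, (i + 1) / n))
    return points


def fraction_at_most(latencies_ms: List[float], threshold_ms: float) -> float:
    """Fraction of paths whose end-to-end latency is at most threshold_ms."""
    xs = sorted(latencies_ms)
    return bisect_right(xs, threshold_ms) / len(xs)


# Invented sample latencies (ms) for one hypothetical read variety.
sample = [4.0, 2.0, 2.0, 9.0]
print(empirical_cdf(sample))          # [(2.0, 0.5), (4.0, 0.75), (9.0, 1.0)]
print(fraction_at_most(sample, 5.0))  # 0.75
```

Computing one such curve per read variety and drawing them on shared axes gives a comparison of the kind shown in the graph view.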