Internet Systems and Storage Group
Software architectures for Internet-scale computing

 
COD is now part of a larger project on automated cyberinfrastructure. Please see the other COD page.

COD: Cluster-on-Demand

Clustering inexpensive computers is an effective way to obtain reliable, scalable computing power for network services and compute-intensive applications. Because clusters have a high initial cost of ownership, including space, power conditioning, and cooling equipment, leasing or sharing access to a common cluster is attractive when demand varies over time. Shared clusters offer economies of scale and use resources more effectively by multiplexing them among users.

Users of a shared cluster should be free to select the software environments that best support their needs. Cluster-on-Demand (COD) is a system to enable rapid, automated, on-the-fly partitioning of a physical cluster into multiple independent virtual clusters. A virtual cluster (vcluster) is a group of machines (physical or virtual) configured for a common purpose, with associated user accounts and storage resources, a user-specified software environment, and a private IP address block and DNS naming domain. COD vclusters are dynamic; their node allotments may change according to demand or resource availability.
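As a rough illustration of the vcluster idea described above, the following Python sketch models a physical cluster partitioned into virtual clusters whose node allotments grow and shrink. It is not COD's implementation: the class names, field names, and the `resize` method are hypothetical, and the sketch omits user accounts, storage, and the actual node reconfiguration.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network, ip_network
from typing import Dict, Iterable, Set


@dataclass
class VirtualCluster:
    """Hypothetical record for one vcluster; all field names are illustrative."""
    name: str
    dns_domain: str            # private DNS naming domain
    ip_block: IPv4Network      # private IP address block
    software_template: str     # user-specified software environment
    nodes: Set[str] = field(default_factory=set)


class PhysicalCluster:
    """Hypothetical manager that partitions one node pool among vclusters."""

    def __init__(self, node_ids: Iterable[str]) -> None:
        self.free: Set[str] = set(node_ids)
        self.vclusters: Dict[str, VirtualCluster] = {}

    def create(self, vc: VirtualCluster) -> None:
        self.vclusters[vc.name] = vc

    def resize(self, name: str, target: int) -> int:
        """Move nodes between the free pool and a vcluster.

        Grows only as far as free nodes allow; returns the resulting size.
        """
        vc = self.vclusters[name]
        while len(vc.nodes) < target and self.free:
            vc.nodes.add(self.free.pop())
        while len(vc.nodes) > target:
            self.free.add(vc.nodes.pop())
        return len(vc.nodes)
```

In this sketch a node belongs to at most one vcluster at a time, and a request for more nodes than are free is only partially satisfied. A real system would also have to reinstall or reconfigure each node as it moves between vclusters.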

COD was inspired by Oceano, an IBM Research project to automate a Web server farm. Like Oceano, COD leverages remote-boot technology to reconfigure cluster nodes using database-driven network installs from a set of user-specified configuration templates, under the direction of a policy-based resource manager. Emulab uses a similar approach to configure groups of nodes for network emulation experiments on a shared testbed. COD is complementary to both of these efforts: it decouples cluster management functions from network emulation, and adds a hierarchical framework for dynamic resource management that generalizes to multiple classes of cluster applications.

Papers and Presentations

 
Talk by Justin Moore at IBM Workshop for Triangle-area research
11/03/2003
 
Talk by Justin Moore and Richard Lucic at IBM Education workshop
10/29/2003
 
Dynamic Virtual Clusters in a Grid Site Manager by Jeff Chase, Laura Grit, David Irwin, Justin Moore, and Sara Sprenkle. In the Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), June 2003.
04/24/2003
 
DRAFT: Managing Mixed-Use Clusters with Cluster-on-Demand
11/20/2002
 
Servers in the Mist (Chase's talk at UCSD)
11/05/2002
 
Justin Moore's Work-in-Progress talk at USENIX
06/22/2002
 
Justin Moore's talk at HP Labs
06/19/2002
 
Cluster on Demand: early Duke CS Tech Report
05/31/2002
 

Software Downloads

 
SeASR: Sensor Analysis and Synthetic Reproduction.
v0.1.0
03/04/2004
 
Gamut: Generic Application eMUlaTion (formerly Sstress)
v0.7.0
10/17/2005

Team Members