The advantage of using Rocks to build and maintain your cluster is simple: while building a cluster is straightforward, managing its software can be complex. This complexity becomes most unmanageable during cluster installation and expansion. Rocks provides mechanisms to control the complexity of the cluster installation and expansion process, and provides performance monitoring tools as well.
This chapter describes the steps to build your cluster and install its software.
Since Rocks is built on top of RedHat Linux releases, Rocks supports all the hardware components that RedHat supports, but only the x86, IA-64, and x86_64 architectures (no Alpha, SPARC or Yamhill):
x86 (ia32, AMD, etc.)
IA-64 (Itanium, McKinley, etc.)
x86_64 (AMD Opteron)
Ethernet (All flavors that RedHat supports, including Intel Gigabit Ethernet)
Myrinet (Lanai 9.x)
The first thing to manage is the physical deployment of a cluster. Much research exists on the topic of how to physically construct a cluster, and the cluster cookbook can be a good resource. A majority of the O'Reilly book Building Linux Clusters is devoted to the physical setup of a cluster (how to choose a motherboard, etc.). Finally, the book How to Build a Beowulf also has some good tips on physical construction.
We favor rack-mounted equipment (yes, it is more expensive) because of its relative reliability and density. There are Rocks clusters, however, that are built from mini-towers. Choose what makes sense for you.
The physical setup for a Rocks Cluster contains one or more of the following node types:
Nodes of this type are exposed to the outside world. Many services (NFS, NIS, DHCP, NTP, MySQL, HTTP, ...) run on these nodes. In general, this requires a competent sysadmin. Frontend nodes are where users log in, submit jobs, compile code, etc. This node can also act as a router for other cluster nodes by using network address translation (NAT).
Frontend nodes generally have the following characteristics:
Two ethernet interfaces - one public, one private.
Lots of disk to store files.
These are the workhorse nodes. They are also disposable. Our management scheme allows the complete OS to be reinstalled on every compute node in a short amount of time (~10 minutes). These nodes are not seen on the public Internet.
Compute nodes have the following characteristics:
Ethernet Connection for administration
Disk drive for caching the base operating environment (OS and libraries)
Optional high-performance network (e.g., Myrinet)
All compute nodes are connected with ethernet on the private network. This network is used for administration, monitoring, and basic file sharing.
Application Message Passing Network
All nodes can be connected with Gigabit-class networks and required switches. These are low-latency, high-bandwidth networks that enable high-performance message passing for parallel programs.
The Rocks cluster architecture dictates that these node types be connected as follows:
On the compute nodes, the Ethernet interface that Linux maps to eth0 must be connected to the cluster's Ethernet switch. This network is considered private; that is, all traffic on this network is physically separated from the external public network (e.g., the internet).
On the frontend, two ethernet interfaces are required. The interface that Linux maps to eth0 must be connected to the same ethernet network as the compute nodes. The interface that Linux maps to eth1 must be connected to the external network (e.g., the internet or your organization's intranet).
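When cabling the frontend, it helps to confirm which physical port Linux has actually mapped to eth0 and eth1 before plugging in the switch and uplink. One approach is to list each detected interface with its MAC address and match the MACs against the labels printed on the NICs. A minimal sketch, assuming a kernel that exposes /sys/class/net (older kernels can use `ifconfig -a` instead):

```shell
# List every network interface the kernel detected, with its MAC
# address, so eth0/eth1 can be matched to the physical ports.
# Assumes /sys/class/net is available (2.6-series kernels and later).
for iface in /sys/class/net/*; do
    name=$(basename "$iface")
    mac=$(cat "$iface/address")
    echo "$name -> $mac"
done
```

If the mapping is backwards (eth0 on the public port, eth1 on the private one), swapping the two cables is usually simpler than reconfiguring the driver ordering.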
Once you've physically assembled your cluster, each node needs to be set to boot without a keyboard. This procedure requires setting BIOS values and, unfortunately, is different for every motherboard. We've seen some machines that cannot be set to boot without a keyboard at all.
If you are building an x86 cluster, download the Rocks Base bootable CD and the HPC roll found in Software for x86. Then find 2 blank CD-Rs and burn both images to their respective media.
If you are building an IA-64 cluster, download the Rocks Base + HPC DVD Software for ia64. Then burn the image onto a blank DVD-R.
If you are building an x86_64 (Opteron) cluster, download the Rocks Base bootable CD and the HPC roll found in Software for x86_64. Then find 2 blank CD-Rs and burn both images to their respective media.
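Whichever architecture you are building for, it is worth verifying each downloaded image against its published checksum before burning, since a corrupted ISO produces an unbootable disc. A hedged sketch follows; the filenames are placeholders (substitute the actual image names you downloaded), and the dev= value is an example you should replace after locating your writer with `cdrecord -scanbus`:

```shell
# Verify the downloaded images against the published MD5 sums.
# (Filenames below are placeholders, not the actual release names.)
md5sum rocks-base.iso hpc-roll.iso

# Burn one image to a blank CD-R.  The dev= triple is an example;
# identify your writer with `cdrecord -scanbus` first.
cdrecord -v speed=8 dev=0,0,0 rocks-base.iso
```

Compare the md5sum output by eye against the checksums listed on the download page; a single differing digit means the download must be repeated.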