The primary goal of this short course is to provide a brief introduction to parallel computing techniques in the geosciences. We hope the course will be practical, in the sense that participants will:
Emphasis throughout the short course will be placed on practical solutions to computational problems in the geosciences. If at the end of the course participants are able to visualize parallel solutions to their specific computational problems, then we will consider the course a success!
We anticipate that course participants will have varied computing experience and we will work hard to accommodate all levels. We have elected to divide the short course into two parts. The first day will be devoted to an introduction and overview of parallel programming methods. It is open to all with any level of computing experience. The second day will be devoted to more advanced topics and is specifically geared toward those who have prior experience in computing and numerical simulation. Day 2 will concentrate on application of parallel programming techniques in finite differences and related problems.
Day 1 (Morning)
Day 1 (Afternoon)
Day 2 (Morning)
Day 2 (Afternoon)
Parallel programming is the art of writing computer programs that use more than one computer processor simultaneously to solve a problem.
Parallel computing methods are increasingly used to solve computationally intensive (or expensive) problems. In the geosciences, these often involve evaluating mathematical models derived from conceptual models, which are in turn based on observations of the natural world. Three areas where parallel processing techniques are readily used are stochastic parameter sampling, optimization, and numerical simulation.
Of course, many problems contain elements of all three. For example, it is reasonable to cast parameter variation in a karst aquifer probabilistically and to calculate a large range of bulk transmissivities by running a numerical simulation many times with stochastically sampled parameter values. It may then be possible to identify an optimal set of parameters that explains the transmissivities observed in an actual karst aquifer. Real problems in the geosciences can be "made parallel" at a number of levels.
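The karst aquifer example can be sketched in a few lines of Python. This is a minimal illustration, not a real groundwater model: the function bulk_transmissivity, its lognormal sampling, and the numbers in it are all hypothetical stand-ins for one run of a numerical simulation. Because each run depends only on its own seed, the runs can be handed to separate processes with no communication between them.

```python
import concurrent.futures
import random
import statistics


def bulk_transmissivity(seed, n_cells=500):
    """Toy stand-in for one simulation run (hypothetical model).

    Draws log10 transmissivity for each cell from a normal distribution and
    returns the geometric mean as the "bulk" value, in arbitrary units.
    """
    rng = random.Random(seed)  # separate, reproducible random stream per run
    log_t = [rng.gauss(-3.0, 1.0) for _ in range(n_cells)]
    return 10.0 ** statistics.fmean(log_t)


def run_ensemble(seeds, executor):
    """Evaluate one run per seed on the given executor, keeping seed order."""
    return list(executor.map(bulk_transmissivity, seeds))


if __name__ == "__main__":
    # Four worker processes play the role of four processors.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        values = run_ensemble(range(200), pool)
    print(f"{len(values)} runs, median bulk T = {statistics.median(values):.3e}")
```

Seeding each run explicitly means the ensemble gives the same answer whether it is run on one processor or many, which makes the parallel version easy to check against a serial one.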

Note: Once you are in this loop there is no escape!
A Beowulf cluster is a networked set of computers dedicated to parallel computations. The cluster usually consists of "off the shelf" personal computers connected by Ethernet. Clusters usually run the Linux operating system (a freely available UNIX-like system) and use the programming tool MPI (Message Passing Interface) for splitting up the work to be done, compiling it, and running it, in parallel, on each computer in the cluster. The first Beowulf cluster was constructed in 1994 at the Goddard Space Flight Center for modeling "grand" problems in the Earth and Space Sciences. Thomas Sterling and Don Becker built the cluster and called it Beowulf; for better or worse, the name stuck.
Smooth communication is the key factor that transforms a "bunch of computers" into a Beowulf cluster. In a typical cluster, each computer contains an Ethernet card and communicates with the other computers via a switch. In this configuration, each computer becomes a node on a private network. One computer contains two Ethernet cards so that it can connect the private cluster to the public Internet. This computer, known as the master node, coordinates communication between the other nodes, called slaves. Normally the slave nodes do not have peripheral components (e.g., monitor, keyboard, mouse).
The master node functions both as a member of the cluster and as the network server/coordinator for the cluster. Users run their MPI programs on the master node, which in turn splits up the work among the various slave nodes, waits for each slave to compute a solution, and then combines the partial solutions into a final answer for the user.
Files are shared among the cluster nodes via NFS (Network File System). Each slave node mounts, via NFS, certain directories located on the master node (e.g., /home and /usr/local) and uses the files in these directories as its own. In this way, each node can run a single program compiled by the user on the master node.
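The split, compute, and combine cycle that the master node performs can be sketched on a single machine. In this illustration, which is an assumption of ours rather than how a cluster is actually programmed, Python threads stand in for slave nodes and queues stand in for the network. It shows only the flow of messages; threads in standard Python do not speed up this kind of arithmetic. The function names and the sum-of-squares task are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Slave: receive chunks until a None sentinel arrives, send back partial sums."""
    while True:
        message = tasks.get()
        if message is None:
            break
        chunk_id, numbers = message
        results.put((chunk_id, sum(x * x for x in numbers)))


def master(data, n_workers):
    """Master: partition the data, send chunks out, and combine the partial results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    # Partition: worker i gets every n_workers-th item, starting at item i.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    for chunk_id, chunk in enumerate(chunks):
        tasks.put((chunk_id, chunk))
    for _ in threads:
        tasks.put(None)  # one stop signal per worker

    # Wait for one partial result per chunk, then combine them.
    partials = [results.get() for _ in chunks]
    for thread in threads:
        thread.join()
    return sum(value for _, value in partials)


if __name__ == "__main__":
    print(master(list(range(10)), n_workers=3))
```

On a real cluster, MPI takes the place of the queues and each worker is a separate process on its own node, but the order of operations is the same.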
The idea of using affordable and scalable hardware components for parallel computing has become very popular because it makes "high-end" computing highly affordable. Beowulf clusters do not need to be large. In 1999, we (CC and LC) built a Beowulf cluster of four computers, each containing a 500 MHz processor. With an efficient parallel application, this cluster of four 500 MHz nodes ran at the equivalent of about 2 GHz, not bad for the time.
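The "about 2 GHz" figure is simply 4 x 500 MHz, and it is reached only when essentially all of the work can be shared among the nodes. A rough way to see how quickly the benefit shrinks otherwise is Amdahl's law, a standard textbook model that we bring in here for illustration. The parallel fractions below are made-up values, not measurements from our cluster, and treating speedup as an "equivalent clock rate" is itself a loose analogy.

```python
def aggregate_clock_mhz(nodes, clock_mhz):
    """Ideal case: every node is kept fully busy, so the clock rates simply add."""
    return nodes * clock_mhz


def amdahl_speedup(parallel_fraction, nodes):
    """Amdahl's law: speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    nodes, clock_mhz = 4, 500
    print(f"ideal aggregate: {aggregate_clock_mhz(nodes, clock_mhz)} MHz")
    for fraction in (1.0, 0.95, 0.8):  # hypothetical parallel fractions
        speedup = amdahl_speedup(fraction, nodes)
        print(f"parallel fraction {fraction:.2f}: speedup {speedup:.2f}, "
              f"roughly {speedup * clock_mhz:.0f} MHz equivalent")
```

The model ignores communication time between nodes, which lowers the real figure further.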
Of course, each node can be upgraded with new CPUs, network cards, etc., as these inevitably improve. This points to another advantage of the Beowulf paradigm. Today's highly sophisticated and proprietary "supercomputer" invariably becomes tomorrow's boat anchor. While such advancement is fantastic, it is costly. Beowulf clusters exploit commodity parts, which makes them scalable both in the number of compute nodes and in their ease of upgrade.
Lots of information is available. Use www.beowulf.org as a gateway to the best sites.
If you are interested in building a Beowulf cluster, see:
http://www.acm.org/crossroads/xrds6-1/parallel.html
The Earth Sciences / Geography Beowulf cluster at the University of Bristol is called tuya. A tuya is a landform created by volcano-glacier interaction. The word also means "yours" in Spanish, which is appropriate enough. Log on to tuya using secure shell, e.g.,
where "username" is your username. Copy files to and from tuya using secure copy, e.g.,
to put a file called "file.dat" in your main directory on your tuya account. Use:
to copy a file from your main directory on tuya. More information about using ssh on University of Bristol computers is available at the UoB SSH site.
MPI, the Message Passing Interface, is a programming tool that handles network communication between the master and slave nodes of a Beowulf cluster. More specifically, MPI simplifies the job of partitioning, sending, receiving, and recombining data over the network between the master and slave nodes during code execution. Just a handful of MPI-specific commands are required to develop most parallel codes.
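To give a feel for how few MPI calls a simple program needs, here is a minimal sketch that sums the squares of the first million integers across all processes. It assumes the third-party mpi4py package, a Python binding to MPI; course codes are more commonly written in C or Fortran, where the corresponding calls are MPI_Comm_rank, MPI_Comm_size, MPI_Bcast, and MPI_Reduce. The helper names and the problem itself are hypothetical. A script like this is typically launched with something like "mpiexec -n 4 python sum_squares.py", where the file name is ours.

```python
def block_range(n_items, size, rank):
    """Half-open index range [start, stop) owned by `rank` out of `size` processes."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum_of_squares(start, stop):
    """The work one process does on its own block of indices."""
    return sum(i * i for i in range(start, stop))


def main():
    from mpi4py import MPI  # third-party package; assumed to be installed

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()  # ID of this process, 0 .. size-1
    size = comm.Get_size()  # total number of processes

    # The master (rank 0) decides the problem size and sends it to everyone.
    n_items = 1_000_000 if rank == 0 else None
    n_items = comm.bcast(n_items, root=0)

    # Each process works on its own block; no communication is needed here.
    start, stop = block_range(n_items, size, rank)
    local = partial_sum_of_squares(start, stop)

    # Partial results are combined on the master.
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"sum of squares below {n_items} using {size} processes: {total}")


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("mpi4py is not available; install it and an MPI runtime to run this sketch")
```

Every process runs the same program; only the value of rank differs, and that is what determines which block of the work each one takes.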
Lots of MPI discussion and examples are available. See the following:
Comprehensive introductions to MPI:
Additional simple examples: