Lesson 4

Goals

This lesson introduces several useful graphing routines in GMT and simple techniques in probability estimation for discrete events (seismic events, volcanic eruptions). This lesson also introduces the use of perl scripts for simple computations.



Probability

Very often in geological hazards are very low probability, but very high consequence events. Earthquakes of M > 7.0 are infrequent in the western United States on the time scales of human experience and memory. In the entire western US, such earthquakes occur just a few times per decade. Volcanic eruptions are even less frequent than large earthquakes. For example, in the Cascade range an eruption that affects people living on the flanks of active volcanoes only occurs, on average, once every 30 years. It is of course difficult to plan for large eruptions, like the 1980 eruption of Mount St. Helens, when the frequency of these eruptions is low. In other words, if the recurrence rate of events is low, and the number of known events are few, it is difficult to develop reliable forecast of future activity. If the consequences of the event is serious enough, usually the case with geologic hazards, it is important to try to develop reasonable forecasts of future acitvity despite the difficulty. Therefore, a crucial component of geologic hazard assessment involves gleaning the most information possible from sparse, perhaps inadequate, data sets.

Techniques in probability analysis are well developed for many of the forecasting problems in geology. A simple question - Given the past eruption rates at a particular volcano, what is the likelihood of future eruptions within some time interval? This question can be answered using a variety of methods in probability analysis. Before plunging in, it is crucial to actually look critically at the data available.



Histograms

Consider the data in the file poas.dat (go ahead and list the file using the more command). The file includes a single column of numbers - the years of recorded volcanic eruptions at Poas volcano, Costa Rica. These data are from the "volcanoes of the world" by Simkin and Siebert. A total of 41 eruptions have been recorded for Poas volcano since 1828. Some volcanic eruptions were recorded prior to 1828, but the reliability of those eruption reports is questioned. The first step in an analysis of these eruption data is to look carefully at the data. Simply listing the data, as you have just done, is not particularly revealing. The auspicious thing is to plot a histogram. GMT can help with this:

pshistogram poas.dat -G180/180/180 -L1.0 -W10 -R1800/2000/0/10 -B10a50:"Year":/1a2:"Number Eruptions":WSne -JX2.75i -V -P > hazard.ps

plots a histogram of the data contained in the file "poas.dat"

Nearly all of the GMT command arguments used to plot this histogram are the same as those we used previously. Note that the pshistogram command needs a data range and projection, just as we used before. Now the projection, -JX2.75i, indicates we are uses simple Cartesian coordinates, and the total width of the plot is 2.75 inches. The -W argument indicates that the eruptions are binned in 10 year increments, -L1.0 sets the line thickness on the histogram boxes. Also note that in addition to tick marks, labels are put on the axes using the -B argument. Within the -B argument, WSne, means that the axes will be labeled on the West and South (left and bottom) axes. If you execute the "pshistogram" command using the -V option, GMT will report the data range and total number. That is useful information in and of itself - both for your analysis and for making a nice looking plot.

Study the histogram. It looks like in a given decade, on the order of one to four eruptions are be expected, that is, the recurrence rate of volcanic eruptions is 1 - 4 events/10 yr. There are five decades in which no eruptions occurred and there is one decade (1970's) in which an unusual number of eruptions (for Poas volcano) occurred. Also note that the number of volcanic eruptions per decade appears to increase with time. This may be "real" in the sense that more volcanic eruptions are happening more recently than happened in the past, or this trend may simply reflect improved reporting of volcanic eruptions. There is no way to tell from these data alone. Simkin and Siebert point out that this trend (increase in number of eruptions reported over time), is global, and easily attributable on a global scale to improved eruption reporting.

Therefore the "event" plotted on the histogram may not be the number of eruptions that actually occurred at Poas volcano, but the number of eruptions reported to have occurred at Poas volcano. Ideally, there would be no difference in these numbers. Unfortunately, there is clearly a differnce in this case, and often in any hazard assessment. One way to combat this issue is to clearly define the "event" that is being forecast, and to provide some sense of the validity and usefulness of this definition. Clarity is badly lacking in many hazard assessments because the effort is not made to clearly define the event that is forecast. Insidiously enough, it turns out that the fewer events reported, the more the definition of what is and is not an event is debated.

A second way to view the data is to plot cumulative number of events on the histogram, rther than th number of events per decade. GMT also provides an easy way to plot cumulative histograms:

pshistogram poas.dat -G180/180/180 -L1.0 -W10 -Q -R1800/2000/0/50 -B10a50:"Year":/10a10:"Cumulative Eruptions":WSne -JX2.75i -P -V > hazard2.ps

executing this command results in the following plot:

The only important change in the GMT command is that the -Q argument was added to create a cumulative histogram. Also note that the extent of the Y-axis has changed and a few other arguments modified to improve the plot.

The cumulative histogram simple adds all previous eruptions to those that occur in a given decade; in a sense the plot is simply a running total of eruptions. Although the same data are plotted, new salient information is conveyed by the cumulative histogram. On the cumulative histogram, it is clear that the frequency of eruptions was much less prior to 1870, with a clear increase after 1870. This seems to indicate much improved reporting starting in the latest part of the nineteenth century. The recurrence rate of eruptions looks fairly constant during the next one hundred years, until the 1970's. Once again, in the 1970's there appears to be a fairly dramatic increase in activity - this time not easily attributed to improved reporting. Perhaps this indicates a change in the geophyscial system - for example an increase in the magma production rate - during that last third of the twentieth century. A good plot can raise these questions - although it cannot always answer them!

Average Reucrrence Rate

The recurrence rate of events (e.g., reported volcanic eruptions at Poas volcano) is estimated directly from these data. We know (from looking at the data file) that 41 events occurred at Poas, the first in 1828 and the last in 1992. The average recurrence rate is calculated by

Average recurrence rate = (number of events - 1)/(time of last event - time of first event)

For example, taking all of the data at Poas: (41-1)/(1992-1828) = 0.24 events per year

But what about the "increase" in eruption frequency after about 1870? A total of 36 eruptions took place between 1880 and 1992 - yielding an average recurrence rate of 0.32 events/year. With 16 eruptions between 1970 and 1992, the recurrence rate was 0.68 during that period - considerably higher.

Between 1992 and 2001, only one eruption at Poas has been reported, in 1994. Perhaps there has been a real cycle of activity at Poas, with activity waxing in the 1970's-1980's and then waning in the mid to late 1990's. Such episodes appear to be common at stratovolcanoes, and certainly are seen in the geologic record of many large calderas. Thus average recurrence rate, while useful, is not sensitive to some significant variations in rates of volcanic activity.



Hazard Curves and Exceedence Probability

There is uncertainty in any hazard analysis. One of the problems that arises in using average recurrence rate estimates is that this uncertainty is not clearly presented. How can all of the data be used more effectively to forecast volcanic eruptions? Uncertainty is better captured by the analysis if the probability of an event exceeding some value is calculated. This exceedence probability can be shown graphically, providing another view of the total variation in the data set. Exceedence probability is often shown graphically as a complementary cumulative distribution function (CCDF), also known as a hazard curve.

We will use exceedence probability and hazard curves in a variety of contexts. In this case, we will calculate the probability that the repose period (interval between volcanic eruptions) exceeds some duration. This is calculated directly from the eruption data (poas.dat). Note: be sure to open the *.dat file with a text editor, sometimes other applications are the "default".

The first step is to calculate the interval between volcanic eruptions, for each eruption (N-1) in the data set. Basically all we have to do is difference successive years of volcanic eruptions to calculate this interval. This difference can be calculated by hand, but a much more efficient approach is to ask the computer to do the calculation for you.

To do this simple calculation, we need a new scripting language that will do the work. Many tools are available. For example the computer languages "C" or "Fortran" could be used to write a program to do the calculation. In UNIX (and some other operating systems) there are simpler scripting languages that are very well suited for this task. The scripting language we will use is called "perl". Perl stands for "practical extraction and report language". Perl can do an awful lot of complex computer tasks - for example it can be used to automate data collection over the internet very efficiently. We will primarily use perl to manipulate data tests, to make it easier to analyze them and visualize them using GMT and related software.

The file "perl_one.pl " is a perl script located in your directory. List this file usung the "more" command or open it in xemacs: xemacs perl_one.pl &. There are some features of this script that are common with the scripts you have already used and written. The first line of a perl script is always:

#!/usr/bin/perl

to let the computer know that the script contains perl commands that will be executed. Also, comment lines are still designated by the "#" character. The comment lines in this script indicate how the script will execute to do the task at hand. Look through this script to get a sense of the order of execution. In particular, note that this script only operates on the file "poas.dat".

To "run" this script, type the following on the command line:

perl perl_one.pl

if this command successfully runs, you will see a set of numbers print on the screen. These are the intervals (in years) between successive eruptions

If "Permssion Denied" appears instead, you have to make sure the script can be executed. Change the permissions on the script using the "chmod" command: chmod a+x perl_one.pl

The script will not work if it cannot find the file (poas.dat). Consequently, an error message will print if the file is not found, or could not be opened. One reason the file cannot be opened is that it does not exist in the directory. Another possibility is that the file in read protected. Check the file permissions and change them using the "chmod" command if the file is located in the directory, but the script still cannot open the file.

Now re-run the script, this time redirecting the output to a file:

perl perl_one.pl> poas_interval.dat

This dumps the output from the script into a new file called "poas_interval.dat". Try making a histogram using the data in this new file and the GMT commands.

You will not "have" to learn perl for this course, but I encourage you to get an idea of what it is all about. Try a few tutorials available on the web if you are so inclined - search on "perl + tutorial" and lots of information comes up.

The script perl_one.perl does not get us all the way to an exceedence probability plot. We now have long and short recurrence intervals, but what is the proportion of long to short recurrence intervals? The perl script "sort.perl" actually calculates this proportion, in addition to calculating the interval. List perl.sort to get a sense of the differences between this script and the first perl script (perl_one.pl).

You can run the script "sort.pl" by typing:

perl sort.pl poas.dat > hazard.dat

on the command line. Two columns of data are now output to the file called "hazard.dat". The first column is the eruption interval data, just as before, but now sorted in ascending order. The second column shows a proportion, ranging from 1 to slightly more than 0. The following is a graph of these data created using the following GMT command:

psxy hazard.dat -R0.50/30/0.01/1 -JX3il -W5.0 -Ba1f3p:"Quiescence (yr)":/a1f3p:"Exceedence Prob":WSne -V -P> hazard_curve.ps

The GMT command "psxy" is used to plot the data in the file "hazard.dat", or to make any x-y plot of data stored as two columns in a file. The range is given, projection, and annotation commands, just like in previous plots. The argument -JX3il indicates that a logrithm (l) will be plotted and that the entire plot will be 3 inches wide. Also note the format of the axis label arguement (-B) to get nice looking axes to appear. If you have already created the file "hazard.dat" using the perl script, you can now generate the hazard curve and view it using ghostview or xv.

The graph is extremely useful for evaluating the frequency of past eruptions and is a guide to the likelihood of future volcanic eruptions. For example, you can read from the graph that the the probability that the the interval between eruptions exceeds three years in duration at Poas volcano is about 50%; the probability that the interval between volcanic eruptions is greater than 10 years is about 10%. If we take the inverse of the repose period between eruptions, we get a recurrence rate:

3 years/1 eruption = 0.33 events (eruptions) / year.

10 years/1 eruption = 0.1 events (eruptions) / year.

Therefore, from the hazard curve we know that there is a 50% chance (the mean) that the recurrence rate of reported volcanic activity at Poas is lower than 0.33 events/year, in excellent agreement with the calculations we derived from the annual eruption data earlier. Furthermore, we know that there is only a 10% chance that the recurrence rate is lower than 0.1 events/year. The hazard curve is a very useful mechanism for evaluating the uncertainty in the recurrence rate of volcanism, earthquakes, and similar data.



A Single Script for the Entire Analysis

The script script4.gmt puts all of the ideas presented in this and previous lessons into single package. List the contents of the script to get a sense of how the graphs are generated (using the same commands already outlined). Note that several graphs are added to the same postscript file using the -O and -K arguments. Also note that the graphs are offset using the -X and -Y commands, so they appear neatly on the page. It is a nice feature of unix that the perl script can execute as part of this larger script file. You can generate all of the plots, and manipulate data as needed, by running this single script. The output of the script, when you run it, looks like:



Assignment 4

Copy "script4.gmt" to a new file that you can edit. Choose another volcano from the "volcanoes of the world" and create a data file of eruptions for that volcano. Make sure there are a reasonable number (e.g., > 30) eruptions in the data base for the volcano you chose. Modify your new script file as needed and generate the location map and plots (including hazard curve) for the volcano you have chosen. Save the output in the form of a jpeg in your directory. Name the file "assign4.jpg". Write a very brief description of the hazard rates for the volcano, and compare these rates to Poas volcano. Save the description in a text file named assign4.txt. As usual, paste the results into your Word document.


Chuck Connor
Last modified: Thu Jan 12 10:59:42 EST 2006