Lab 08

Introduction to R

 

This lab exercise is meant to help you become familiar with the R software environment. We call it a software environment rather than just a program because R is both a typical application program, such as MS Excel or ArcGIS, and a programming environment. Along with the introduction to R, we quietly introduce you to programming as such, although this is only a side effect.

We recommend that you have at least three programs running concurrently: the web browser to read this set of instructions, R, and a text editor such as Word, WordPad, or something similar. Unlike the ArcGIS labs you may be used to, this lab has no subdivisions. Mostly, you are just supposed to follow the examples and play with variations of them.

Estimated time to complete this lab: 90 minutes

 

Step 1  Start R

If this is the first time you are using R, go to the Windows Start menu, choose All Programs, then R, and select R 2.0.1. After a few seconds, you should see the following screen:

 

 

Step 1: The R start window.

 

To abbreviate this startup sequence, I recommend that you create a shortcut to R either on your desktop or your Quick Launch taskbar.

 

Step 2  Know how to get help

No, this is not about approaching Jing Li for help. R has a number of built-in ways to get help. One option is the Help menu at the top of the RGui window. Its Html help item brings up the same HTML help pages as the command

>  help.start()

You could also search the help pages for a particular topic through the menu item in the Help menu, or if you know the command you want to use but not all its parameters, you can learn about all the details by entering

>  help(command)
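As a rough illustration of these help facilities, the following sketch uses plot and the search terms "histogram" and "read" purely as arbitrary examples; they are not prescribed by this lab.

```r
help(plot)                 # open the help page for the plot command
?plot                      # shorthand for help(plot)
help.search("histogram")   # search the installed help pages for a topic
apropos("read")            # list objects whose names contain "read"
```

apropos() is handy when you remember only part of a command name.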

Now let’s look at some of the other menus in the RGui.

 

Step 3  Packages

As you recall from the lectures, R is a freely available, open-source software package that consists of many modules, which here are called packages. By default, that is, without you having to do anything, R starts with a base package, which, as the name suggests, contains a basic set of functionalities and some sample datasets. Also by default, many more packages are already installed on your hard disk. You can see them by picking Load package… from the Packages menu.

 

 

Step 3: Packages already installed but not yet loaded into memory.

 

Step 4  Installing additional packages (you cannot do this on the lab computers!)

If you downloaded R from one of the Comprehensive R Archive Network (CRAN) sites to install it on your home computer (something I, Jochen, highly recommend), then you can also install additional packages that are not part of a standard installation. There are literally hundreds of add-on packages out there. One of the real benefits of R is that these freely available packages offer more spatial (geographic) functions than any commercial statistics software package.

 

 

Step 4: The first sixteen of now approximately 450 packages on the CRAN website.

 

As mentioned in the lectures, the reason for not just pre-installing and then loading all packages is that you would clog your computer’s memory with too much stuff that you don’t need and hence suffer severe performance loss. If you experiment with a number of packages and find that your computer becomes somewhat sluggish, then you might want to unload packages. You do this with the command

>  detach(package:name)   # replace name with the name of the package you want to unload

You can get rid of all packages except for the base package, which is essential for running the RGui in the first place.
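A minimal sketch of the load and unload cycle from the command line, assuming the MASS package is installed (it ships with standard R distributions, but treat that as an assumption for your own machine):

```r
library()                      # list all packages installed on this computer
library(MASS)                  # load the MASS package into memory
"package:MASS" %in% search()   # TRUE: MASS is on the search path
detach(package:MASS)           # unload it again
"package:MASS" %in% search()   # FALSE: MASS is no longer loaded
```

search() on its own prints every package currently loaded, which is a quick way to see what is occupying memory.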

 

Step 5  Managing data

In the course of an R session, you will create a number of data objects. That number is limited only by the size of your computer’s memory, but as we have seen in the previous step, you may hit this boundary sooner than expected. Under the Misc menu, you can list all objects currently occupying your memory and, if you wish, get rid of them. The menu also lists the keyboard shortcut for canceling a computing-intensive operation, the <ESC> key.
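The same housekeeping can be done by typing commands. The sketch below uses two throwaway objects whose names (tmp.numbers, tmp.text) are made up for illustration:

```r
tmp.numbers = 1:10        # create two throwaway objects
tmp.text = "some text"
ls()                      # list the objects in your workspace
rm(tmp.numbers)           # remove one of them
ls()                      # tmp.numbers is gone, tmp.text remains
```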

The Edit menu is very much like what you know from other Windows programs. It also lists the option to clear your console window.

More information: I strongly recommend that you don’t clear the console, because it preserves the history of your complete R session. You will use your console output as proof of your lab work. It also helps Jing Li understand what went wrong when you get stuck during the lab exercises and ask him for help.

We have talked a lot about memory during the last few steps. Program code (the packages), data (objects) and the session history are all stored in memory and together make up what is called your current workspace. You can dump (save) your complete workspace to disk, thereby allowing you to continue a session when you return some other time. That’s what is behind the Save Workspace and Load Workspace options in the File menu. Similarly, you can save and load the history of commands that you issue in a session. By the way, this history is also accessible through the up arrow (↑) key, which is particularly helpful with long commands (having many parameters) in which you made some small mistake. Using the ↑ key, you can recall a previous command and edit it.

Also under the File menu, you set the current working directory, which by default is the installation directory (to which you are not allowed to write in our Geography labs). Changing the working directory is something that you should make a habit of. Please either use the RGui File menu to change to a directory on your U:\ drive or do the equivalent by typing on the console
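Saving and restoring a workspace can also be done from the console. This is only a sketch: the object name lab.result and the file name lab08.RData are hypothetical, and the file is written to R's temporary folder so that the example works without write access elsewhere.

```r
lab.result = c(1, 2, 3)                         # a throwaway object
ws.file = file.path(tempdir(), "lab08.RData")   # hypothetical file name
save.image(ws.file)                             # write the whole workspace to disk
rm(lab.result)                                  # simulate ending the session
load(ws.file)                                   # restore the saved workspace
lab.result                                      # the object is back
```

In the RGui, savehistory() and loadhistory() play the same role for the command history.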

>  setwd("U:/GTECH201/R")   # replace the string with a valid path to the folder of your choice

Observe that R requires the path to be in quotation marks (every R object is either a variable name, a number, or a text string; the latter is marked by the enclosing quotation marks) and that the directories are separated by Unix-like forward slashes even in a Windows environment.
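To check where R is currently reading and writing files, you can combine setwd() with getwd(). In this sketch, R's temporary folder stands in for your own directory; that substitution is an assumption made only so the example runs anywhere.

```r
old.dir = getwd()     # remember the current working directory
setwd(tempdir())      # switch to a writable folder (here: R's temporary folder)
getwd()               # confirm the change
list.files()          # show the files in the new working directory
setwd(old.dir)        # return to where you started
```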

Finally, the Save to File… item of the File menu is what you will use for your lab submission. With this, you save the contents of the console window to a text file, which you will copy to your public_html directory and set a link to, as part of your lab submission.

But now let’s get started with R proper.

 

Step 6  Reading data

As mentioned in the lecture, data input is not really a strong point of R. There is a variety of functions for reading external data, but they all amount to reading data in ASCII format rather than conveniently reading native MS Excel or dBASE formats, not to mention GIS data or census data in SAS or SPSS format.

Download the data for this lab and uncompress it to a new directory on your U: drive. Once you have unzipped that file, you should have, among others, a file ACTpop.txt in your R working directory. This file contains the population figures in thousands for the Australian Capital Territory at various times since 1917.

>  ACTpop = read.table("ACTpop.txt", header=TRUE)

This reads in the data and stores them in the data frame ACTpop. Note the use of header=TRUE to ensure that R uses the first line to get header information for the columns. Type ACTpop at the command prompt,

>  ACTpop

and the contents of the newly created R object are printed in the console window.

Case is significant for names of R objects or commands. Thus ACTPOP is different from ACTpop.
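If you want to try read.table() before touching the lab data, the following sketch writes a tiny text file and reads it back. The file name toy.txt, the column names, and the values are all invented for illustration and have nothing to do with the ACT population figures.

```r
# Toy values invented for this sketch; they are NOT the ACT population data.
toy.file = file.path(tempdir(), "toy.txt")   # hypothetical file name
writeLines(c("id value", "1 10.5", "2 20.0", "3 7.25"), toy.file)

toy = read.table(toy.file, header=TRUE)   # first line supplies the column names
names(toy)                                # "id" "value"
dim(toy)                                  # 3 rows, 2 columns
exists("toy")                             # TRUE
exists("TOY")                             # FALSE: case matters
```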

You can now plot the ACT population between 1917 and 1997 by typing

>  plot(ACT ~ Year, data=ACTpop, type="p", col="black", lwd=5)

 

Step 6: ACT population, in thousands, at various times between 1917 and 1997.

 

ACT ~ Year is a graphics formula. Read “Plot ACT (on the y-axis) against Year (on the x-axis).” The remaining options determine plot type (here points), color, and size.
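The same formula notation works with any data frame. As a sketch, here it is applied to the built-in cars data frame (columns speed and dist); the color, axis labels, and title are arbitrary choices for illustration.

```r
data(cars)
plot(dist ~ speed, data=cars, type="p", col="blue",
     xlab="Speed", ylab="Stopping distance",
     main="cars: dist against speed")
```

Changing type="p" to type="l" draws lines instead of points.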

More information: Right-click inside the graphics window. Observe that you can save the graphics to disk or, alternatively, copy them as a Windows metafile to the clipboard (the place reserved in MS Windows memory for your copy and paste operations). You will need to do this for your lab submission.

We can use data.frame() to input data directly at the command line. The following assigns measurements from an experiment in which, for each amount by which an elastic band was stretched over the end of a ruler, the distance that the band traveled upon release was recorded.

 

>  elasticband = data.frame(stretch=c(46,54,48,50,44,42,52), distance=c(148,182,173,166,109,141,166))

The constructs c(46,54,48,50,44,42,52) and c(148,182,173,166,109,141,166) concatenate or join the separate numbers into a single vector object.
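Once the data frame exists, a few commands help you confirm that it holds what you expect. This sketch repeats the definition from above so that it can be run on its own:

```r
elasticband = data.frame(stretch=c(46,54,48,50,44,42,52),
                         distance=c(148,182,173,166,109,141,166))
dim(elasticband)            # 7 rows, 2 columns
elasticband$stretch         # one column, returned as a vector
mean(elasticband$stretch)   # 48
length(c(46,54,48))         # 3: c() joined three numbers into one vector
```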

 

Step 7  Working with in-built data

An important category of R object is the data frame. R uses data frames to store rectangular arrays in which the columns may be vectors of numbers or factors of text strings. Data frames are central to the way that R routines process data. Consider, for example, the data frame cars that is part of the base package. It has two columns with the names speed and dist. Typing summary(cars) gives summary information on these variables.

>  data(cars)   # Gives access to the data frame cars

>  summary(cars)

The # character is used to write comments; anything after a # sign is not interpreted by R.
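Besides summary(), a few other commands are useful for a first look at a built-in data frame. A short sketch using cars:

```r
data(cars)
names(cars)      # "speed" "dist"
dim(cars)        # 50 rows, 2 columns
cars[1:5, ]      # the first five rows
class(cars)      # "data.frame"
```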

 

Step 8  Non-statistical data handling

There are three common ways to extract subsets of vectors.

1.       Specify the indices of the elements that are to be extracted, e.g.

>  x = c(3,11,8,15,12)   # assign five values to vector x

>  x[c(2,4)]     # extract elements in positions 2 and 4 only

2.       Use negative subscripts to omit the elements

>  x[-c(2,3)]     # remove the elements in positions 2 and 3

3.       Specify a vector of logical values. The elements that are extracted are those for which the logical value is TRUE.

>  x > 10

[1]   FALSE TRUE FALSE TRUE TRUE

>  x[x > 10]

[1]   11 15 12
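The same three techniques carry over to the rows of a data frame: put the index before the comma and leave the part after the comma empty to keep all columns. The sketch below uses a copy of the elastic band data from step 6 under the made-up name eb.

```r
eb = data.frame(stretch=c(46,54,48,50,44,42,52),
                distance=c(148,182,173,166,109,141,166))
eb[c(1,3), ]              # rows 1 and 3, all columns
eb[-c(1,3), ]             # everything except rows 1 and 3
eb[eb$stretch >= 50, ]    # rows whose stretch is at least 50
```

The result of each command is itself a data frame, so it can be assigned to a new name and plotted like any other.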

Another way of accessing subsets is to look at data frames from a column or variable perspective. The names() function can be used to determine variable names. As an example, consider the New York air quality data set that comes with the base package. To determine the variable names in this data frame, type

>  data(airquality)

>  names(airquality)

[1]  "Ozone"   "Solar.R"   "Wind"   "Temp"   "Month"   "Day"

The sapply() function is a useful tool for calculating statistics for each column of a data frame. The first argument to sapply() is a data frame. The second is the name of a function that is to be applied to each column. Consider the women data frame that also is part of the base package:

>  data(women)

>  women     # display the data

>  sapply(women,mean)
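sapply() accepts any function that works on a vector, and further arguments are handed on to that function. In this sketch, na.rm=TRUE is passed through to mean() because the airquality data frame contains missing values in some columns:

```r
data(women)
sapply(women, range)                    # minimum and maximum of each column
data(airquality)
sapply(airquality, mean)                # columns with missing values give NA
sapply(airquality, mean, na.rm=TRUE)    # na.rm=TRUE is passed on to mean()
```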

R borrows many concepts from Unix scripting languages. One is to abbreviate repetitive procedures by identifying a pattern and then instructing R to repeat the pattern n times. The colon :, for instance, is used to mark the first and last elements of a sequence (similar to marking ranges in MS Excel).

>  5:15

[1]  5  6  7  8  9  10  11  12  13  14  15

This, by the way, works also the other way around: 15:5 will generate the sequence in reverse order. The colon : is a special form of a more general function seq().

>  seq(from=5, to=22, by=3)     # the first value is 5, the last value is <= 22

[1]  5  8  11  14  17  20

And to repeat the sequence (2,3,5) four times over, enter

>  rep(c(2,3,5), 4)

[1]  2  3  5  2  3  5  2  3  5  2  3  5
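A few variations on these commands, offered as a sketch for experimenting; the particular numbers are arbitrary:

```r
length(15:5)                       # 11 values, counting down from 15 to 5
seq(from=0, to=1, by=0.25)         # 0.00 0.25 0.50 0.75 1.00
seq(from=5, to=22, length.out=4)   # four evenly spaced values from 5 to 22
rep(c(2,3,5), each=2)              # 2 2 3 3 5 5
```

Note the difference between rep(c(2,3,5), 4), which repeats the whole sequence, and each=2, which repeats every element in turn.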

With this, we have come to the end of the introduction to R. The eight steps above provide you with all the information you need to perform the following six tasks, which make up your lab submission. Start Frontpage Express to create a new web page.

Question 1   Use the data frame elasticband  from step 6 to plot distance  against stretch. Copy the graphics and paste it into your web page in Frontpage Express.

Question 2   The aranda.txt dataset gives the size of the floor area (in ha) and the price (in $000) for 15 houses in a local suburb in 1999.
(a) Plot sale.price versus area
(b) Use the histogram parameter to plot a histogram of the sale prices.
Copy both graphs to your web page.

Question 3   The orings.txt data frame gives data on the damage that had occurred in US space shuttle launches prior to the disastrous Challenger launch of 01/28/86. Only the observations in rows 1, 2, 11, 13, and 18 were included in the pre-launch charts used in deciding whether to proceed with the launch.
Create a new data frame by extracting these rows from orings.txt and plot Total incidents against Temperature for this new data frame. Obtain a similar plot for the full dataset. Copy both graphs to your web page.

Question 4   Import the lakes.txt dataset containing the elevation of lakes in Manitoba (in meters above sea level) and their area (in square kilometers).
Assign the names of the lakes using the row.names function. The following is a list of the lake names: Winnipeg, Winnipegosis, Manitoba, SouthernIndian, Cedar, Island, Gods, Cross, Playgreen.
Then obtain a plot of lake area against elevation and copy the graph to your web page.

Question 5   Using the sum() function, obtain a lower bound for the area of Manitoba covered by water. Record your answer in your web page.

Question 6   The ^ symbol denotes exponentiation. Consider the following:
> 1000*((1+0.075)^5 - 1)     # interest on $1,000, compounded annually
                                             # at 7.5% per annum for five years
(a) Type the above expression
(b) Modify the expression to determine the amount of interest paid if the rate is 3.5%
(c) What happens if the exponent 5 is replaced by seq(1, 10)?
Record your observation in your web page.

Rename your web page lab08.answers.html and set a link to this page from your home page. Then send an email to Jing Li announcing your lab submission and providing him with the URL to your lab answers.