GEOG 385.02/GTECH 785.02
LECTURE. MAP ALGEBRA AND STATISTICS
By the end of this lecture you should be able to:
MAP ALGEBRA
Map algebra is used for generating new information from an available data set, based on knowledge of relationships and processes. New maps are produced by applying algebraic formulas or algorithms to existing ones.
Three types of mathematical operations
SCALAR
Performs a mathematical operation between an image and a single constant value.
SCALAR takes every cell in the original image and adds, subtracts, multiplies, or divides it by a constant, or raises it to a constant power.
Example: To convert elevation values in feet to meters, multiply each cell value by 0.3048: ft x 0.3048 = m
Example: Noise pollution is a function of distance from a facility. The impact N at each location can be estimated from the distance value D and a constant K: N = K*D
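A scalar operation can be sketched in a few lines of Python. This is only an illustration, not IDRISI code: the image is stored as a plain list of rows, and the function name scalar and the sample elevations are made up for the example.

```python
import operator

def scalar(image, op, constant):
    """Apply op(cell, constant) to every cell of an image stored as a list of rows."""
    return [[op(cell, constant) for cell in row] for row in image]

# Hypothetical 2 x 3 elevation image, values in feet
elev_ft = [[100, 250, 400],
           [1000, 0, 50]]

# Feet to meters: multiply every cell by the constant 0.3048
elev_m = scalar(elev_ft, operator.mul, 0.3048)
print(elev_m)
```

The same function covers the noise example: scalar(distance_image, operator.mul, K) would give N = K*D for every cell.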
TRANSFORM
Transform applies a mathematical function to the image values. Instead of combining each cell with a constant, we pass each cell through a function.
Example: The magnitude of change can be measured and mapped using ABS(time1 – time2), for instance the change in population density between two dates. Here absolute value is the function. Other functions include log, natural log, reciprocal, tan, arctan, etc.
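As a minimal sketch of a transform, assuming an image stored as a list of rows: the function name transform and the sample values are hypothetical, and the difference image is assumed to have been computed already.

```python
import math

def transform(image, func):
    """Pass every cell of the image through a one-argument function."""
    return [[func(cell) for cell in row] for row in image]

# Hypothetical difference image (time1 - time2) of population density
difference = [[-12.0, 5.0],
              [0.0, -3.5]]

# ABS gives the magnitude of change regardless of direction
magnitude = transform(difference, abs)

# Any other one-argument function works the same way, e.g. log base 10
logged = transform([[1.0, 10.0], [100.0, 1000.0]], math.log10)
print(magnitude, logged)
```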
OVERLAY
Performs a mathematical operation between two images.
Example: Moisture Balance = Rainfall – Evaporation.
Example: Population density = Population/Area
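An overlay works cell by cell on two images of the same size. The sketch below assumes list-of-rows images; the function name overlay and the rainfall and evaporation values are invented for illustration.

```python
import operator

def overlay(image_a, image_b, op):
    """Combine two images of identical dimensions cell by cell with op(a, b)."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same number of rows and columns")
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

# Hypothetical rainfall and evaporation images (same units, same grid)
rainfall = [[800, 650], [900, 400]]
evaporation = [[500, 700], [450, 400]]

moisture_balance = overlay(rainfall, evaporation, operator.sub)
print(moisture_balance)
```

Population density would use the same function with operator.truediv on a population image and an area image.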
Comparing SCALAR, TRANSFORM, and OVERLAY:
Combining different operations
Some mathematical models require several operations to be performed in sequence. Images might be combined through several SCALAR or OVERLAY commands.
Example: converting a temperature map from Fahrenheit to Celsius units involves two SCALAR operations:
TempC = (TempF - 32)/1.8
Example: Converting distance measurements in miles to time measurements in minutes, if the speed of movement is 5 km per hour (walking speed), involves three SCALAR operations: to convert miles into km, multiply each cell by 1.61; to convert distance to time, divide each cell by the walking speed of 5 km/h; and to convert hours to minutes, multiply the result by 60:
TimH=(DistM*1.61)/5*60
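Both chained examples can be written as a single expression per cell. The following sketch uses hypothetical function names and sample values; the constants (32, 1.8, 1.61, 5, 60) are the ones given in the examples.

```python
def fahrenheit_to_celsius(temp_f):
    """Two scalar steps per cell: subtract 32, then divide by 1.8."""
    return [[(cell - 32) / 1.8 for cell in row] for row in temp_f]

def miles_to_walking_minutes(dist_miles, speed_kmh=5.0):
    """Three scalar steps per cell: miles to km, km to hours, hours to minutes."""
    return [[cell * 1.61 / speed_kmh * 60 for cell in row] for row in dist_miles]

# Hypothetical sample images
temp_f = [[32.0, 212.0], [50.0, 68.0]]
dist_miles = [[1.0, 2.5], [0.0, 10.0]]

print(fahrenheit_to_celsius(temp_f))
print(miles_to_walking_minutes(dist_miles))
```

Note that the result of the second formula is in minutes, since the last step multiplies hours by 60.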
Example: In the USLE (Universal Soil Loss Equation) we have A = K*C*R*LS*P, where
A = soil loss in metric tons per hectare per year
K = soil erodibility factor
C = vegetative cover factor
R = rainfall and run-off factor
LS = slope and slope length factor
P = conservation practices factor
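The USLE is a chain of overlays: five factor images multiplied together cell by cell. Here is a minimal sketch, assuming all five images share the same grid; the function name usle and every factor value are invented for illustration and are not real soil data.

```python
from math import prod

def usle(k, c, r, ls, p):
    """Multiply five factor images cell by cell: A = K*C*R*LS*P."""
    return [[prod(cells) for cells in zip(*rows)]
            for rows in zip(k, c, r, ls, p)]

# Hypothetical 1 x 2 factor images
K = [[0.3, 0.2]]
C = [[0.5, 1.0]]
R = [[100.0, 100.0]]
LS = [[2.0, 1.0]]
P = [[1.0, 0.5]]

A = usle(K, C, R, LS, P)
print(A)
```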
In IDRISI we can use IMAGE CALCULATOR for formulas that have several SCALAR, OVERLAY, or TRANSFORM operations.
IMAGE CALCULATOR can, in fact, be used to replace a variety of the tools we’ve used so far in class.
STATISTICS
Statistical measures are important in a GIS, because they allow for more formal analysis of quantitative and qualitative data.
Descriptive statistics:
Very common measures include:
In IDRISI the modules HISTO, EXTRACT, and PROFILE provide descriptive statistics.
Example: for population density data, the standard deviation (SD) measures how widely the values are spread around the mean, which is different from just knowing the min and max values. Interpreting SD with the percentages below assumes that the data follow a normal (bell-shaped) curve.
Mean value is 49.
Minimum is 18 and Maximum is 105.
1 SD is a distance of 7 from the mean in either direction.
1 SD describes the interval 42 to 56 (about 68% of all data is in this interval).
2 SD is 35 to 63 (about 95% of all data is in this interval).
3 SD is 28 to 70 (about 99.7% of all data is in this interval).
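The intervals follow directly from mean ± k*SD. The sketch below reproduces them and also shows how the basic measures could be computed from an image. The function names and the small sample image are hypothetical, and the population form of the standard deviation is an assumption (IDRISI's modules may use a different form).

```python
import statistics

def describe(image):
    """Basic descriptive statistics for an image stored as a list of rows."""
    values = [cell for row in image for cell in row]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),  # population SD (assumed here)
    }

def sd_interval(mean, sd, k):
    """Interval that lies within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

# Intervals for the example above: mean 49, SD 7
for k in (1, 2, 3):
    print(k, "SD:", sd_interval(49, 7, k))

# Hypothetical tiny image
stats = describe([[2, 4], [4, 6]])
print(stats)
```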
Inferential statistics
Inferential statistics let us estimate the relationship or association between different variables and data sets (stored as images or values files).
Different statistics are applied to qualitative and quantitative data.
For relationships in Qualitative data:
Here we are interested in the differences between categorical datasets (categorical images).
For example, change in landuse between two dates for the same territory (expressed as two landuse images).
One fundamental way to compare two categorical (qualitative) images is through a cross-tabulation of the categories in each image. They can also be compared visually through a cross-classification image. Both can be produced using the module CROSSTAB.
Cross-tabulation results in a table showing the number of cells that fall into each combination of categories from the two images. Row and column totals are provided. In addition several statistical measures are output. These include:
Example: Cross-tabulation table for the same area at two different dates. Columns are Date 1 (1990); rows are Date 2 (2000).
Date 2 \ Date 1 | Park | Industrial | Streets | Residential | Total
Park | 2842 | 3 | 4 | 0 | 2849
Industrial | 1 | 31874 | 596 | 0 | 32471
Streets | 2 | 1063 | 72487 | 23 | 73575
Residential | 0 | 8742 | 328 | 53221 | 62291
Total | 2845 | 41682 | 73415 | 53244 | 171186
· The diagonal values are the cells that did not change landuse type.
· Values off the diagonal show change by category. Note that change from date 1 to date 2 is not the same as change from date 2 to date 1.
· For example: Industrial landuse declined from 1990 to 2000. Looking by column (1990 is the referent date): of the industrial cells that changed, most became residential and streets, and a few became parks. Looking by row (2000 is the referent): industrial also grew a little in other locations, mainly at the expense of streets.
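A cross-tabulation is a count of category pairs over matching cells. This sketch is not the CROSSTAB module itself; the function name, the one-letter category codes, and the two tiny images are made up for illustration.

```python
from collections import Counter

def crosstab(image_date1, image_date2):
    """Count cells for every (date 2 category, date 1 category) combination."""
    counts = Counter()
    for row1, row2 in zip(image_date1, image_date2):
        for cat1, cat2 in zip(row1, row2):
            counts[(cat2, cat1)] += 1  # rows = date 2, columns = date 1
    return counts

# Hypothetical 2 x 3 landuse images: P = park, I = industrial,
# S = streets, R = residential
date1 = [["P", "I", "I"],
         ["S", "S", "R"]]
date2 = [["P", "I", "R"],
         ["S", "I", "R"]]

table = crosstab(date1, date2)
unchanged = sum(n for (cat2, cat1), n in table.items() if cat2 == cat1)
print(dict(table), "unchanged cells:", unchanged)
```

The keys where both categories match correspond to the diagonal of the table; all other keys are changes.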
The KAPPA index of agreement is calculated from the table. K = 1.0 means the two images agree perfectly (they are the same image); K = 0.0 means the agreement between the images is no better than would be expected by chance; negative values, down to K = -1.0, mean the images agree less than would be expected by chance.
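KAPPA is calculated by
K = (Po - Pe) / (1 - Pe)
Where Po is the observed proportion of cells that did not change (the sum of the diagonal values divided by the total number of cells).
From this is subtracted the proportion of agreement expected by chance, Pe, which is computed from the row and column totals:
Pe = ((2845/171186)*(2849/171186)) +
((41682/171186)*(32471/171186)) +
((73415/171186)*(73575/171186)) +
((53244/171186)*(62291/171186)).
Pe = .344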
For this example, the Kappa index is high (.904).
Note that this number is smaller than the simple proportion of cells that did not change (.937).
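The two figures quoted above can be checked with a short calculation. The sketch assumes the standard form K = (Po - Pe)/(1 - Pe) and takes the Park/Park cell as 2842, the value consistent with the Park row and column totals (2849 and 2845); the function name is hypothetical.

```python
def kappa(table):
    """Overall Kappa from a square cross-tabulation (rows = date 2, columns = date 1)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of cells on the diagonal
    po = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum of products of matching row and column proportions
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return po, pe, (po - pe) / (1 - pe)

# Cell counts from the example table (totals omitted; they are recomputed).
# Order of rows and columns: Park, Industrial, Streets, Residential
counts = [
    [2842, 3, 4, 0],
    [1, 31874, 596, 0],
    [2, 1063, 72487, 23],
    [0, 8742, 328, 53221],
]

po, pe, k = kappa(counts)
print(round(po, 3), round(pe, 3), round(k, 3))
```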
Kappa indices can be calculated for each category as well as overall. Individual indices tell us which categories changed and which did not.
For this example, the per-category Kappa is as follows (using different dates as referent).
Category | Date 1 Referent | Date 2 Referent
Park | .9989 | .9975
Industrial | .7096 | .9757
Streets | .9778 | .9741
Residential | .9993 | .7887
The change from date 1 to date 2 is most evident in the industrial and residential categories. With date 1 as referent we see that much of the industrial land in date 1 has changed to something else. With date 2 as referent we see that much of the residential land in date 2 was something else in date 1.
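The per-category values can be reproduced with one common per-category formulation, in which chance agreement for a category is the product of its date 1 and date 2 proportions and the denominator uses the proportion of the referent date. That this is the formula IDRISI applies is an assumption; it is used here because it matches the values listed above when the Park/Park cell is taken as 2842 (the value consistent with the Park totals). The function name is hypothetical.

```python
def per_category_kappa(table):
    """Per-category Kappa as (date 1 referent, date 2 referent) pairs.

    table: square cross-tabulation, rows = date 2, columns = date 1.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # date 2 totals
    col_totals = [sum(col) for col in zip(*table)]  # date 1 totals
    results = []
    for i in range(len(table)):
        p_ii = table[i][i] / n
        p_date1 = col_totals[i] / n
        p_date2 = row_totals[i] / n
        chance = p_date1 * p_date2
        results.append(((p_ii - chance) / (p_date1 - chance),
                        (p_ii - chance) / (p_date2 - chance)))
    return results

names = ["Park", "Industrial", "Streets", "Residential"]
counts = [
    [2842, 3, 4, 0],
    [1, 31874, 596, 0],
    [2, 1063, 72487, 23],
    [0, 8742, 328, 53221],
]

kappas = per_category_kappa(counts)
for name, (k1, k2) in zip(names, kappas):
    print(name, round(k1, 4), round(k2, 4))
```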
Cross-classification results in an image that shows all possible combinations of the categories from two images. Where they are different, new categories are defined.
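A cross-classification image can be sketched by giving every distinct pair of categories its own new class number. The numbering scheme (order of first appearance), the function name, and the sample images are assumptions for illustration; CROSSTAB may number its classes differently.

```python
def cross_classify(image_a, image_b):
    """Return a new image plus a legend mapping (category a, category b) to a class id."""
    legend = {}
    result = []
    for row_a, row_b in zip(image_a, image_b):
        out_row = []
        for pair in zip(row_a, row_b):
            if pair not in legend:
                legend[pair] = len(legend) + 1  # next unused class id
            out_row.append(legend[pair])
        result.append(out_row)
    return result, legend

# Hypothetical 2 x 2 categorical images
landuse_1990 = [[1, 1], [2, 2]]
landuse_2000 = [[1, 2], [2, 2]]

cross_image, legend = cross_classify(landuse_1990, landuse_2000)
print(cross_image, legend)
```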
For relationships in Quantitative data
For quantitative data we are interested in statistical procedures that reveal difference in degree or quantity usually from low to high. Regression analysis is the most common procedure.
Regression analysis mathematically describes the relationship between variables, which for GIS can be either two images or two non-image datasets (e.g. values files).
For example, there is a relationship between distance from a subway and average housing rents. In general, with an increase in distance, there is a drop in mean housing rent values. Given many sample sites or locations, these two datasets (distance from the subway stop and average rents) can be plotted together.
X – is always the independent variable (the one that "causes" the other - in this case distance).
Y – is the dependent variable (the one that "depends on" the other - in this case housing rents).
A trend line is produced that best fits the data. This line is described by the equation:
Y = a + bX (where b = slope and a = the Y intercept).
Once the slope and Y intercept are known, any value for X can then yield an expected value for Y. For example, any value for distance can be "plugged into" the equation and an expected mean housing rent value can be calculated.
Importantly for GIS, entire images can be "plugged into" the equation using map algebra tools. An image of distance can be transformed into an image of expected average rent value.
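The two steps, fitting the line from sample sites and then plugging a whole image into it, can be sketched as follows. Ordinary least squares is assumed as the fitting method, and the sample distances, rents, and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def apply_line(image, a, b):
    """Plug every cell of an image into Y = a + bX."""
    return [[a + b * cell for cell in row] for row in image]

# Hypothetical sample sites: distance from subway (km) and mean rent
distance_km = [0.5, 1.0, 1.5, 2.0]
mean_rent = [1900, 1800, 1700, 1600]
a, b = fit_line(distance_km, mean_rent)

# Hypothetical distance image turned into an expected-rent image
distance_image = [[0.0, 1.0], [2.5, 3.0]]
expected_rent = apply_line(distance_image, a, b)
print(a, b, expected_rent)
```

The slope is negative here, which corresponds to rents dropping as distance from the subway increases.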
This simple regression is a linear regression. It assumes a linear relationship between the independent and dependent variables (the trend line is assumed to be straight).
The relationship can be positive or negative. For example:
o Distance from subway and housing rents (negative relationship, a downward sloping trend line).
o Income and value of automobile (positive).
Correlation
Relationships aren’t always perfect. The degree to which a relationship between two variables is consistent is quantified by the statistics R and R².
The statistic R estimates the strength of the relationship. A perfect relationship will yield R = -1 for a negative relationship and R = 1 for a positive relationship.
If R = 0.8, then R² = 0.64. R² is the coefficient of determination: the proportion of the variation in one variable that is accounted for by the other. It is usually expressed as a percent (here 64%).
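R can be computed from paired samples with the Pearson formula. Below is a minimal sketch; the function name and both small data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R for two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical imperfect positive relationship
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = r ** 2

# Hypothetical perfect negative relationship
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
print(r, r_squared, r_negative)
```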
In IDRISI the module REGRESS allows for a simple linear regression of values files or images. [One should be careful with images, because regression assumes that the observations are independent of one another. But in images, neighboring cells tend to have similar values (spatial autocorrelation).]
MULTIREG – multiple regression
LOGITREG – logit regression
Draws a spatial sample of points from the image: random, systematic, stratified random.
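The three sampling schemes can be illustrated on a grid of row and column indices. This is only a sketch of the general ideas, not of how the IDRISI module works internally: the function names are hypothetical, and stratification is assumed to mean one random point per square block of cells.

```python
import random

def random_sample(n_rows, n_cols, n_points, rng):
    """Pick n_points distinct cells anywhere in the image."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return rng.sample(cells, n_points)

def systematic_sample(n_rows, n_cols, step):
    """Pick cells on a regular grid, one every `step` rows and columns."""
    return [(r, c)
            for r in range(step // 2, n_rows, step)
            for c in range(step // 2, n_cols, step)]

def stratified_random_sample(n_rows, n_cols, block, rng):
    """Pick one random cell inside each block x block square."""
    points = []
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            r = rng.randrange(r0, min(r0 + block, n_rows))
            c = rng.randrange(c0, min(c0 + block, n_cols))
            points.append((r, c))
    return points

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pts_random = random_sample(10, 10, 5, rng)
pts_systematic = systematic_sample(10, 10, 5)
pts_stratified = stratified_random_sample(10, 10, 5, rng)
print(pts_random, pts_systematic, pts_stratified)
```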
See other modules under Analysis/Statistics and Analysis/Surface analysis/Geostatistics.