Map Algebra and Statistics Lecture

TEACHING GEOG 385.02/GTECH 785.02
GIS APPLICATIONS IN SOCIAL GEOGRAPHY

LECTURE. MAP ALGEBRA AND STATISTICS

By the end of this lecture you should be able to:

Explain what is meant by "Map Algebra."
Describe and contrast the mathematical operations possible using the Idrisi modules SCALAR, OVERLAY, TRANSFORM, and Image Calculator.
Evaluate algebraic expressions using images as variables.
Describe the difference between descriptive and inferential statistics, and identify the basic tools in Idrisi used for each.
Describe a fundamental method for comparing two qualitative data sets.
Describe a fundamental method for comparing two quantitative data sets.

MAP ALGEBRA

Used for generating new information from an available data set based on knowledge of relationships and processes. New maps can be produced using various algebraic formulas or algorithms.

A map (raster image) acts as a variable in a mathematical equation.
The same equation is applied to each cell in the image.
Each cell is modified according to this equation and the outcome will depend on the cell’s initial value.
Map algebra is a cell by cell transformation of images.

3 Types of mathematical operations

SCALAR

Performs a mathematical operation between an image and some value (constant).

SCALAR takes every cell in the original image and adds, subtracts, multiplies, or divides it by a constant value, or raises it to a constant power.

Example: In order to convert elevation values in feet to meters, we need to multiply each cell value by 0.3048: ft x .3048 = m
Example: Noise pollution is a function of distance from a facility. The impact N at each location can be estimated using distance values and a constant: N=K*D

TRANSFORM

TRANSFORM applies a functional transformation to the image values. Instead of combining each cell with a constant value, we apply a function to it.

Example: The magnitude of change can be measured and mapped using ABS(time1 – time2), for example the change in population density between two dates. Here absolute value is the function. Other functions include log, natural log, reciprocal, tan, arctan, etc.

OVERLAY

Performs a mathematical operation between two images. All operations are implemented cell by cell.

Example: Moisture Balance = Rainfall – Evaporation.
Example: Population density = Population/Area

Comparing SCALAR, TRANSFORM, and OVERLAY:

SCALAR performs a mathematical operation between ONE image and some numeric value (constant).
TRANSFORM performs a functional transformation of ONE image.
OVERLAY performs a mathematical operation between TWO images.
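
As a rough illustration outside Idrisi, the three operation types can be sketched in plain Python, with small nested lists standing in for raster images. The function names scalar, transform, and overlay and all sample values are hypothetical; they mimic the idea behind the modules, not their actual interface.

```python
import operator

# A tiny 2 x 3 "image": each inner list is one row of cells (made-up values).
elevation_ft = [[100.0, 250.0, 400.0],
                [50.0, 300.0, 120.0]]

def scalar(image, op, constant):
    """One image and a constant: apply op(cell, constant) to every cell."""
    return [[op(cell, constant) for cell in row] for row in image]

def transform(image, func):
    """One image: apply a one-argument function to every cell."""
    return [[func(cell) for cell in row] for row in image]

def overlay(image_a, image_b, op):
    """Two images of the same size: apply op(a, b) to matching cells."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

# SCALAR: convert feet to meters.
elevation_m = scalar(elevation_ft, operator.mul, 0.3048)

# OVERLAY then TRANSFORM: magnitude of change between two dates.
density_t1 = [[10.0, 20.0], [30.0, 40.0]]
density_t2 = [[12.0, 15.0], [30.0, 55.0]]
change = transform(overlay(density_t1, density_t2, operator.sub), abs)
```

In every case the same operation is applied cell by cell; only the second operand differs (a constant, nothing, or another image).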

Combining different operations

Some mathematical models require that many operations be performed in sequence. Images might be combined through several SCALAR or OVERLAY commands.

Example: converting a temperature map from Fahrenheit to Celsius units involves two SCALAR operations:

TempC = (TempF - 32)/1.8

Example: Converting distance measurements in miles to time measurements in minutes, if the speed of movement is 5 km per hour (walking speed), involves three SCALAR operations: to convert miles into km, multiply each cell by 1.61; to convert distance to time, divide each cell by the walking speed of 5 km/h; and to convert hours to minutes, multiply the result by 60:

TimH=(DistM*1.61)/5*60
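
The two chained conversions above can be checked with a short Python sketch. Nested lists stand in for images, and the function names and input values are hypothetical.

```python
def fahrenheit_to_celsius(temp_f):
    """Two scalar steps per cell: subtract 32, then divide by 1.8."""
    return [[(cell - 32) / 1.8 for cell in row] for row in temp_f]

def miles_to_walking_minutes(dist_miles, speed_kmh=5.0):
    """Three scalar steps per cell: miles to km, km to hours, hours to minutes."""
    return [[cell * 1.61 / speed_kmh * 60 for cell in row] for row in dist_miles]

# Made-up input images.
temp_c = fahrenheit_to_celsius([[32.0, 212.0], [50.0, 68.0]])
minutes = miles_to_walking_minutes([[1.0, 0.5]])
```

With these inputs, 212 degrees F maps to 100 degrees C, and a cell 1 mile away maps to 1.61/5*60 = 19.32 minutes of walking.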

Example: In the USLE (Universal Soil Loss Equation) we have A = K*C*R*LS*P, where

A = soil loss in metric tons per hectare per year
K = soil erodibility factor
C = vegetative cover factor
R = rainfall and run-off factor
LS = slope and slope length factor
P = conservation practices factor
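
A minimal Python sketch of the USLE as a chain of cell-by-cell multiplications follows. The helper overlay_multiply is hypothetical, and the factor values are invented for illustration; they are not real USLE data.

```python
from functools import reduce

def overlay_multiply(*images):
    """Multiply any number of same-sized images cell by cell."""
    def multiply_two(image_a, image_b):
        return [[a * b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(image_a, image_b)]
    return reduce(multiply_two, images)

# Hypothetical factor images, each 1 row x 2 columns.
K = [[0.3, 0.2]]       # soil erodibility
C = [[0.5, 0.1]]       # vegetative cover
R = [[100.0, 100.0]]   # rainfall and run-off
LS = [[1.2, 2.0]]      # slope and slope length
P = [[1.0, 0.5]]       # conservation practices

A = overlay_multiply(K, C, R, LS, P)   # soil loss image, one value per cell
```

Each factor is a full image, so the result is a soil loss value for every cell rather than a single number.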

In IDRISI we can use IMAGE CALCULATOR for formulas that have several SCALAR, OVERLAY, or TRANSFORM operations.

IMAGE CALCULATOR can, in fact, be used to replace a variety of the tools we’ve used so far in class.

STATISTICS

Statistical measures are important in a GIS, because they allow for more formal analysis of quantitative and qualitative data.

Descriptive statistics:

Allow us to assess the nature of the dataset and the distribution or frequency of values.
Very common measures include:

Average or mean value for a dataset (image).
Standard deviation of values in a dataset (image).
Frequency distributions of values in a dataset (image).

In Idrisi the modules HISTO, EXTRACT, and PROFILE provide descriptive statistics.

Example: for population density data, the standard deviation (SD) measures the spread of the data around the mean, which tells us more than the minimum and maximum values alone. Interpreting SD with the percentages below assumes that the data follow a normal (bell-shaped) curve.

Mean value is 49.
Minimum is 18 and Maximum is 105.

1 SD is a distance of 7 from the mean in either direction.
1 SD describes the interval 42 to 56 (about 68% of all data is in this interval).
2 SD is 35 to 63 (about 95% of all data is in this interval).
3 SD is 28 to 70 (about 99.7% of all data is in this interval).
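
The intervals above follow directly from the mean and SD. A short Python sketch, using only the standard library, reproduces them; the helper names and the small test image are hypothetical.

```python
import statistics

def sd_intervals(mean, sd, max_k=3):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1..max_k."""
    return {k: (mean - k * sd, mean + k * sd) for k in range(1, max_k + 1)}

def describe(image):
    """Mean and population standard deviation of all cells in an image."""
    cells = [cell for row in image for cell in row]
    return statistics.mean(cells), statistics.pstdev(cells)

# Values from the example: mean 49, SD 7.
intervals = sd_intervals(49, 7)

# A made-up 2 x 2 image to show how mean and SD come from cell values.
mean_value, sd_value = describe([[2, 4], [4, 6]])
```

Here intervals[1], intervals[2], and intervals[3] are the 1 SD, 2 SD, and 3 SD intervals listed above.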

Inferential statistics

Allow us to estimate the relationship or association between different variables and data sets (as images or values files).
Different statistics are applied to qualitative and quantitative data.

For relationships in Qualitative data:

Here we are interested in the differences between categorical datasets (categorical images).

For example, change in landuse between two dates for the same territory (expressed as two landuse images).

One fundamental way to compare two categorical (qualitative) images is through a cross-tabulation of the categories in each image. They can also be compared visually through a cross-classification image. Both can be produced using the module CROSSTAB.

Crosstabulation results in a table showing the number of cells that fall into each combination of categories from the two images. Row and column totals are provided. In addition, several statistical measures are output. These include:

CHI squared -- estimates the likelihood that a relationship between the two variables (the landuse75 and landuse80 images) exists.
Cramer’s V -- estimates the strength of the association (0-1).
K (Kappa) index -- an index of the agreement between two images. This ranges from -1 to +1, where 1 = full agreement (images are identical, no change), -1 = full disagreement (the images are opposite, complete transformation in a consistent manner), and 0 = no agreement beyond chance (change is random).
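
To make the first two measures concrete, here is a minimal Python sketch that cross-tabulates two small categorical images and computes the standard Pearson chi-square statistic and Cramer's V. The function names and the sample images are hypothetical, and the sketch does not claim to match the exact output of CROSSTAB.

```python
import math
from collections import Counter

def crosstab(image_1, image_2):
    """Count cells for every (category in image 1, category in image 2) pair."""
    counts = Counter()
    for row_1, row_2 in zip(image_1, image_2):
        for a, b in zip(row_1, row_2):
            counts[(a, b)] += 1
    return counts

def chi_square_and_cramers_v(counts):
    """Pearson chi-square and Cramer's V from cross-tabulation counts."""
    n = sum(counts.values())
    totals_1, totals_2 = Counter(), Counter()
    for (a, b), c in counts.items():
        totals_1[a] += c
        totals_2[b] += c
    chi2 = 0.0
    for a in totals_1:
        for b in totals_2:
            expected = totals_1[a] * totals_2[b] / n   # count expected by chance
            chi2 += (counts.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(totals_1), len(totals_2))
    v = math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0
    return chi2, v

# Two identical made-up images: the association is as strong as possible.
image = [[1, 1, 2], [2, 2, 1]]
chi2, v = chi_square_and_cramers_v(crosstab(image, image))
```

For two identical images with two categories, Cramer's V comes out as 1, the top of its 0-1 range.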

Example: Crosstabulation table for the same area at two different dates.

Rows: Date 2 (2000). Columns: Date 1 (1990).

	Park	Industrial	Streets	Residential	Total
Park	2842	3	4	0	2849
Industrial	1	31874	596	0	32471
Streets	2	1063	72487	23	73575
Residential	0	8742	328	53221	62291
Total	2845	41682	73415	53244	171186

· The diagonal values are those cells that did not change landuse type.

· Values off the diagonal show change by category. Note that change from date 1 to date 2 is not the same as change from date 2 to date 1.

· For example, industrial landuse declined from 1990 to 2000. Looking by column (1990 is the referent date): most of the industrial land that changed became residential or streets, and a little became parks. Looking by row (2000 is the referent date): industrial land also gained a little in other locations, mainly at the expense of streets.

The KAPPA index of agreement is calculated from the table such that K = 1.0 means the two images agree completely (they are the same image); K = -1.0 means that the two images disagree as completely as possible; K = 0.0 means the agreement is no better than would be expected by chance.

KAPPA is calculated by

K = (Po – Pe)/(1- Pe)

Where Po is the proportion of cells not changed. Po is the sum of all the diagonal values (those that did not change) divided by the total number of cells.

Po = (2842+31874+72487+53221)/171186
Po = .937 (the proportion of cells that did not change).

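From this is subtracted Pe, the proportion of agreement that would be expected by chance. Pe is the sum, over all categories, of the product of the category's column proportion and row proportion.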

Pe = ((2845/171186)*(2849/171186)) +
((41682/171186)*(32471/171186)) +
((73415/171186)*(73575/171186)) +
((53244/171186)*(62291/171186)).

Pe = .344

For this example, the Kappa index is high (.904).

Note that this number is smaller than the simple proportion of cells that did not change (.937).
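
The whole calculation can be reproduced with a few lines of Python. This is a sketch that assumes the table is stored with 2000 categories as rows and 1990 categories as columns, and that the Park diagonal value is 2842, as used in the Po calculation above. The function name kappa is hypothetical.

```python
# Rows are 2000 categories, columns are 1990 categories,
# both in the order Park, Industrial, Streets, Residential.
table = [
    [2842, 3, 4, 0],
    [1, 31874, 596, 0],
    [2, 1063, 72487, 23],
    [0, 8742, 328, 53221],
]

def kappa(table):
    """Return (Po, Pe, K) for a square cross-tabulation table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    po = sum(table[i][i] for i in range(len(table))) / n        # observed agreement
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2   # chance agreement
    return po, pe, (po - pe) / (1 - pe)

po, pe, k = kappa(table)
print(f"Po = {po:.3f}, Pe = {pe:.3f}, K = {k:.3f}")
```

Run on this table, the sketch gives Po of about .937, Pe of about .344, and K of about .904.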

Kappa indices can be calculated for each category as well as overall. Individual indices tell us which categories changed and which did not.

For this example, the per-category Kappa is as follows (using different dates as referent).

	Date 1 Referent 1990	Date 2 Referent 2000
Park	.9989	.9975
Industrial	.7096	.9757
Streets	.9778	.9741
Residential	.9993	.7887
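
The per-category values can be reproduced with a common per-category form of Kappa: (cells in agreement minus cells expected to agree by chance) divided by (referent total minus cells expected to agree by chance). That Idrisi uses exactly this formula is an assumption, although it reproduces the values in the table above to four decimals. The function name is hypothetical.

```python
# Rows are 2000 categories, columns are 1990 categories.
labels = ["Park", "Industrial", "Streets", "Residential"]
table = [
    [2842, 3, 4, 0],
    [1, 31874, 596, 0],
    [2, 1063, 72487, 23],
    [0, 8742, 328, 53221],
]

def per_category_kappa(table, labels):
    """Return {label: (K with date 1 as referent, K with date 2 as referent)}."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # date 2 (2000) totals
    col_totals = [sum(col) for col in zip(*table)]    # date 1 (1990) totals
    result = {}
    for i, label in enumerate(labels):
        chance = row_totals[i] * col_totals[i] / n    # cells expected to agree by chance
        agree = table[i][i] - chance
        result[label] = (agree / (col_totals[i] - chance),
                         agree / (row_totals[i] - chance))
    return result

for label, (k_date1, k_date2) in per_category_kappa(table, labels).items():
    print(f"{label}\t{k_date1:.4f}\t{k_date2:.4f}")
```

The only difference between the two columns is the denominator: the 1990 total for the category when date 1 is the referent, and the 2000 total when date 2 is the referent.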

The change from date 1 to date 2 is most evident in the industrial and residential categories. With date 1 as referent we see that much of the industrial land in date 1 has changed to something else. With date 2 as referent we see that much of the residential land in date 2 was something else in date 1.

Cross-classification results in an image that shows all possible combinations of the categories from two images. Where they are different, new categories are defined.
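
A minimal Python sketch of cross-classification follows. Every distinct pair of categories gets its own new class ID, numbered in order of first appearance. This numbering scheme, the function name, and the sample images are all hypothetical; CROSSTAB assigns its own IDs and legend.

```python
def cross_classify(image_1, image_2):
    """Return (new image, legend) with one new class ID per category pair."""
    legend = {}    # (category in image 1, category in image 2) -> new class ID
    result = []
    for row_1, row_2 in zip(image_1, image_2):
        new_row = []
        for pair in zip(row_1, row_2):
            if pair not in legend:
                legend[pair] = len(legend) + 1
            new_row.append(legend[pair])
        result.append(new_row)
    return result, legend

# Made-up landuse images for two dates (category codes 1 to 4).
landuse_1990 = [[1, 1, 2], [2, 3, 3]]
landuse_2000 = [[1, 2, 2], [4, 3, 3]]
cross_image, legend = cross_classify(landuse_1990, landuse_2000)
```

Cells whose pair has equal codes, such as (1, 1), did not change; pairs with different codes, such as (1, 2), mark a change from one category to another.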

For relationships in Quantitative data

For quantitative data we are interested in statistical procedures that reveal difference in degree or quantity usually from low to high. Regression analysis is the most common procedure.

Regression analysis mathematically describes the relationship between variables, which for GIS can be either two images or two non-image datasets (e.g. values files).

For example, there is a relationship between distance from a subway and average housing rents. In general, with an increase in distance, there is a drop in mean housing rent values. Given many sample sites or locations, these two datasets (distance from the subway stop and average rents) can be plotted together.

X – is always the independent variable (the one that "causes" the other - in this case distance).
Y – is the dependent variable (the one that "depends on" the other - in this case housing rents).
A trend line is produced that best fits the data. This line is described by the equation:

Y = a + bX (where b = slope and a = the Y intercept).

Once the slope and Y intercept are known, any value for X can then yield an expected value for Y. For example, any value for distance can be "plugged into" the equation and an expected mean housing rent value can be calculated.

Importantly for GIS, entire images can be "plugged into" the equation using map algebra tools. An image of distance can be transformed into an image of expected average rent value.
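
As a sketch of this idea in Python: fit a and b by ordinary least squares from sample points, then apply Y = a + bX to every cell of a distance image. The sample values are invented so that rent falls exactly on a straight line; the function name and data are hypothetical and this is not the REGRESS module itself.

```python
def fit_line(x_values, y_values):
    """Ordinary least squares fit of Y = a + bX; returns (a, b)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # Y intercept
    return a, b

# Hypothetical samples: distance to subway (km) and mean rent.
# They were constructed to satisfy rent = 2000 - 200 * distance exactly.
distance = [0.2, 0.5, 1.0, 1.5, 2.0]
rent = [1960.0, 1900.0, 1800.0, 1700.0, 1600.0]
a, b = fit_line(distance, rent)

# "Plug in" a whole distance image to get an image of expected rent.
distance_image = [[0.0, 1.0], [2.5, 4.0]]
expected_rent = [[a + b * cell for cell in row] for row in distance_image]
```

The last step is one SCALAR multiplication (by b) followed by one SCALAR addition (of a), applied to every cell.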

This simple regression is a linear regression. It assumes a linear relationship between the independent and dependent variables (the trend line is assumed to be straight).

The relationship can be positive or negative. For example:

o Distance from subway and housing rents (negative relationship, a downward sloping trend line).

o Income and value of automobile (positive).

Correlation

Relationships aren’t always perfect. The degree to which a relationship between two variables is consistent is quantified by the statistics R and R².

The statistic R estimates the strength of the relationship. A perfect relationship will yield R = -1 for a negative relationship and R = 1 for a positive relationship.

If R = 0.8, then R² = 0.64. R² is the coefficient of determination: the share of the variation in one variable that is accounted for by the other. It is usually expressed as a percent (here 64%).
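
R can be computed directly from paired values. The Python sketch below uses the standard Pearson formula; the function name and the small datasets are hypothetical, chosen so that one pair gives R = 0.8 and the other a perfect negative relationship.

```python
import math

def pearson_r(x_values, y_values):
    """Pearson correlation coefficient R between two equal-length lists."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    return sxy / math.sqrt(sxx * syy)

# Imperfect positive relationship (made-up values).
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
r_squared = r ** 2

# Perfect negative relationship (made-up values).
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
```

For the first pair R is 0.8 and R² is 0.64, matching the example above; for the second pair R is -1.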

In IDRISI the module REGRESS allows for a simple linear regression of values files or images. [One should be careful with images, because regression assumes that the observations are sampled independently of one another. In images, neighboring cells tend to influence each other's values (spatial autocorrelation).]

MULTIREG – multiple regression
LOGITREG – logit regression

SPATIAL SAMPLING PROCEDURES

Draws a spatial sample of points from the image: random, systematic, stratified random.
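
The three sampling schemes can be sketched in Python as functions that return (row, column) positions in an image. The function names, the block-based reading of "stratified random" (one random cell per square block), and the image sizes are assumptions made for illustration; Idrisi's own sampling tools may work differently.

```python
import random

def random_sample(n_rows, n_cols, n_points, rng):
    """Random: n_points distinct cells chosen anywhere in the image."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return rng.sample(cells, n_points)

def systematic_sample(n_rows, n_cols, step):
    """Systematic: every step-th cell in both directions, from the first cell."""
    return [(r, c) for r in range(0, n_rows, step)
            for c in range(0, n_cols, step)]

def stratified_random_sample(n_rows, n_cols, block, rng):
    """Stratified random: one random cell inside each block x block square."""
    points = []
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            r = rng.randrange(r0, min(r0 + block, n_rows))
            c = rng.randrange(c0, min(c0 + block, n_cols))
            points.append((r, c))
    return points

rng = random.Random(42)   # fixed seed so the sketch is repeatable
random_points = random_sample(10, 10, 5, rng)
systematic_points = systematic_sample(4, 4, 2)
stratified_points = stratified_random_sample(4, 4, 2, rng)
```

Random sampling can leave parts of the image unsampled, systematic sampling covers it evenly but on a fixed grid, and stratified random sampling combines even coverage with a random position inside each block.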

See other modules under Analysis/ Statistics and Analysis/ Surface analysis/ Geostatistics