Lecture 2 - Data Models for GIS


Data Model Definition: an abstraction of real world entities and their relationships into structures that can be implemented with a computer language.

OUTLINE

I. Introduction

II. Data models

  1. External
  2. Conceptual
  3. Logical
  4. Internal

III. Data Base Management Systems (DBMS)

  1. Definition
  2. Requirements for a DBMS
  3. Terminology
  4. Entity-Relationship conceptual model
  5. Hierarchical logical model
  6. Network logical model
  7. Relational logical model
  8. Integration of DBMS with spatial data models

IV. Data Models for Spatial Data

  1. Introduction
  2. Raster conceptual data model
  3. Vector conceptual data model
  4. Object-oriented data model

I. Introduction

  1. Positions: referencing of an entity to a coordinate system (e.g. UTM, state plane, etc.).
  2. Data description:
    1. data as entities: geographic data are often described in phenomenological concepts such as roads, towns, rivers, floodplains, ecotypes, soil associations, etc. These concepts are often referred to as entities.
    2. entity hierarchy: data are often hierarchical in form (e.g. country > state > county > village; forest > deciduous/coniferous > upland/lowland).
    3. gradients between entities: separations between some entities are not always clear cut and there may be a transitional zone between entities (ecotones).
    4. Geographic data can be represented using three basic topological concepts: the point, the line and the area. Every geographical phenomenon can in principle be represented by these three concepts plus a label or attribute that defines it.
    5. What is a map? "A map is a set of points, lines and areas that are defined by their spatial location with respect to a coordinate system and by their non-spatial attributes" (Burrough 1986). A map legend links the non-spatial attributes to the spatial attributes.
  3. Data concepts:
    1. a record is a data structure containing information about an entity that can be manipulated as a unit.
    2. a pointer is a memory address that references the start of a data item in RAM.
  4. Structuring data
    1. simple lists: unstructured data; each new item is placed at the end of the list. Average search time is (n + 1)/2 comparisons.
    2. ordered sequential files: additions are placed in their proper position (insertion). Binary search is possible, reducing search time to about log2(n + 1) comparisons. An item in a set of 65,535 items is found in at most 16 tries, since log2(65,535 + 1) = 16.
    3. indexed files:
      1. direct: rapid data retrieval according to a key attribute (e.g. dictionary spelling: key attribute + additional information). In direct files the data items themselves provide the means of ordering (e.g. soil series names with an index to the location of each name beginning with a particular letter).
      2. indirect: soil profiles may be ordered by name, but we may want information on soil depth, drainage, pH, texture or erosion. If the poorly drained soils need to be identified we must use a linear search unless we invert the file. An inverted file is built with an initial linear pass through the data. An example of an inverted file is a topic index in a book.
      3. limitations of simple structures: file modification is difficult (a new record is added to the end of the file, then the index is updated). Data can be accessed only via a key contained in the indexed file, while other types of information require a sequential search.
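The search times quoted for simple lists and ordered sequential files can be checked with a short sketch. The function names and the integer test data are assumptions made for this illustration only. The sketch counts comparisons for a linear scan and for a binary search.

```python
import math

def linear_search(items, target):
    """Scan front to back; return the number of comparisons made."""
    for tries, item in enumerate(items, start=1):
        if item == target:
            return tries
    return len(items)

def binary_search(sorted_items, target):
    """Halve the search interval each step; return the number of probes made."""
    low, high = 0, len(sorted_items) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_items[mid] == target:
            return probes
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return probes

# Average cost of a linear scan over every possible target in a small list.
small = list(range(1001))
average_linear = sum(linear_search(small, t) for t in small) / len(small)

# Worst-case cost of a binary search over an ordered file of 65,535 items.
n = 65_535
ordered = list(range(n))
worst_binary = max(binary_search(ordered, t) for t in ordered)
bound = math.log2(n + 1)

print(average_linear, worst_binary, bound)
```

The average for the linear scan matches (n + 1)/2, and the worst case for the binary search matches log2(n + 1).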

    These data structures provide very efficient access to information pertaining to a single entity. But we need more. We need to relate different entities.
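The inverted file mentioned under indirect indexed files can be sketched as a lookup from attribute value to record positions. The soil records, their field names and the helper `build_inverted_index` are hypothetical, invented for this illustration.

```python
from collections import defaultdict

# Hypothetical soil profile records, ordered by series name (the direct key).
profiles = [
    {"series": "Alpha", "drainage": "well", "ph": 6.5},
    {"series": "Bravo", "drainage": "poor", "ph": 5.2},
    {"series": "Delta", "drainage": "moderate", "ph": 7.0},
    {"series": "Echo", "drainage": "poor", "ph": 5.8},
]

def build_inverted_index(records, attribute):
    """One linear pass: map each attribute value to the positions holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return dict(index)

drainage_index = build_inverted_index(profiles, "drainage")

# Poorly drained soils are now found without scanning the whole file.
poorly_drained = [profiles[i]["series"] for i in drainage_index.get("poor", [])]
print(poorly_drained)
```

The index costs one full pass to build and must be rebuilt or updated whenever records change, which is the modification problem noted for simple structures.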


II. Data Models (Laurini and Thompson, 1992 and ANSI/X3/SPARC, 1978)

As data management became more complex, a framework was needed to understand the transformation of real world systems and processes into structures that could be implemented in a computer.

1. External model: provides the basis for understanding the real world (e.g. non-spatial: a set of entities; spatial: the world as a continuously varying surface, the world as a discrete set of objects in space, or the world as a set of thematic layers).

2. Conceptual data model: provides the organizing principles that translate the external data models into functional descriptions of how data objects are related to one another (e.g. non-spatial: E-R model; spatial: raster, vector, object representation).

3. Logical data model: provides the explicit forms that the conceptual models can take and is the first step in computing (e.g. non-spatial: hierarchical, network, relational; spatial: 2-D matrix, map file, location list, point dictionary, arc/node).

4. Internal data model: low level data structures, records, pointers, etc.


III. Data Base Management Systems (DBMS)

1- Definition: Data Base Management Systems: A system used to organize, access, maintain and manipulate object or entity data. A DBMS controls input, output, storage and retrieval of entity data. Essential features of a data base are fast access and cross referencing of entities.

2- Requirements for a DBMS

A DBMS should provide:

  1. Data Independence: the data base can change with little or no impact on the user programs
  2. Data Sharing: must have coordinated simultaneous access. Concurrency control mechanism.
  3. Maintenance of Data Integrity: the DBMS helps enforce certain consistency constraints (e.g. a coordinate has both lat and long; # of seats sold on an airplane <= # of seats on the plane).
  4. Security: DBMS provides mechanism for security/authorization from disclosure/destruction of data.
  5. Centrality of Control: DB administrator to resolve conflicts and meet user requirements
  6. Reduce Application Development Time

3- E-R data model: a conceptual data model in which information is represented by entities and relationships between entities

4- Terminology

a. entity - a distinguishable object in the real world (people, forest stands, watersheds, etc.)
b. relationship - a correspondence or association between two or more entities.
c. attributes - the properties which describe an entity.
d. functionality - how many entities from one entity set can be associated with another set
e. primary key - main key for entity identification, one record per indexed attribute.
f. secondary key - may have multiple record occurrences per index attribute.

5- Hierarchical data model - one to many relationship.

+ easy to update and expand.
+ easy data access for keys.
+ ideal for data that is inherently hierarchical.
- poor access for associated attributes.
- Restrictive paths.
- limited to one to many relationships.

6- Network data model - many to many relationships.

+ reduces redundancy.
+ more flexible paths to data.
+ very fast
- pointers expensive and difficult to update when inserting and deleting.

7- Relational data model: data are stored as records known as tuples, grouped together in two-dimensional tables known as relations. Whereas hierarchical structures rely on the hierarchy and networks depend on pointers to associate entities, the relational model associates entities through key values that are repeated across tables (a controlled form of redundancy), with a unique key identifying each record in its table. This simplifies data maintenance because the data for each entity type are stored in simple tables. Relational joins cross reference entities using a primary key in one table and a foreign key in another table. Thus, in order to perform a relational join there must be at least one column in common between the tables being related.

 

The relational model is designed to reduce redundancy of data whenever possible. A set of rules called the normal forms was developed by Codd (1970) to guide this process.

+ structures very flexible.
+ boolean logic and math operations.
+ insert and delete easy.
- often use sequential search unless previously sorted.
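A relational join on a primary key and a foreign key can be sketched with Python's built-in sqlite3 module. The two tables, their columns and the sample rows are hypothetical, chosen only to show the common column (soil_id) that makes the join possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One relation per entity type; soil_id is the primary key here.
    CREATE TABLE soil_type (
        soil_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        drainage TEXT NOT NULL
    );
    -- soil_id appears again as a foreign key: the column in common.
    CREATE TABLE polygon (
        polygon_id INTEGER PRIMARY KEY,
        area_ha    REAL NOT NULL,
        soil_id    INTEGER NOT NULL REFERENCES soil_type(soil_id)
    );
    INSERT INTO soil_type VALUES (1, 'loam', 'well'), (2, 'clay', 'poor');
    INSERT INTO polygon VALUES (10, 4.5, 1), (11, 2.0, 2), (12, 7.25, 2);
""")

# Join the tables on the shared key and apply a Boolean selection.
rows = conn.execute("""
    SELECT p.polygon_id, p.area_ha, s.name
    FROM polygon AS p
    JOIN soil_type AS s ON s.soil_id = p.soil_id
    WHERE s.drainage = 'poor'
    ORDER BY p.polygon_id
""").fetchall()
conn.close()

print(rows)
```

The soil name and drainage class are stored once in soil_type; each polygon repeats only the key value.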

8- Integration of DBMS with the spatial data models

  1. raster
  2. vector

IV. Data Models for Spatial Data

Data structures are complex for GIS because they must include information pertaining to entities with respect to: position, topological relationships, and attribute information. It is the topologic and spatial aspects of GIS that distinguish it from other types of data bases.

1.  Introduction: There are presently three types of representations for geographic data: raster, vector, and object.

  1. raster - a set of cells on a grid that represents an entity (entity --> symbol/color --> cells).
  2. vector - an entity is represented by nodes and their connecting arcs or line segments (entity --> points, lines or areas --> connectivity).
  3. object - an entity is represented by an object which has spatial information as one of its attributes.

2.  Raster Data model

Definition: realization of the external model which sees the world as a continuously varying surface (field) through the use of 2-D Cartesian arrays forming sets of thematic layers. Space is discretized into a set of connected two dimensional units called a tessellation.

  1. Map overlays: separate set of Cartesian arrays or "overlays" for each entity.
  2. Logical data models: 2-D array, vertical array, and Map file
    1. Each overlay is a 2-D matrix of points carrying the value of a single attribute.
    2. Each point is represented by a vertical array in which each array position carries a value of the attribute associated with the overlay.
    3. Map file - each mapping unit lists the coordinates of the cells in which it occurs (greater structure, many to one relationship).
  3. Compact methods for coding

    The vertical array is not conducive to compact data coding because it references different entities in sequence and lacks a many to one relationship. The third structure (the map file) references a set of points for a region (or mapping unit) and so allows for compaction.

  1. Chain codes: a region is defined in terms of an origin and a sequence of direction codes (0 - 3) for E, N, W, S (Map file) (binary data).
    + reduced storage.
    + area, perimeter, shape estimation.
    - overlay difficult.
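A chain code using the 0 - 3 codes for E, N, W, S can be traced and measured with a short sketch. The sample region, the unit cell size and the assumption that y increases to the north are choices made for this illustration.

```python
# Unit steps for the four direction codes: 0 = E, 1 = N, 2 = W, 3 = S.
# Assumption: x increases to the east and y increases to the north.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def trace_chain(origin, codes):
    """Turn an origin plus direction codes into boundary vertices."""
    x, y = origin
    vertices = [(x, y)]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        vertices.append((x, y))
    return vertices

def chain_perimeter(codes):
    """Each code is one unit step, so the perimeter is the code count."""
    return len(codes)

def chain_area(vertices):
    """Shoelace formula over the traced (closed) boundary."""
    twice_area = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2

# Hypothetical region: a rectangle 3 cells wide and 2 cells tall.
codes = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
outline = trace_chain((2, 5), codes)

print(chain_perimeter(codes), chain_area(outline))
```

Area and perimeter come directly from the code sequence, which is the advantage listed above. Overlaying two regions is harder because the cells inside each boundary are not stored.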

  2. Run-length codes: row #, begin, end (Map file); entity #, # pixels (2-D matrix).
    + reduced storage.
    - overlay difficult.
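Both run-length forms can be sketched on a small grid: (entity, pixel count) pairs per row for the 2-D matrix form, and (row, begin, end) triples per entity for the map file form. The grid values and function names are hypothetical.

```python
from itertools import groupby

def rle_encode_row(row):
    """2-D matrix form: a list of (entity, number of pixels) runs."""
    return [(value, len(list(group))) for value, group in groupby(row)]

def rle_decode_row(runs):
    """Expand (entity, number of pixels) runs back into cell values."""
    return [value for value, count in runs for _ in range(count)]

def entity_runs(grid, entity):
    """Map file form: (row, begin column, end column) for one entity."""
    runs = []
    for row_number, row in enumerate(grid):
        column = 0
        for value, count in rle_encode_row(row):
            if value == entity:
                runs.append((row_number, column, column + count - 1))
            column += count
    return runs

# Hypothetical overlay: entity 1 on a background of 0.
grid = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

encoded = [rle_encode_row(row) for row in grid]
decoded = [rle_decode_row(runs) for runs in encoded]
runs_of_one = entity_runs(grid, 1)

print(encoded)
print(runs_of_one)
```

Storage shrinks most when rows contain long runs of one value. A grid that changes value at every cell gains nothing from this coding.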

  3. Block codes: 2-D run-length encoding; regions are stored as blocks using origin and radius.
    + reduced storage.
    + union and intersection of regions easy.

  4. Quadtrees: recursive decomposition of a 2-D array into quadrants until the next subdivision yields a region containing a single entity.
    + reduced storage.
    + variable resolution.
    + overlay of variable resolution data.
    + fast search.
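The recursive decomposition behind a quadtree can be sketched as follows, assuming a square grid whose side is a power of two. The nested-list tree representation and the NW, NE, SW, SE child order are choices made for this illustration, not a standard storage format.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value if the block is uniform, else four child subtrees."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first  # leaf: the whole block holds a single entity
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

def count_leaves(node):
    """Number of stored blocks in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

# Hypothetical 4 x 4 overlay with two entities (0 and 1).
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Uniform quadrants are stored once at a coarse level and only the mixed quadrant is subdivided, which is where the reduced storage and variable resolution come from.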

Morton Sequencing

Morton Sequencing Overlay

Morton Homework
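Morton sequencing orders grid cells by interleaving the bits of their row and column numbers, so that cells in the same quadtree quadrant receive consecutive indices. A minimal sketch follows. Placing the column bit below the row bit is an assumption, since conventions differ, and the function name is hypothetical.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of row and col into a single Morton index."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)      # column bit in the even position
        index |= ((row >> i) & 1) << (2 * i + 1)  # row bit in the odd position
    return index

# Morton order of the cells in a 4 x 4 grid.
cells = [(r, c) for r in range(4) for c in range(4)]
ordered = sorted(cells, key=lambda cell: morton_index(*cell))

print(ordered)
```

With this convention the first four cells in the sequence form one 2 x 2 quadrant, the next four form the neighboring quadrant, and so on.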

3.  Vector data model

Definition: realization of the discrete model of real world using structures for storing and relating points, lines and polygons in sets of thematic layers.

a. Introduction

    1. represents an entity as exactly as possible.
    2. coordinate space continuous (not quantized like raster).
    3. Structured as a set of thematic layers

b. Representation

    1. Point entities: geographic entities that are positioned by a single x,y coordinate (historic sites, wells, rare flora). The data record consists of x,y - attribute.
    2. Line Entity: (rivers, roads, rail)
      1. all linear features are made up of line segments.
      2. a simple line is defined by 2 (x,y) coordinate pairs.
      3. an arc or chain or string is a set of n (x,y) coordinate pairs that describe a continuous line. The shorter the line segments, the more closely the chain will approximate a continuous curve. Data record n(x,y).
      4. a line network gives information about connectivity between line segments in the form of pointers or relations contained in the data structure. Pointers are often built into nodes to define connections, along with angles indicating the orientation of connections (this fully defines the topology).

    3. Area Entity: data structures for storing regions. Data types: land cover, soils, geology, land tenure, census tract, etc.

      1. Cartographic spaghetti or "connect the dots". An early development in automated cartography, a substitute for mechanical drawing. Numerical storage; spatial structure is evident only after plotting, not in the file.
      2. Location list
        • describes each entity by specifying coordinates around its perimeter.
        • shared lines between polygons.
        • polygon sliver problems.
        • no topology (neighbor and island problems).
        • error checking a problem.
      3. Point dictionary
      4. DIME files (Dual Independent Map Encoding)
      5. Arc/node
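The location list problems noted above (shared lines stored twice, no neighbor information) can be illustrated next to an arc/node layout. The arc/node table used here, where each arc records its end nodes and the polygons on its left and right, is a common textbook form and an assumption of this sketch. It is not taken from these notes, and all names and coordinates are hypothetical.

```python
# Location list: every polygon stores its full perimeter.
location_list = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

def ring_edges(ring):
    """Undirected edges of a closed ring of vertices."""
    closed = ring + ring[:1]
    return {frozenset(pair) for pair in zip(closed, closed[1:])}

def shared_edges(polygons):
    """Neighbors must be discovered by comparing coordinates."""
    first_owner, shared = {}, {}
    for name, ring in polygons.items():
        for edge in ring_edges(ring):
            if edge in first_owner:
                shared[edge] = (first_owner[edge], name)
            else:
                first_owner[edge] = name
    return shared

# Arc/node: the shared boundary is stored once, with explicit topology.
nodes = {1: (2, 0), 2: (2, 2)}
arcs = {
    "a1": {"from": 1, "to": 2, "left": "A", "right": "B", "shape": []},
    "a2": {"from": 2, "to": 1, "left": "A", "right": None,
           "shape": [(0, 2), (0, 0)]},
    "a3": {"from": 1, "to": 2, "left": "B", "right": None,
           "shape": [(4, 0), (4, 2)]},
}

def neighbors(arc_table, polygon):
    """Read adjacency straight from the left/right labels."""
    found = set()
    for arc in arc_table.values():
        sides = {arc["left"], arc["right"]}
        if polygon in sides:
            found |= sides - {polygon, None}
    return found

duplicated = shared_edges(location_list)
print(duplicated, neighbors(arcs, "A"))
```

In the location list the edge from (2, 0) to (2, 2) appears in both perimeters, so digitizing it twice can produce slivers. In the arc/node table it is the single arc a1.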

 

4. Object-oriented data model

Definition: realization of the discrete model of real world using an object centered approach in which an object has both physical (attribute) and geometric characteristics. Different types of objects can interact because they are not confined to separate layers.

The biggest single difference between the object-oriented conceptual model and the vector-layered conceptual model for representing geographic information is that in the object model the real world object, not its geometry, is the basis for abstraction. In other words, the objects, not the geometric components of layers, are the "units" for modeling and interaction.
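The object-centered approach can be sketched with two classes in which geometry is just one attribute among others and objects of different types interact directly. The classes Parcel and Well, their fields and the use of a bounding box as geometry are all hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class Parcel:
    owner: str
    land_use: str
    # Geometry is one attribute of the object: (xmin, ymin, xmax, ymax).
    extent: tuple

    def contains(self, x, y):
        """True if the point falls inside this parcel's bounding box."""
        xmin, ymin, xmax, ymax = self.extent
        return xmin <= x <= xmax and ymin <= y <= ymax

@dataclass
class Well:
    well_id: str
    depth_m: float
    location: tuple  # (x, y)

    def parcel_of(self, parcels):
        """Different object types interact without a layer overlay."""
        for parcel in parcels:
            if parcel.contains(*self.location):
                return parcel
        return None

parcels = [
    Parcel("Smith", "pasture", (0, 0, 5, 5)),
    Parcel("Jones", "forest", (6, 0, 10, 5)),
]
well = Well("W-1", 42.0, (7, 2))

host = well.parcel_of(parcels)
print(host.owner if host else None)
```

The unit of modeling here is the parcel or the well itself; its points and lines are carried inside the object rather than stored in separate thematic layers.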