TEACHING GEOG 385.02/GTECH 785.02 
GIS APPLICATIONS IN SOCIAL GEOGRAPHY

LECTURE DEMO. GIS PROJECTS AND DATABASE DEVELOPMENT

DATABASE DEVELOPMENT

DATA COLLECTION

DATABASE INTEGRATION

Geographic database
Database development and objectives for GIS

Finding and importing existing digital data
Creating your own data

Georeferencing
Resolution
Accuracy

OTHER CONSIDERATIONS
Cost
Possibilities for analysis and consequent use


Major issues:

  • Differentiate the 2 basic approaches to using GIS.
  • Describe 3 ways to get data into your database.
  • Explain the importance of file and physical formats to database development.
  • Describe and give an example of surrogate data, and explain how using it differs from deriving data.
  • List and explain the major issues to consider when building a database.

GIS is a process. Data are integrated into the system that we call a GIS and analyzed to help answer the questions that we are asking.
The hardware and software are only two aspects of the system as an integrated whole.
The other aspects of the system include the organization within which it operates, the people that operate it and the data that feed it.

DATABASE DEVELOPMENT

  • In this framework, “Database Development” is a key part of this process.
  • According to Clarke (Clarke, K.C. Getting Started with Geographic Information Systems. New Jersey: Prentice Hall. 1997. p. 115.), database development “takes up anywhere between 60 and 90% of both the time and money spent on a typical GIS project.”
  • Therefore, understanding the complexities and difficulties of finding existing data, creating your own data, and then integrating these data into your GIS is an integral step in project management. These steps constitute stages in database development.
  • The way in which your geographic database is set up will determine how well your data will be retrieved, updated, integrated with other datasets, and analyzed.
  • If the database is set up poorly, you might spend a lot of time and effort developing a database that turns out to be unusable.

Geographic database

  • No matter how large and multilayered your database (spatial and attribute) is, it remains a very simplified model of the world.
  • This means that all world objects must be reduced to points, lines, polygons, and cells and represented as such in your database.
  • These are your data units, and your attribute data must describe spatial objects as represented by these geometric primitives.
  • Essentially, only data organized in this way will be suitable for conventional GIS display, querying, and analysis.
  • We will consider the limitations of this approach later in this course.

Database development and objectives for GIS

Two approaches to DB development:

  • Data Inventory Approach
    • Large data collection projects
    • For multiple users and multiple uses
    • Often linked to tabular data (vector format with tables)
    • Examples: US Census Bureau maps and data, CARSI's NYC map
  • Project Approach
    • Question or problem driven
    • Informed by tools for analysis
    • Data collection limited to what is needed to solve a particular problem
    • Participants limited to individual or research team
    • Examples: research projects, projects for organizations, communities, etc.
  • Your class project?

Two major steps in database development: Data collection and Data integration.


DATA COLLECTION

Types of data by availability: existing data, non-existing data, surrogate data, derived data.

Finding and importing existing digital data

GIS databases have a specific structure (spatial and attribute databases in raster or vector format). Only files with this structure can be read by your GIS package. Therefore, make sure that your data are in such a format or can be converted to it using the export/import utilities of your GIS software.

Where to find it

  • Government agencies (e.g. USGS 1-800-USA-MAPS, NOAA, Bureau of Census, NYC Planning Department)
    • This data is public (you pay for its creation through your taxes). Therefore, it should be available to you free of charge or for a very low price (e.g., price of the media, etc.).
  • Companies, private GIS database providers (e.g. InfoTerra, SPOT, GDT)
    • Private data providers distribute data for profit. They might use the public data to create their own product. Usually, these datasets are sold commercially (designed for other private companies) and are expensive.
  • World Wide Web - a few places to start. Both public and private datasets can be located and downloaded from the web. See GIS resources page for some suggestions.
  • Discussion Lists of GIS users (GIS-L, IDRISI-L, ESRI-L). Usually, these are great communities - ask a question and you will get many replies.
  • Use data sets that already are in use in your institution.
    • Check out S:\scratch\digital data
    • Check out data sets (shape files) that come with ArcView (d:\ArcView\Esridata)
    • Ask your advisors, peers, and share data yourself!

Physical Media

  • Data comes on a variety of physical media (CDs, Floppy disks, Zip disks, tapes, etc.).
  • You have to have a READER that can read data files from the physical media you get. Your computer needs a physical drive that accepts the specific media type, as well as the software (driver) to communicate with this drive (e.g., drivers for reading digital tapes). Before you can do anything else, you need to get the files onto your hard disk.
  • Electronic delivery, FTP, Web Download, usually no problem.
  • CD or Floppy also no problem, unless your machine can’t read either.

General Data formats

You want your data to be in

    • GIS Raster or Vector formats plus attribute table in a Database (*.DBF, *.MDB) or Spreadsheet format (*.XLS).
    • Or files that can be translated into these formats using export/import utilities of GIS.

Digital File formats

  • Software-specific GIS formats (Idrisi raster or vector formats, ArcView Shape or grid files, MapInfo files, etc.)
  • Agency Formats, files produced by agencies supplying data
    • Usually use a commonly-used data interchange or GIS format, but will sometimes have their own "standard"
    • USGS data: DLG (digital line graph) format for vector data, DEM format for raster digital elevation models
    • Census Bureau Data: TIGER (Topologically Integrated Geographic Encoding and Referencing). Includes landmarks; census blocks, block groups, and tracts; and streets with addresses (LION files).
    • Many private data providers use these files to create their products; often accuracy increases and data is in one of the common GIS formats.
  • Data Interchange Formats, commonly used data formats accepted by many GIS directly or via import utilities.
    • Desktop publishing formats.
        • Raster: TIF (Tagged Image File Format), GIF (Graphics Interchange Format), BMP (Windows bitmap), JPG.
        • Vector: Postscript, Adobe Illustrator, etc.
    • Some "export" formats of common GIS and other software (e.g. ArcInfo *.E00 format; AutoCAD *.DXF format).
    • SDTS format (Spatial Data Transfer Standard). Federally approved in 1992. It is taking hold, and there are utilities to read and export to this format. See details at http://www.fgdc.gov/index.html
    • Translator Software: utilities that translate from one format to another (Image Alchemy, xv).
    • Virtually all GIS also have import/export utilities for reading data in other formats.
    • Some GIS (e.g. IDRISI) provide general conversion tools to disassemble and reassemble files.
    • Some GIS also convert between raster and vector formats.
  • Compression
      • Several types of compression are used to make the storage needed as small as possible. To read the data, you might need to decompress it. (ZIP, other formats)
  • By using converters and import/export utilities, you should be able to read a file with your GIS software and turn it into a GIS format that can be used analytically.

Problem: Loss of information or alteration of data (e.g., rounding of coordinates or attribute values when converting from real to integer data types) might occur at every step of conversion.
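
To make the rounding problem concrete, here is a minimal Python sketch. The coordinates are made-up values in metres, and `to_integer_coords` is a hypothetical helper that only imitates an export to an integer-only format; it is not the behavior of any particular GIS utility.

```python
# Hypothetical coordinates in metres (illustrative values, not real data).
points = [(583412.67, 4507218.42), (583413.21, 4507218.49)]

def to_integer_coords(pts):
    """Simulate exporting real-valued coordinates to an integer-only format."""
    return [(int(round(x)), int(round(y))) for x, y in pts]

converted = to_integer_coords(points)

# Largest positional shift introduced by the conversion, in metres.
max_shift = max(
    ((x - xi) ** 2 + (y - yi) ** 2) ** 0.5
    for (x, y), (xi, yi) in zip(points, converted)
)

print(converted)
print(round(max_shift, 2))
```

In this sketch two distinct points end up with identical integer coordinates, so the conversion both shifts positions and silently merges features that were separate in the source file.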

Creating your own data

You will need to create your own data if the data you need for your project do not exist, cannot be found, or are too expensive to purchase.

Find it in hard-copy format and digitize

  • Manual Entry - of attributes found in a report, for example, into your attribute database attached to a vector layer.
  • Digitizing - from a map or air photo using a digitizing tablet or on-screen.
  • Scanning.
  • Type point coordinates or addresses and then import into a GIS and plot.
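
As an illustration of the last option above (typing point coordinates for later import), the following Python sketch reads a hand-typed comma-separated list and separates usable rows from problem rows before anything is loaded into a GIS. The site names, coordinates, column names, and the `read_points` helper are all hypothetical.

```python
import csv
import io

# Hypothetical site list typed in by hand; names and coordinates are made up.
typed_text = """name,lon,lat
Site A,-73.9645,40.7678
Site B,-73.9500,40.7700
Site C,-73.9800,not recorded
"""

def read_points(text):
    """Return (valid_points, rejected_names) from comma-separated text."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lon, lat = float(row["lon"]), float(row["lat"])
        except ValueError:
            rejected.append(row["name"])
            continue
        # Basic range check for geographic coordinates in decimal degrees.
        if -180 <= lon <= 180 and -90 <= lat <= 90:
            valid.append((row["name"], lon, lat))
        else:
            rejected.append(row["name"])
    return valid, rejected

points, problems = read_points(typed_text)
print(points)
print(problems)
```

Checking typed values before import is a cheap way to catch manual-entry mistakes that would otherwise show up as misplaced points on the map.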

Collect it yourself in the field then enter it

  • Surveying
  • GPS (need to note in a notebook, then type in, or store and download/import)
  • Interviewing, etc.

Use surrogate data instead

    • E.g. need powerlines, but don't have powerlines and don't have the time/funds to do a field collection. Notice that powerlines mostly go along the paved roads. Use paved roads as a surrogate for powerlines - but be aware that this might introduce error!

Derive it from existing data

  • With knowledge of relationships and context, you can derive new data from old data.
  • For example, if you need a DEM you can derive it from contours or points; if you need an index of biomass, then derive it from satellite imagery; temperature data from elevation, etc.
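
The temperature-from-elevation case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an average lapse rate of about 6.5 degrees C per 1000 m, a single hypothetical reference station, and a made-up 2 x 2 elevation grid. A real project would need a locally appropriate relationship.

```python
# Assumed average environmental lapse rate: about 6.5 degrees C per 1000 m.
LAPSE_RATE_C_PER_M = 6.5 / 1000.0

def derive_temperature(elevation_m, station_temp_c, station_elev_m):
    """Estimate temperature at a cell from one reference station."""
    return station_temp_c - LAPSE_RATE_C_PER_M * (elevation_m - station_elev_m)

# Hypothetical elevation grid in metres (rows of cells).
dem = [
    [100.0, 300.0],
    [500.0, 1100.0],
]

# Hypothetical station: 15.0 degrees C measured at 100 m.
temperature = [
    [round(derive_temperature(z, 15.0, 100.0), 2) for z in row]
    for row in dem
]
print(temperature)
```

The derived layer is only as good as the assumed relationship and the source DEM, so any error in either carries over into the new data.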

ALWAYS DOCUMENT EVERY DATA SET. Write down the source, georeferencing information, data format, description of the content of your data, error description, etc.
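
One possible way to keep this documentation next to each data set is a small structured record. The sketch below is an assumption about how such a record could look: the field names are taken from the list above, the values are placeholders, and `missing_fields` is a hypothetical helper that flags anything left blank.

```python
import json

REQUIRED_FIELDS = [
    "source", "georeferencing", "data_format", "content", "error_description",
]

def missing_fields(record):
    """List required documentation fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]

# Hypothetical record; every value is a placeholder.
record = {
    "source": "Digitized from a paper street map (placeholder)",
    "georeferencing": "UTM zone 18N, metres (placeholder)",
    "data_format": "Vector, shapefile",
    "content": "Street centerlines with names",
    "error_description": "",
}

print(missing_fields(record))
print(json.dumps(record, indent=2))
```

Saving the record as a text file alongside the data means the documentation travels with the data set when it is shared.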


DATABASE INTEGRATION

Once you have acquired your data sets, you need to make sure that they will work together. This means that they have to be in the same referencing system, have corresponding resolutions, and have an acceptable degree of error.

Georeferencing

Is your data georeferenced?

  • If yes, what is the coordinate system, reference units, projection type, datum type, and ellipsoid type? Is it in a projection that you intend to use for all of your datasets?
    • If yes, convert other data sets to this projection.
    • If not, is it in a projection that you can change with software you have?
  • GIS have utilities for converting spatial data from one projection/reference system to another.
    • In Idrisi32 - module PROJECT (under Reformat Menu) uses over 400 georeferencing files to convert between various georeferencing systems.
    • In ArcView - Use Projection Utility (Start/ArcView/Projection Utility - you access it without opening ArcView).
  • If your data is not georeferenced (e.g. you scanned it, or got it in a desktop graphic format), can it be georeferenced?
    • In this case, use Idrisi32 module RESAMPLE (under Reformat Menu).
  • Geodetic information on the web
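
The core idea behind georeferencing a scanned image is a transformation from column/row positions to map coordinates. The Python sketch below shows the simplest case, a six-parameter affine transformation. It is an illustration of the concept only, not a description of what RESAMPLE or any other module does internally, and all parameter values are hypothetical.

```python
def pixel_to_map(col, row, a, b, c, d, e, f):
    """Affine transform: map x = a*col + b*row + c, map y = d*col + e*row + f."""
    return (a * col + b * row + c, d * col + e * row + f)

# Hypothetical parameters: 10 m cells, no rotation, upper-left cell centre
# at (500000, 4500000). e is negative because row numbers increase downward.
params = dict(a=10.0, b=0.0, c=500000.0, d=0.0, e=-10.0, f=4500000.0)

print(pixel_to_map(0, 0, **params))
print(pixel_to_map(25, 40, **params))
```

In practice the six parameters are estimated from control points whose image and map positions are both known, which is why the quality of those control points matters so much.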

Resolution

  • spatial (cell resolution, vector data units - census blocks, zip codes, counties, states, countries, etc.)
    • Does it match the resolution of the question being asked?
    • Does the original resolution of your data layers match other layers? Remember, digital spatial data is scaleless, but its resolution (degree of detail and precision) depends on the original resolution of the analog map or image.
  • temporal - both spatial and attribute data change over time.
    • Is it current enough?
    • Are the time slices frequent enough and spaced appropriately?
    • Is your spatial database current enough relative to how your attribute data is structured? (lot boundaries, census blocks and tracts change over time)
  • attribute
    • can you get the variables you need? e.g. demographic characteristics by census tract for 1999?
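
When two raster layers differ in spatial resolution, one common option is to coarsen the finer layer so that the cells line up. The Python sketch below averages non-overlapping blocks of cells. The grid values and the `aggregate` helper are hypothetical, and averaging is only one of several possible aggregation rules.

```python
def aggregate(grid, factor):
    """Average non-overlapping factor x factor blocks of a grid."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Hypothetical 4 x 4 layer with 10 m cells, coarsened to 20 m cells.
fine = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
print(aggregate(fine, 2))
```

Going from fine to coarse loses detail but is defensible. Going the other way does not add real detail, since the result is still limited by the resolution of the original source.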

Accuracy

Does the data meet the accuracy standards of the study?

Sources of error in the database

  • Conceptual - how your database was designed and organized (spatial units, attributes, etc.). Is it adequate for answering your questions?
  • Spatial accuracy - errors in spatial database (RMS error is used to estimate the positional error - error in x and y coordinates). E.g., your street files - are the streets where they really are on the ground?
  • Attribute accuracy - errors in attribute database.
    • Addresses - how they are assigned to corresponding streets, spelling errors in street names.
    • Elevation measurements
    • Other mistakes in attribute tables
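
The RMS error mentioned under spatial accuracy can be computed from a set of check points whose true positions are known. The Python sketch below uses made-up coordinates in metres, and `rms_error` is a hypothetical helper name.

```python
import math

def rms_error(measured, reference):
    """Root-mean-square positional error between paired (x, y) points."""
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points in metres.
reference = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
measured = [(3.0, 4.0), (100.0, 0.0), (97.0, 96.0), (0.0, 100.0)]

print(rms_error(measured, reference))
```

Here two of the four points are each displaced by 5 m and two are exact, giving an RMS error of about 3.5 m. A single summary number like this hides where in the map the error is concentrated.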

Types of error

  • random - cannot eliminate but can estimate (overall accuracy), e.g., 10% of addresses have random errors due to manual entry.
    • The degree of error remains constant in future transformations of your data.
  • systematic - can eliminate if able to detect. E.g., a street name is misspelled consistently due to copying. Fix this mistake, and the level of address matching will increase.
    • If not fixed, systematic errors will propagate during data transformations. E.g., your distance data is off because you set the cell resolution incorrectly (it was too coarse). If you use this image to estimate housing values based on a regression equation, multiplication will magnify the degree of error in your results.
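
The cell-resolution example above can be worked through numerically. In the Python sketch below, the cell sizes are hypothetical (a true size of 25 m recorded as 30 m), and the two helper functions are illustrations only.

```python
def distance_m(cells, cell_size_m):
    """Distance covered by a run of cells along a row or column."""
    return cells * cell_size_m

def area_m2(cells, cell_size_m):
    """Area covered by a count of square cells."""
    return cells * cell_size_m ** 2

# Hypothetical case: true cell size is 25 m, but 30 m was entered by mistake.
true_size, wrong_size = 25.0, 30.0

distance_ratio = distance_m(40, wrong_size) / distance_m(40, true_size)
area_ratio = area_m2(40, wrong_size) / area_m2(40, true_size)

print(distance_ratio)  # overstated by the same factor as the cell size
print(area_ratio)      # overstated by that factor squared
```

A 20% error in the recorded cell size becomes a 20% error in every distance and a 44% error in every area, which shows how a single systematic mistake grows as it passes through later calculations.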


OTHER CONSIDERATIONS

Cost

Field surveys are expensive and time-consuming, but you have total control over your data.

Possibilities for analysis and consequent use

Often the high costs (time and money) of database development do not pay off.

  • Databases are not used analytically (only to display a few maps) because GIS technicians lack analytical skills.
  • Conceptual mistakes at the database development stage prevent efficient use of the data.
  • Public, community, or institutional benefits are small because there is a communication barrier between GIS technicians on the one hand and the public and policy makers on the other. The former have technical skills and control over the GIS but no training in public and social issues while the latter have the vision but no ability to communicate their intent in technical terms.
