LECTURE DEMO. GIS PROJECTS AND DATABASE DEVELOPMENT
Major issues:
- Differentiate 2 basic
approaches to using GIS
- Describe 3 ways to get data
into your database.
- Explain the importance of
file and physical formats to database development.
- Describe and give an example
of surrogate data and explain how it differs from derived data.
- List and explain the major
issues to consider when building a database.
GIS is a process. Data are integrated into the system that
we call a GIS and analyzed to help answer the questions that we are
asking.
The hardware and software are only two aspects of the system as an integrated
whole.
The other aspects of the system include the organization within which it
operates, the people that operate it and the data that feed it.
DATABASE DEVELOPMENT
- Within this framework, database
development is a key part of the process.
- According to Clarke (Clarke,
K.C. Getting Started with Geographic Information Systems. New
Jersey: Prentice Hall. 1997. p. 115), database
development “takes up anywhere between 60 and 90% of both the time and
money spent on a typical GIS project.”
- Therefore, understanding the
complexities and difficulties of finding existing data, creating your own
data, and then integrating these data into your GIS is an integral step in
project management. These steps constitute stages in database development.
- The way in which your
geographic database is set up will determine how well your data will be
retrieved, updated, integrated with other datasets, and analyzed.
- If the database is set up poorly, you might spend a lot of time
and effort developing a database that turns out to be unusable.
Geographic database
- No matter how large and
multilayered your database (spatial and attribute) is, it remains a very
simplified model of the world.
- This means that all real-world
objects must be reduced to points, lines, polygons, and cells and
represented as such in your database.
- These geometric primitives are your data units, and
your attribute data must describe spatial objects as represented by these
primitives.
- Essentially, only data
organized in this way will be suitable for conventional GIS display,
querying, and analysis.
- We will consider the limitations
of this approach later in this course.
Database development and objectives for GIS
Two approaches to DB development:
- Large data collection
projects
- For multiple users
and multiple uses
- Often linked to
tabular data (vector format with tables)
- Examples: US Census
Bureau maps and data, CARSI's NYC map
- Question or problem
driven
- Informed by tools for
analysis
- Data collection
limited to what is needed to solve a particular problem
- Participants limited
to individual or research team
- Examples: research
projects, projects for organizations, communities, etc.
- Your class project?
Two major steps in database development: Data collection and
Data integration.
DATA COLLECTION
Types of data by
availability: existing data, non-existing data, surrogate data, derived data.
Finding and importing existing digital data
GIS databases have a specific structure (spatial and attribute
databases in raster or vector format). Your GIS package will read only files
structured this way. Therefore, make sure that your data is in such a format or
can be converted to it using the export/import utilities of your GIS software.
Where to find it
- Government agencies (e.g. USGS
1-800-USA-MAPS, NOAA, Bureau of Census, NYC Planning Department)
- This data is public
(you pay for its creation through your taxes). Therefore, it should be
available to you free of charge or for a very low price (e.g., price of
the media, etc.).
- Companies, private GIS
database providers (e.g. InfoTerra, SPOT, GDT)
- Private data
providers distribute data for profit. They might use the public data to
create their own product. Usually, these datasets are sold commercially
(designed for other private companies) and are expensive.
- World Wide Web - a few
places to start. Both public and private datasets can be located and
downloaded from the web. See GIS resources page
for some suggestions.
- Discussion Lists of GIS
users (GIS-L, IDRISI-L, ESRI-L). Usually, these are great communities -
ask a question and you will get many replies.
- Use data sets that
already are in use in your institution.
- Check out
S:\scratch\digital data
- Check out data sets
(shape files) that come with ArcView (d:\ArcView\Esridata)
- Ask your advisors,
peers, and share data yourself!
Physical Media
- Data comes on a variety of
physical media (CDs, Floppy disks, Zip disks, tapes, etc.).
- You have to have a READER
that can read data files from the physical media you get. Your computer
needs a physical drive that accepts the specific media type as well as
the software (driver) to communicate with that drive (e.g., you may have to
install drivers for reading digital tapes). Before you can
do anything else, you need to get the files onto your hard disk.
- Electronic delivery (FTP,
web download) is usually no problem.
- CDs and floppy disks are also no
problem, unless your machine can’t read them.
General Data formats
You want your data to be in
- GIS raster or vector
formats, plus an attribute table in a database (*.DBF, *.MDB) or spreadsheet
format (*.XLS).
- Or files that can be
translated into these formats using export/import utilities of GIS.
Digital File formats
- Software-specific GIS
formats (Idrisi raster or vector formats, ArcView Shape or grid files,
MapInfo files, etc.)
- Agency Formats, files
produced by agencies supplying data
- Usually use a
commonly-used data interchange or GIS format, but will sometimes have
their own "standard"
- USGS data: DLG
(digital line graph) format for vector data, DEM format for raster
digital elevation models
- Census Bureau Data:
TIGER (topologically integrated geographic encoding and referencing).
Landmarks; census blocks, block groups, tracts; streets with addresses
(lion files).
- Many private data
providers use these files to create their products; often accuracy
increases and data is in one of the common GIS formats.
- Data Interchange Formats,
commonly used data formats accepted by many GIS directly or via import
utilities.
- Desktop publishing
formats.
- Raster: TIF
(Tagged Image File Format), GIF (Graphics Interchange Format), BMP
(Windows bitmap), JPG.
- Vector:
Postscript, Adobe Illustrator, etc.
- Some
"export" formats of common GIS and other software (e.g.
ArcInfo *.E00 format; AutoCAD *.DXF format).
- SDTS format (Spatial
Data Transfer Standard). Federally approved in 1992. It is taking hold, and there
are utilities to read and export to this format. See details at http://www.fgdc.gov/index.html
- Translator Software:
utilities that translate from one format to another (Image Alchemy, xv).
- Virtually all GIS
also have import/export utilities for reading data in other formats.
- Some GIS (e.g.
IDRISI) provide general conversion tools to disassemble and reassemble
files.
- Some GIS also
convert between raster and vector formats.
- Compression
- Several types of
compression are used to make the storage needed as small as possible. To
read the data, you might need to decompress it. (ZIP, other formats)
- By using converters and
import/export utilities, you should be able to read a file with your GIS
software and turn it into a GIS format that can be used analytically.
Problem: Loss of information or
alteration of data (e.g., rounding of coordinates or attribute values when
converting from real to integer data types) might occur at every step of
conversion.
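The rounding problem mentioned above can be illustrated with a minimal sketch. The coordinates and the function names are hypothetical, and the "conversion" is simply a rounding step standing in for an export to a format that stores integers; real import/export utilities may behave differently.

```python
# Hypothetical vertex coordinates in metres (illustrative values only).
vertices = [(583214.37, 4507123.82), (583219.91, 4507130.46)]

def to_integer_coords(points):
    """Simulate exporting to a format that stores coordinates as integers."""
    return [(round(x), round(y)) for x, y in points]

def max_shift(original, converted):
    """Largest displacement (in input units) introduced by the conversion."""
    return max(
        ((x0 - x1) ** 2 + (y0 - y1) ** 2) ** 0.5
        for (x0, y0), (x1, y1) in zip(original, converted)
    )

converted = to_integer_coords(vertices)
shift = max_shift(vertices, converted)
print(converted)
print(f"largest shift: {shift:.2f} m")
```

Each such step can move a vertex by up to about 0.7 units, and the loss cannot be undone by converting back, which is why it is worth checking data types before every conversion.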
Creating your own data
Create your own data if the data that you need for your
project does not exist, cannot be found, or is too expensive to purchase.
Find it in hard-copy format and digitize
- Manual Entry - of
attributes found in a report, for example, into your attribute database
attached to a vector layer.
- Digitizing - from a map or
air photo using a digitizing tablet or on-screen.
- Scanning.
- Type point coordinates or
addresses and then import into a GIS and plot.
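As a rough illustration of the last option, hand-typed point coordinates are often kept in a delimited text table before being imported. The sketch below assumes a hypothetical table with columns id, x, y, and name, with coordinates in decimal degrees; the column names and site names are invented for the example, and your GIS package will have its own import routine.

```python
import csv
import io

# Hypothetical hand-typed table; in practice this would be a text file on disk.
typed_table = """id,x,y,name
1,-73.9857,40.7484,Site A
2,-73.9681,40.7851,Site B
"""

def read_points(text):
    """Parse rows into dictionaries with numeric coordinates, rejecting bad rows."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        x, y = float(row["x"]), float(row["y"])
        # Basic sanity check for longitude/latitude in decimal degrees.
        if not (-180 <= x <= 180 and -90 <= y <= 90):
            raise ValueError(f"coordinates out of range in row {row['id']}")
        points.append({"id": int(row["id"]), "x": x, "y": y, "name": row["name"]})
    return points

points = read_points(typed_table)
print(points)
```

Checking the typed values before import catches common manual-entry mistakes, such as swapped coordinates or a missing minus sign, before they end up in the database.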
Collect it yourself in the field then enter it
- Surveying
- GPS (need to note in a
notebook, then type in, or store and download/import)
- Interviewing, etc.
Use surrogate data instead
- E.g., you need
a power line layer, but have neither the data nor the time/funds to do
a field collection. Notice that power lines mostly run along paved
roads. Use paved roads as a surrogate for power lines - but be aware that
this might introduce error!
Derive it from existing data
- With knowledge of
relationships and context, you can derive new data from old data.
- For example, if you need a DEM
you can derive it from contours or points; if you need an index of
biomass, then derive it from satellite imagery; temperature data from
elevation, etc.
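The temperature-from-elevation case can be sketched as follows. This is only an illustration: it assumes a constant lapse rate of 6.5 degrees C per 1000 m (the standard-atmosphere average) and a single reference station, and the grid values, station values, and function name are hypothetical.

```python
# Assumption: constant lapse rate of 6.5 degrees C per 1000 m of elevation gain.
LAPSE_RATE_C_PER_M = 0.0065

def derive_temperature(elevation_grid, station_temp_c, station_elev_m):
    """Estimate a temperature grid from a DEM and one reference station."""
    return [
        [station_temp_c - LAPSE_RATE_C_PER_M * (z - station_elev_m) for z in row]
        for row in elevation_grid
    ]

# Hypothetical 2 x 2 DEM in metres.
dem = [[100.0, 300.0], [500.0, 1100.0]]
temps = derive_temperature(dem, station_temp_c=20.0, station_elev_m=100.0)
for row in temps:
    print([round(t, 1) for t in row])
```

The derived layer is only as good as the assumed relationship, so the lapse rate and the station used belong in the documentation of the resulting data set.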
ALWAYS DOCUMENT EVERY DATA SET.
Write down the source, georeferencing information, data format, description of
the content of your data, error description, etc.
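One simple way to make this documentation habit concrete is to keep a small structured record alongside each data set. The sketch below is an assumption about how such a record could look, not a metadata standard; the field names follow the items listed above, and the example values are invented.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DatasetRecord:
    """Minimal documentation for one data set (fields follow the list above)."""
    name: str
    source: str
    georeferencing: str
    data_format: str
    content: str
    known_errors: str

def missing_fields(record):
    """Names of documentation fields that were left empty."""
    return [key for key, value in asdict(record).items() if not value.strip()]

# Hypothetical example record with the error description not yet filled in.
record = DatasetRecord(
    name="streets",
    source="hypothetical city planning office, received 1999",
    georeferencing="State Plane, NAD83, feet",
    data_format="ArcView shapefile",
    content="street centerlines with address ranges",
    known_errors="",
)
print(missing_fields(record))
print(json.dumps(asdict(record), indent=2))
```

Saving the JSON text next to the data files keeps the documentation with the data when it is shared.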
DATABASE INTEGRATION
Once you have acquired your data sets, you need to make sure that
they will work together. This means that they have to be in the same reference
system, have corresponding resolutions, and have an acceptable degree of
error.
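These three checks can be written down as a short routine. The sketch below assumes each layer is described by a small dictionary; the keys, layer names, reference system labels, and the error threshold are all hypothetical, and a real project would read these values from the documentation of each data set.

```python
def integration_problems(layers, max_rms):
    """Return readable reasons why the layers may not work together."""
    problems = []
    reference = layers[0]  # compare every other layer against the first one
    for layer in layers[1:]:
        if layer["reference_system"] != reference["reference_system"]:
            problems.append(
                f"{layer['name']}: reference system differs from {reference['name']}"
            )
        if layer["resolution"] != reference["resolution"]:
            problems.append(
                f"{layer['name']}: resolution differs from {reference['name']}"
            )
    for layer in layers:
        if layer["rms_error"] > max_rms:
            problems.append(
                f"{layer['name']}: RMS error {layer['rms_error']} exceeds {max_rms}"
            )
    return problems

# Hypothetical layer descriptions (resolution and RMS error in metres).
layers = [
    {"name": "landuse", "reference_system": "UTM-18N", "resolution": 30, "rms_error": 12.0},
    {"name": "elevation", "reference_system": "UTM-18N", "resolution": 30, "rms_error": 9.5},
    {"name": "streets", "reference_system": "LATLONG", "resolution": 10, "rms_error": 20.0},
]
problems = integration_problems(layers, max_rms=15.0)
for problem in problems:
    print(problem)
```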
Georeferencing
Is your data georeferenced?
- If yes, what is the
coordinate system, reference units, projection type, datum type, and
ellipsoid type? Is it in a projection that you intend to use for all of
your datasets?
- If yes, convert
other data sets to this projection.
- If not, is it in a
projection that you can change with software you have?
- GIS have utilities for
converting spatial data from one projection/reference system to another.
- In Idrisi32 - module
PROJECT (under Reformat Menu) uses over 400 georeferencing files to
convert between various georeferencing systems.
- In ArcView - Use
Projection Utility (Start/ArcView/Projection Utility - you access it
without opening ArcView).
- If your data is not
georeferenced (e.g. you scanned it, or got it in a desktop graphic
format), can it be georeferenced?
- In this case, use
Idrisi32 module RESAMPLE (under Reformat Menu).
- Geodetic information on the
web
Resolution
- spatial (cell resolution,
vector data units - census blocks, zip codes, counties, states, countries,
etc.)
- Does it match the
resolution of the question being asked?
- Does the original
resolution of your data layers match other layers? Remember, digital
spatial data is scaleless, but its resolution (degree of detail and
precision) depends on the original resolution of the analog map or image.
- temporal - both
spatial and attribute data change over time.
- Is it current enough?
- Are the time slices
frequent enough and spaced appropriately?
- Is your spatial
database current enough relative to how your attribute data is
structured? (lot boundaries, census blocks and tracts change over time)
- attribute
- can you get the
variables you need? e.g. demographic characteristics by census tract for
1999?
Accuracy
Does the data meet the accuracy standards of the study?
Sources of error in the database
- Conceptual - how
your database was designed and organized (spatial units, attributes,
etc.). Is it adequate for answering your questions?
- Spatial accuracy -
errors in the spatial database (RMS error is used to estimate the positional
error, i.e., the error in x and y coordinates). E.g., your street files - are the
streets located where they actually are on the ground?
- Attribute accuracy -
errors in the attribute database.
- Addresses - how they
are assigned to corresponding streets, spelling errors in street names.
- Elevation
measurements
- Other mistakes in
attribute tables
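The RMS error mentioned under spatial accuracy can be computed from a set of check points whose true positions are known. The sketch below uses one common definition, the square root of the mean squared x-y displacement; the point coordinates are hypothetical, and individual GIS packages may report RMS slightly differently (for example, separately for x and y).

```python
import math

def rms_error(measured, reference):
    """Root-mean-square positional error between paired (x, y) points."""
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points: digitized positions versus surveyed positions.
digitized = [(100.0, 200.0), (153.0, 204.0), (200.0, 250.0)]
surveyed = [(100.0, 200.0), (150.0, 200.0), (200.0, 250.0)]
rms = rms_error(digitized, surveyed)
print(f"RMS error: {rms:.2f} map units")
```

Here only the second point is displaced (by 5 units), so the RMS over three points is the square root of 25/3, or about 2.89 units.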
Types of error
- random - cannot
eliminate but can estimate (overall accuracy), e.g., 10% of addresses have
random errors due to manual entry.
- The degree of error
remains constant in future transformations of your data.
- systematic - can
eliminate if able to detect. E.g., a street name is misspelled consistently
due to copying. Fix this mistake, and the level of address matching will
increase.
- If not fixed,
systematic errors will propagate during data transformations. E.g., your
distance data is off because you set the cell resolution incorrectly (it
was too coarse). If you use this image to estimate housing values based
on a regression equation, multiplication will magnify the degree of error
in your results.
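The housing-value example can be worked through numerically. The sketch below assumes a hypothetical linear regression (the intercept and slope are invented for illustration) and a distance layer that is systematically 10% too large.

```python
def predicted_value(distance_m, intercept=200000.0, slope=-15.0):
    """Hypothetical regression: housing value as a linear function of distance."""
    return intercept + slope * distance_m

true_distance = 1000.0                   # metres
biased_distance = true_distance * 1.10   # systematic 10% overestimate

error_in = biased_distance - true_distance
error_out = predicted_value(biased_distance) - predicted_value(true_distance)

print(f"error in distance: {error_in:.0f} m")
print(f"error in estimated value: {error_out:.0f}")
```

An input error of 100 m becomes an error of 1500 in the estimated value, because the regression multiplies the distance error by the slope. Correcting the systematic error in the distance layer removes the error in every result derived from it.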
OTHER CONSIDERATIONS
Cost
Field surveys are expensive and time-consuming, but they give you
total control over your data.
Possibilities for analysis and consequent use
Often the high costs (time and money) of database
development are not justified by the results.
- Databases are not used
analytically (only to display a few maps) because GIS technicians lack
analytical skills.
- Conceptual mistakes at the
database development stage prevent efficient use of the data.
- Public, community, or
institutional benefits are small because there is a communication barrier
between GIS technicians on the one hand and the public and policy makers
on the other. The former have technical skills and control over the GIS
but no training in public and social issues while the latter have the
vision but no ability to communicate their intent in technical terms.