N.B. This lab is a modified version of an original created by
Sarah Battersby and Nicholas Matzke for Geography 176B
at the UC Santa Barbara Department of Geography. © 2000, Regents of the
University of California. Used by permission of the authors.
2.1 Objectives
- To gain a clear understanding of what a
data model is and why data models are important
- To learn the data models ESRI supports in
ArcGIS, and the similarities and differences between them
- To reinforce basic ArcGIS
skills
Note - I have attempted to edit the text to be consistent with the data
model terminology and hierarchy introduced in class. Please
excuse any inconsistencies and realize that others use different nomenclature
for the same concepts, or the same nomenclature differently. M.H.
2.2 Introduction and background
There are two widely used logical data models in GIS: vector and raster.
ESRI's vector model recognizes three basic
types of vector data: points, lines (polylines, arcs and others), and polygons.
Vector data are used to model features having discrete location(s) in space.
Raster datasets are data sources that use a grid structure to store
geographic information. Satellite images, for example, are composed of a
grid of square pixels arranged in rows and columns. The raster model is
best suited to describing continuous phenomena, which vary across space rather than existing as discrete objects.
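As a rough illustration in plain Python (not any ESRI format), the two logical models can be contrasted like this: vector features are ordered lists of coordinate pairs, while a raster is a grid in which every cell holds one value. All names and numbers below are made up for the example.

```python
# Vector model: discrete features described by (x, y) coordinates.
spring = (3.0, 4.0)                                       # point
stream = [(0.0, 0.0), (2.0, 1.5), (5.0, 2.0)]             # line: ordered vertices
lake = [(1.0, 3.0), (2.0, 5.0), (4.0, 4.5), (1.0, 3.0)]   # polygon: closed ring

# Raster model: a grid of cells (rows x columns), one value per cell,
# e.g. a hypothetical elevation in meters.
elevation = [
    [120, 125, 131],
    [118, 122, 128],
    [115, 119, 124],
]

def cell_value(grid, row, col):
    """Return the value stored in the raster cell at (row, col)."""
    return grid[row][col]

def is_closed(ring):
    """A polygon ring must end at the vertex where it started."""
    return ring[0] == ring[-1]

print(cell_value(elevation, 1, 2), is_closed(lake), len(stream))
```

The vector features only exist where something is; the raster has a value everywhere in its extent, which is why it suits continuous phenomena.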
Attribute data: In a GIS, both raster and vector data are linked to
attribute information: descriptive information about features and how they
are coded. For example, a GIS database of well locations may be
linked to tables of downhole pressures and water chemistries for the
wells. How location information and tabular attribute data are
linked in a GIS differs among software vendors and within ESRI products of
different ages; the physical data models and software are continually
evolving. Much of what is discussed below describes these
differences and the data models that were developed under different schemes
of linkage, one aspect of a physical data model.
Question 1: a) Which type of vector data would be most
appropriate for representing a lake? a river? a spring?
b) Give the name of a raster data file that
you used in Lab 2.
Geographic Data Modeling: An Introduction
Data Model - An abstraction of the real world
that incorporates only those properties thought to be relevant to the
application at hand; defines specific groups of entities, their
attributes and the relationships between these entities. A data model is
independent of a computer system.
Data models are a crucial concept for GIS
users to understand. Data models describe how geographic data will
be represented and stored. The choice of a data model will yield
benefits in terms of simplifying aspects of the real world, but can also
incur costs by oversimplification or misrepresentation.
A traditional paper map is an example of an analog data model: the cartographer has abstracted and generalized the real world with a set of
conventions to represent important aspects of the landscape. In a
computer, all information must be reduced to numbers
(1010000110...). Abstractions of the real world must be
formalized in a data model that defines how the
computer will store the geographic information (its "geometry" and
its attributes).
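To make "reduced to numbers" concrete, the sketch below packs a coordinate pair into raw bytes with Python's standard struct module and reads it back. Storing the pair as two little-endian 64-bit floats is an assumption chosen for illustration; it is not meant to describe the exact layout of any ESRI file, and the coordinates are hypothetical.

```python
import struct

def encode_point(x, y):
    """Pack an (x, y) pair as two little-endian 8-byte IEEE 754 doubles."""
    return struct.pack("<dd", x, y)

def decode_point(data):
    """Unpack 16 bytes back into an (x, y) tuple."""
    return struct.unpack("<dd", data)

# Hypothetical longitude/latitude pair, in decimal degrees.
raw = encode_point(-119.70, 34.42)

# The same bytes written out as the 1s and 0s the computer actually stores.
bits = "".join(f"{byte:08b}" for byte in raw)

print(len(raw))           # 16 bytes
print(len(bits))          # 128 bits
print(decode_point(raw))  # (-119.7, 34.42)
```

The bytes mean nothing on their own; only the agreed convention ("two doubles, x then y") lets software turn them back into a location, which is exactly the role a formal data model plays.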
Bernhardsen (1999) diagrams the data model
formalization process along these lines:
Figure 1: The modeling process (after Bernhardsen 1999, p. 39. Map graphics from http://www.gis.com/)
In order for geographic data to be
represented digitally, a geographic data model has to be adopted or
created. Much of the confusion about data models arises from their
diversity. Geographic data models have evolved under the
influences of technology (e.g., increasing digital storage space and processing
power, networking, and software evolution) and history (e.g., ESRI
introduced the "coverage" data model in 1980, the shapefile model in the 1990s,
and the geodatabase model in the early 2000s).
Most GIS software packages support a number of data models. The capabilities of the data
models may change with new versions of the software, and compatibility
issues may arise between different GIS software and even between different
versions of the same software. Certain functions may be accessible
with data in the form of one data model but not another.
Data Structures vs. Data Models
A data model is a conceptual idea: how do we
represent the real world in a GIS? How this conceptual representation is
actually stored in the computer is the data structure (or the
physical model; cf. the lecture notes). A vector data model could be implemented in a computer in a number of
ways. A line in a vector model consists of a start node, the vertices in between, and an
end node. This is stored in the computer as a table of the locations
of each node and vertex. The software reads this information and draws a line
on the screen. The format in which the coordinates are stored
depends on the data structure. A GIS that consists of points, lines and polygons relies on the logical vector data
model, whereas the way that the data are physically stored and organized,
whether as an ESRI coverage or a shapefile, is the data
structure, which constitutes a physical model for the data.
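The distinction can be sketched in a few lines of Python: one logical line (start node, vertices, end node) is written out in two hypothetical physical structures, a CSV-style vertex table and a single JSON record, and each can be read back into the same logical line. Neither format is how ESRI actually stores coverages or shapefiles; they only stand in for "two different data structures for one data model".

```python
import csv
import io
import json

# Logical model: one line = ordered (x, y) nodes and vertices.
line = [(0.0, 0.0), (2.0, 1.5), (5.0, 2.0)]  # start node, vertex, end node

def to_vertex_table(line_id, coords):
    """Structure A: one row per node/vertex, like a table of locations."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["line_id", "seq", "x", "y"])
    for seq, (x, y) in enumerate(coords):
        writer.writerow([line_id, seq, x, y])
    return buf.getvalue()

def from_vertex_table(text):
    """Rebuild the ordered coordinate list from the vertex table."""
    rows = sorted(csv.DictReader(io.StringIO(text)), key=lambda r: int(r["seq"]))
    return [(float(r["x"]), float(r["y"])) for r in rows]

def to_json(line_id, coords):
    """Structure B: the whole line stored as one nested record."""
    return json.dumps({"id": line_id, "coords": coords})

def from_json(text):
    """Rebuild the ordered coordinate list from the JSON record."""
    return [tuple(pt) for pt in json.loads(text)["coords"]]

table_text = to_vertex_table(1, line)
json_text = to_json(1, line)
print(table_text)
print(json_text)
```

Both strings look completely different on disk, yet `from_vertex_table(table_text)` and `from_json(json_text)` return the same line: the data model is unchanged while the data structure differs.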
In Figure 1 above, the lower left box titled "DATABASE (relational tables)"
represents the data structure. In it you can see numbered rows and
named columns. This is the 'structure' of the data. Some columns
have only numbers, some have only text and some have both.
The confusion surrounding what a data
structure is can be reduced if one thinks of the geographic data models as
fitting within a general hierarchy, as discussed in class. Below is a figure showing the hierarchy of ArcGIS's data
models. (Note that this hierarchy is slightly at odds with the
one discussed in class but is similar in many regards.)

Figure 2: Schematic hierarchy of ESRI's ArcGIS data models. The three top levels are logical data models
in the terminology of the hierarchy discussed in class. The georelational and geodatabase models are physical models, implementations
of particular schemes of data storage and organization. The
figure is somewhat confusing in how it shows geodatabases as a branch of
the vector model, because they can, in fact, contain coverages, shapefiles, rasters and TINs.
Question 2: a) A linear feature is represented by a
line in a vector data model, yet the feature’s position is ultimately
defined by points. Explain. b) Name a
continuous, numeric variable (other than those already mentioned) that
could be modeled by raster data.
c) What is the difference between a data structure and a data
model?
Data Models, Datasets, and Feature Classes in ArcGIS
In ArcCatalog the type of physical data model for
every spatial dataset is identified by a small picture or icon. Only file formats recognized by ArcCatalog as geographic in nature are displayed.
Life will be much easier if you
learn ArcCatalog's icons. There are a lot of them and they can be
initially confusing, so here is the handy
table from Lab 1 that you can refer to. More complete listings
can be found in "Modeling Our World" (hereafter MOW) in the
Digital Books class network folder. Below is a display
from ArcCatalog showing how data model Types (right-most column of the
graphic) are identified by icons.

Figure 3. ArcCatalog icons representing different data model types.
The folder and file display of shapefiles, coverages, geodatabase feature classes, rasters, and TINs
in ArcCatalog is arranged in a hierarchy. Data related in specific, logical
ways to one another (shared spatial reference, storage location, etc.) are organized by the
indented, branching hierarchy familiar from Windows Explorer. This is a powerful conceptual
way of displaying the physical data models implemented within ArcGIS and
their relationships to one another. However, this is not the way ESRI or
other geographic data appear in Windows Explorer, which instead shows a
literal tree of the folders and files, hiding some, displaying others,
without any indication of the logical relationships among the data.
Get used to this duality. It can be a source of much confusion to new users.
The ArcCatalog tree represents a higher level of
organization than what is visible in Windows Explorer. A single
icon in the tree may represent many files and/or more than one folder.
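A shapefile is a good example of one icon standing for several files: at minimum it is a set of files sharing a base name with the extensions .shp (geometry), .shx (index) and .dbf (attributes), often accompanied by others such as .prj. The sketch below groups a hypothetical folder listing the way a catalog view might. Treating a group as a shapefile only when all three required parts are present is a simplifying assumption, not a description of ArcCatalog's internal logic, and base names containing extra dots are not handled.

```python
from collections import defaultdict

REQUIRED = {"shp", "shx", "dbf"}
OPTIONAL = {"prj", "sbn", "sbx", "cpg", "shp.xml"}

def group_shapefiles(filenames):
    """Collapse a flat file listing into {base name: [extensions]} per shapefile."""
    groups = defaultdict(set)
    for name in filenames:
        base, _, ext = name.partition(".")  # split at the first dot only
        ext = ext.lower()
        if ext in REQUIRED | OPTIONAL:
            groups[base].add(ext)
    # Show one "icon" only for groups that have every required component.
    return {base: sorted(exts) for base, exts in groups.items() if REQUIRED <= exts}

# Hypothetical listing, as Windows Explorer would show it.
files = [
    "streams.shp", "streams.shx", "streams.dbf", "streams.prj",
    "lakes.shp", "lakes.dbf",   # .shx missing, so not a complete shapefile
    "notes.txt",
]
print(group_shapefiles(files))
```

Seven entries in the file listing reduce to a single shapefile, which mirrors why the ArcCatalog tree and the Windows Explorer view of the same folder look so different.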
Figure 4, below, shows an ArcCatalog
hierarchy of folders and icons, annotated with names. Feature classes are the lowest level
of the hierarchy. Examine and read this diagram carefully.

Figure 4: Icons and hierarchy in ArcCatalog
- For shapefiles, the shapefile itself is
the feature class. Each class of geographic feature (donut shops, streets,
etc.) is contained in its own shapefile and corresponds to a single map
feature. This style of organization is different from that of coverages,
discussed below. For example, if we wanted to map surface water
features using shapefiles, we would have a line shapefile for streams and another
for shorelines, a polygon shapefile for lakes, and point shapefiles
for springs and water wells, for a total of five separate shapefiles. Geometric data (i.e., coordinates)
are stored in hidden binary tables that cannot be viewed directly,
but are represented by a field called "Shape" in each feature
class's attribute table. The attribute information
(stored in dBASE tables) can be displayed with the Preview tab of
ArcCatalog. This linkage of geometric files to separate attribute tables is
intrinsic to shapefiles and coverages and is called the georelational data
model by ESRI. Unlike coverages, shapefiles do not
explicitly store topology, but build it on the fly each time a
shapefile is loaded.
- For coverages, a feature class
does not correspond to a single map feature. Coverage feature
classes are standard categories like arc, label, polygon, tic, etc.
that together make up a common group of map elements. A
common map element like
"hydrography", for example, might be stored in a coverage
that contains a point feature class for springs and wells,
an arc feature class for the
streams and shorelines, a polygon feature class for lakes, and
an annotation feature class
for the stream/lake names (see Figure 4). Additional feature classes within
the same coverage will contain tics
(see below), links, etc.; see MOW for a complete list. Within ArcCatalog, coverage feature classes are found in a folder. This folder is the
coverage. All feature classes within a coverage share
a common spatial reference, as they must if they together represent a
map element. The primary feature classes of a coverage store feature coordinates in hidden, separate, binary "Arc"
tables that cannot be viewed directly in ArcCatalog, but are
represented by a field called "Shape" in each feature
class's attribute table. Certain features of the topology are
visible in feature class attribute tables, with field names such as FNODE#, TNODE#, LPOLY# and RPOLY#
(examined further below). These terms should be
familiar from lecture. Feature
classes are linked to attribute tables (INFO tables). Like shapefiles,
coverages employ a georelational data model.
- Geodatabases, the latest ESRI data structure, incorporate the
best aspects of shapefiles and coverages and greatly extend them. Two
types of geodatabases are recognized: personal geodatabases and multi-user
geodatabases. Personal geodatabases permit access by one user
at a time and store data in a Microsoft Access database. A multi-user geodatabase permits access and editing by multiple users at the same
time, as might be required in a business environment
("Enterprise GIS"), and is compatible with
business database software such as Oracle, Informix, DB2 and others. As with shapefiles,
each feature class in a geodatabase corresponds to a map feature, such as
roads, counties, etc. Feature classes can be grouped into feature datasets:
groups of feature classes that might contain data about a
region or topic (in Figure 4, the "USA container" feature
dataset contains US capitals, counties, etc.). Many feature
datasets can be stored within a geodatabase. Each feature dataset
can have its own spatial reference; in that sense a feature dataset
is somewhat like a coverage. Existing shapefiles and coverages
can be imported into a geodatabase using tools available in
ArcToolbox and ArcCatalog. New geodatabase feature classes can
be created in ArcCatalog and ArcMap. Unlike shapefiles and
coverages, geodatabases employ a geodatabase data model that stores each
feature as a row in a relational database table. Because geodatabases can explicitly store relationships ("relationship
classes") among objects (information tables) and feature
classes (groups of things that have x, y coordinates) or between
different feature classes, feature behaviors can be codified (e.g., a
river ends upon entering a lake; contour lines should break where
labels are present). Geodatabases are explored further below.
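The three organizations described above can be written out as nested Python dictionaries for the surface-water example, with a small function that prints them as an indented tree, roughly the way the ArcCatalog tree nests items. The coverage, geodatabase, and feature dataset names (hydro, water.mdb, Hydrography) are hypothetical.

```python
# Hypothetical "Some-Data" workspace holding the same surface-water
# features in all three ESRI structures.
workspace = {
    "Some-Data": {
        # Shapefiles: one file = one feature class = one map feature (five in all).
        "streams.shp": "line",
        "shorelines.shp": "line",
        "lakes.shp": "polygon",
        "springs.shp": "point",
        "wells.shp": "point",
        # Coverage: one folder of standard feature classes sharing a spatial reference.
        "hydro": {
            "arc": "streams, shorelines",
            "point": "springs, wells",
            "polygon": "lakes",
            "annotation": "stream/lake names",
            "tic": "tics",
        },
        # Personal geodatabase: a feature dataset grouping one class per map feature.
        "water.mdb": {
            "Hydrography": {
                "Streams": "line",
                "Shorelines": "line",
                "Lakes": "polygon",
                "Springs": "point",
                "Wells": "point",
            },
        },
    },
}

def walk(node, depth=0, out=None):
    """Flatten nested dicts into names indented two spaces per level."""
    if out is None:
        out = []
    for name, child in node.items():
        out.append("  " * depth + name)
        if isinstance(child, dict):  # folders/datasets nest; leaves do not
            walk(child, depth + 1, out)
    return out

print("\n".join(walk(workspace)))
```

Note that the coverage groups features by geometry type (one arc class holds both streams and shorelines), while the shapefiles and the geodatabase keep one feature class per map feature.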
Look again at Figure 4. Notice that the geodatabase, the coverages, and the
shapefiles are all contained within the folder named "Some-Data". The
little blue symbol on the folder indicates that it contains
recognizable geographic data in the first level beneath
"Some-Data". In the context of coverages, this folder is
referred to as a workspace.
Additional Note: Notice that none of the file/folder names
in Figure 4 contain spaces. Word breaks within names are instead represented
by underscores and hyphens. ArcGIS tolerates spaces in names in some situations, but not in others.
ArcToolbox, in particular, needs uninterrupted paths to files and
folders; spaces are interpreted as separating words. It is likewise
sometimes intolerant of file and folder names that exceed 13 characters.
If you violate these rules you will get an error message of the sort "spaces
are not permitted in the path name" or something even more obscure. Save yourself
some grief and don't tempt
fate: don't use spaces in file/folder names, and keep them under 14
characters in length.
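If you want to check names before running a tool, a small helper like the one below applies the two rules above (no spaces, at most 13 characters per file or folder name). It is a hypothetical convenience script, not part of ArcGIS, and the 13-character default simply encodes the limit stated in this note.

```python
def check_name(name, max_len=13):
    """List the problems with one file or folder name (empty list = OK)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    return problems

def check_path(path, max_len=13):
    """Check every component of a path; return only the offending names."""
    # Accept both Windows (\) and forward-slash separators.
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    report = {}
    for part in parts:
        problems = check_name(part, max_len)
        if problems:
            report[part] = problems
    return report

# Hypothetical path: one folder has a space, one name is too long.
print(check_path(r"y:\Lab_3\my data\santa_barbara_roads"))
```

An empty dictionary means every component of the path passes both rules.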
Question 3: a) We’ve identified a feature class as the
lowest level in the organizational hierarchy of spatial data files.
But what exactly is a feature class? Use the glossary in the ArcGIS
Desktop Help menu to define feature class.
b) Using the definition you retrieved for
“feature class,” what type of vector data model is “National” in
Figure 4? Landusecov in Figure 4?
c) How is a coverage different from
a shapefile?
d) Explain two ways the geodatabase
data model differs from the coverage data model or shapefile data
model.
2.3 Data
- Open My Computer and go to your y:
drive and create a folder (right-click New -> Folder) and name it
Lab_3.
- Copy the entire Lab_3_data folder
to the folder you just created. The folder contains
the following files and folders:
/mystery -- Contains 8 data layers representing several features in different data models. You will be figuring out what these are in the lab.
/sb
roads -- Santa Barbara County roads coverage, clipped to the Goleta-Santa Barbara region
SB_CO_all_roads -- Shapefile of all roads in Santa Barbara County, clipped from a statewide roads shapefile
sbdem -- digital elevation model of Santa Barbara County
sbtin -- TIN derived from sbdem
sbcontour -- Contour coverage derived from sbdem
cacounties -- counties of California, from the GDT dataset
The Santa Barbara street data we are using
were provided by GDT.
2.4 Procedures
ArcGIS Help
ArcGIS Help works like the help section of any Windows
program. This is an EXTREMELY valuable resource for this class and in the future. Read it and learn how to use
it. Go to Menu Bar -> Help -> ArcGIS Help.
When you're looking for something in
ArcGIS Help, make sure to search both the Index and the Search
tabs. Trying the search with different terms (e.g., data models, or
coverage, or geodatabase) increases the odds of finding something
useful. ArcOnline is also an excellent resource (see below).

Question 4: Use ArcGIS Help to find "coverages"
to answer the following questions. a) List the feature classes that a coverage can contain. b) What is the purpose
of an INFO table? (Use Help on "Info tables") c) What are tic points?
Use ArcGIS Help to find "shapefiles" to answer the following question. d) How many feature classes can a shapefile
use?
2.4.2 Mystery Models
Examine the layers in the folder mystery using ArcCatalog and/or ArcMap.
Question 5: What are the data models for each of the layers?
What feature does each layer represent? (Be as specific as possible
for both questions.)
mystery1 --
mystery2 --
mystery3 --
mystery4 --
mystery5 --
mystery6 --
mystery7 --
mystery8 --
Once you have identified the layers and their
data models, convert mystery5 into the same data model as
mystery2. You will have to figure out how to do this yourself, but
here are some hints:
Converting Between Data Models
- You will have to use ArcToolbox to
accomplish this task. Recall that you can open it by clicking on the ArcToolbox button
in ArcCatalog.
- We are doing a conversion, so navigate to the toolbox menu
that would contain the appropriate tools.
- Find the appropriate submenu for converting data in
mystery5's data model.
- Find the tool that will let you convert to mystery2's data structure.
- You should be able to figure out which layer to use as input.
Recall that you can drag-and-drop from ArcCatalog instead of
typing or browsing. Use the defaults for everything else unless
you are in an experimental mood.
Give the output a name you will remember and run the conversion. Take
your resulting layer and display it in ArcMap, along with mystery5
and mystery2.
Question 6: a) How does mystery2 compare with
your converted layer? You should examine the data at various
scales before answering this question.
b) Considering the type of data
represented, which is a more appropriate data
model, the one before or after the conversion?
- Delete mystery5 and the converted layer from your map
document
- Go to the directory sb.
- Now, add sbcontour, sbdem, and
sbtin to your ArcMap document. Display just sbcontour and
sbtin, and overlay sbcontour on top of sbtin.
To make the display intelligible, you will have to change the properties
for the two layers.
Changing Layer Properties in ArcMap
To change the Properties of a layer in ArcMap, right-click on
sbtin in the TOC and go to Properties. Double-clicking
on sbtin will also work.
You get a large window with many
tabs, like this:
- Go to the Display tab.
- Change the transparency of
sbtin so that the DEM raster can be seen underneath it, and
click OK
- Make sure the TIN layer displays on top of the DEM layer.
If you're curious about making better use of layer properties, the
main methods are the creation of layers in ArcCatalog and ArcMap's
Style Manager, found in the Menu Bar under Tools -> Styles -> Style
Manager.
You will be repeating these steps to change a
layer's properties many, many times throughout the semester. You will
find the Properties functions very useful. ArcMap's Style Manager
is an easier way to manipulate layer properties that we may learn about
later in the semester, but feel free to experiment with it.
Question 7: a) Where are contour values stored, and what is the contour interval for
the data being displayed? (Consult the Arc feature class attribute
table for sbcontour) b) How would you change the
display to show contours by different line widths, e.g., heavier
lines for the 0 and 1200 contours? List the steps.
c) The DEM is composed of cells, each with a single elevation value.
What is the range of elevation values and what is the x, y dimension
(with units) of each cell?
2.4.3 Data Structures and ArcToolbox
Coverages are the vector data structures long used in the old Unix
workstation version of ARC/INFO. Therefore, many of the ArcToolbox tools
simply use a wizard to create a command line that runs an ARC process in
the background. As a result, many of the tools only support coverages,
although some of the newer tools are designed for geodatabases or
shapefiles. The older tools designed only to support coverages are
in a toolbox called "Coverage Tools". Many of the same tools,
generalized to support other feature class types, can be found in the
"Analysis Tools" Toolbox, a new feature in ArcGIS 9.0. To familiarize yourself with the Toolbox and the input
formats required, find each tool listed below and figure out what kind of
input file(s) it supports (e.g., coverage, geodatabase feature class,
grid, TIN, etc.).
Finding and Examining Tools
- Again, recall that you can open
ArcToolbox by clicking on the ArcToolbox button
in ArcCatalog.
- If you can't find a particular tool
in ArcToolbox, try the Index and Search tabs at the bottom of the
ArcToolbox window and search
by name and/or description.
- For more information on a tool, open it and click "Show Help",
or simply right-click on the tool name and select "Help"
from the menu.
Question 8: Find each of these tools and determine what
physical data model type(s) (e.g.,
shapefile, coverage, grid, or perhaps other file types) it takes as
input:
a) Clip, Select, Intersect, Buffer,
& most other tools in the Analysis Tools toolbox (all the same answer)
b) Darcy Flow tool
c) Export to Interchange File
d) Join Info Tables
e) Create a TIN from a raster
2.4.4 AATs & PATs
As discussed above, coverages have been the
standard vector data model for previous releases of Arc/INFO. With
the release of ArcInfo 8 (ArcGIS), all of the modules of Arc/INFO (Arc, ArcEdit,
Grid, Tables, ArcPlot, INFO etc.) have been integrated, and a
new geodatabase data model has been promoted. A large amount of
legacy data exists as Coverages so we need to know
something about their structure.
Recall from above that coverages employ the
georelational data model and that they store geometric (i.e.,
coordinate) and attribute information in separate tables. The attribute
tables reside in files that are stored in what is called an INFO
folder, whereas the geometric tables (including Arc tables) are
stored directly within the coverage folder itself (these relationships are
visible with Windows Explorer but not ArcCatalog). INFO files
contain tables of attributes, including topological information and
feature descriptions, for example parcel number and land use codes. These INFO attribute tables store features in
rows (rows are database "records") and attributes by columns
(database "fields"). Attribute and geometry tables
are linked ("related" in relational database lingo) through a common
attribute (field), which is the so-called Primary KEY (more on this in
lectures to come). The use of relational databases is the origin of the "relational"
part of the georelational database name. In a standard
relational database, the KEY can be any of several attributes. In a georelational
database, such as that of a coverage, the KEY is an ID field that
specifies a geographic location.
The polygon coverage illustrated in Figure 5 serves as a simple example of
the above concepts. The primary key is the polygon identifier (A, B,
C). The polygon attribute table has attributes that include parcel
number and land use.

Figure 5. (from ESRI)
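The key-based linkage described above can be sketched as a join between two Python tables: a geometry table keyed by polygon ID (A, B, C, as in Figure 5) and an attribute table carrying the same IDs plus parcel number and land use. The coordinates, parcel numbers, and land-use values are invented for the example; a real coverage keeps these tables in its own binary and INFO formats rather than as Python objects.

```python
# Geometry table: polygon ID -> closed ring of (x, y) vertices (hypothetical).
geometry = {
    "A": [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)],
    "B": [(2, 0), (2, 2), (4, 2), (4, 0), (2, 0)],
    "C": [(0, 2), (0, 4), (4, 4), (4, 2), (0, 2)],
}

# Attribute table: one row (record) per polygon; keys act as columns (fields).
attributes = [
    {"id": "A", "parcel": "101", "landuse": "residential"},
    {"id": "B", "parcel": "102", "landuse": "commercial"},
    {"id": "C", "parcel": "103", "landuse": "open space"},
]

def join_on_key(geom, rows, key="id"):
    """Relate each attribute record to its geometry through the shared key field."""
    return [{**row, "shape": geom[row[key]]} for row in rows if row[key] in geom]

joined = join_on_key(geometry, attributes)

# Query by attribute, then use the related geometry (e.g., to draw it).
commercial = [r["shape"] for r in joined if r["landuse"] == "commercial"]
print(commercial)
```

Neither table is useful for mapping on its own; the shared ID is what lets a selection made on attributes be drawn on the map, and vice versa.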
Let's explore an attribute table that is part of the roads coverage. Go to ArcCatalog and Preview the
data.
Previewing Tables
- Below the preview map, locate the
Preview box:
- Change the preview option from
Geography to Table.
- You are now looking at the arc
attribute table (AAT).
Answer the question below.
Question 9: a) How many records are there? b) What do FNODE# and TNODE# mean?
c) What other attribute information
can you recognize or guess at in the table (pick 3
columns)?
For a look at polygons and Polygon Attribute
Tables (PATs), open cacounty. Explore the tables for the
tic, arc, polygon, and region.cty coverage feature
classes.
Sorting a Column in Table Preview, and Searching
for a Text String
- To sort an attribute table (e.g.,
polygon), click on the column heading you wish to
sort.
- This should highlight the
column.
- Then, right-click and choose Sort
Ascending or Sort Descending.
Now, open the cacounty coverage, examine the coverage feature
classes and note the differences. What is the region.cty
feature class? Now answer the questions below.
Question 10: a) How many counties are there in
California? HINT:
The bottom part of the Table preview in ArcCatalog or the layer's
attribute table in ArcMap may help you. b) Why do the AAT and
PAT have different numbers of records? c) Explain the relationship between arc,
polygon, and region.cty in this coverage.
d) What are the label and
tic feature classes for?
Hints: To figure out the answers, you will need to examine
the tables. In addition, you might want to use the Identify
Tool in the Geography Preview
to query a few of the features. Also use ArcGIS Help, as described
above.
Map for this lab:
Make an 8.5x11" map of
mainland Santa Barbara County showing the county outline, roads and
contours for the entire county, and an expanded inset that shows roads
in the area of the city of Santa Barbara. An additional file is
available, SB_CO_all_roads, that shows roads throughout Santa
Barbara County. Use it for the county map. Use the
roads coverage for the city inset, which should use most of the
page. Symbolize the roads by TYPE or another field to show
only the major types (e.g., don't show neighborhood roads, circle
drives, etc.). You may use only four colors: black (or shades of
gray), white, red and green. You will have to choose appropriate
symbols for the themes so that they are not confused. Be sure you follow the
basic principles of cartography outlined in Lab 1 and
Tim's layout tips.
2.4.5 Relationships in GIS
So far we have focused on digital models for
geographic features. Now we are going to look at models for the
relationships between features. These relationships can have
specific behaviors and can follow rules. A primary advantage of the new geodatabase model is that it gives you the ability to build structured
relationships between features. Building
relationships and behaviors for features can improve data
integrity: someone entering data can only enter permitted values, and
values of one attribute can be constrained by another. To get a handle on this, consider the
classic example of a power pole and transformers. Perhaps you want to describe the
location of the transformer on the pole, e.g., height in feet and the
side of the pole the transformer is on (North, West, etc.). The geodatabase designer could constrain the possible entries in the
"location" field for the transformer to only North, South, East, or
West. Then, a person doing data entry would simply select the
appropriate direction from the available options. Similarly, the designer
could constrain the "height" field for the transformer to between 10 and
20 feet.
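The two data-entry rules above (a fixed list of sides and a 10 to 20 foot height range) can be sketched as a validation function. In geodatabase terms these correspond to attribute domains of the coded-value and range kinds; the Python below is only a hypothetical stand-in showing the logic, not how ArcGIS implements domains, and the field names `location` and `height` are assumptions.

```python
ALLOWED_SIDES = {"North", "South", "East", "West"}  # coded-value style rule
HEIGHT_RANGE_FT = (10, 20)                           # range style rule (inclusive)

def validate_transformer(record):
    """Return a list of rule violations for one transformer record."""
    errors = []
    if record.get("location") not in ALLOWED_SIDES:
        errors.append("location must be one of " + ", ".join(sorted(ALLOWED_SIDES)))
    lo, hi = HEIGHT_RANGE_FT
    height = record.get("height")
    if not isinstance(height, (int, float)) or not lo <= height <= hi:
        errors.append(f"height must be between {lo} and {hi} ft")
    return errors

print(validate_transformer({"location": "North", "height": 15}))
print(validate_transformer({"location": "Up", "height": 25}))
```

A data-entry form built on rules like these would offer only the four directions in a pick list and reject out-of-range heights before they ever reach the table.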
The designer could also limit the number of
relationships a particular pole can have with transformers. In the
real world, several transformers can reside on a pole. However, an
unlimited number of transformers will not fit -- we might imagine that
four transformers is the maximum. The geodatabase designer could
constrain the number of relationships the pole has with transformers to
between 0 and 4. After four transformers have been assigned to that pole,
a transformer would have to be deleted before another could be
added.
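The 0-to-4 limit can be sketched as a small class that refuses a fifth transformer until one is removed. The class and method names are hypothetical, and the limit of four is the example's assumption rather than a real-world standard.

```python
class Pole:
    """A pole that can hold between 0 and MAX_TRANSFORMERS transformers."""

    MAX_TRANSFORMERS = 4  # cardinality limit assumed in the example

    def __init__(self, pole_id):
        self.pole_id = pole_id
        self.transformers = []

    def add_transformer(self, transformer_id):
        # Enforce the upper bound of the relationship before adding.
        if len(self.transformers) >= self.MAX_TRANSFORMERS:
            raise ValueError(
                f"pole {self.pole_id} already has {self.MAX_TRANSFORMERS} transformers"
            )
        self.transformers.append(transformer_id)

    def remove_transformer(self, transformer_id):
        self.transformers.remove(transformer_id)

pole = Pole("P1")
for i in range(4):
    pole.add_transformer(f"T{i}")
try:
    pole.add_transformer("T4")  # a fifth one is rejected
except ValueError as err:
    print(err)
pole.remove_transformer("T0")   # free a slot...
pole.add_transformer("T4")      # ...and now the addition succeeds
print(pole.transformers)
```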
The relationship between poles and
transformers is directional as well. In a directional relationship,
changing A will change B, but changing B will not change A. If you
move a pole (in real life and in the GIS), you want the transformers on
the pole to move as well. But you don't want to be able to move a
transformer in the database by itself, as it must always be on a
pole. If you delete a pole from the data layer, you will want the
records for the transformers on that pole to be deleted from the database
as well. But if you delete a transformer, the pole should remain.
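The directional behavior can be sketched by storing coordinates only on poles, so that transformers follow any move automatically, and by cascading deletes from pole to transformer but not the reverse. ESRI's geodatabase documentation calls this kind of dependent relationship a composite relationship; the Python below is a hypothetical illustration of the rules, not ArcGIS code, and all names in it are invented.

```python
class PoleNetwork:
    """Poles own their location; transformers exist only on a pole."""

    def __init__(self):
        self.poles = {}         # pole_id -> (x, y)
        self.transformers = {}  # transformer_id -> pole_id

    def add_pole(self, pole_id, x, y):
        self.poles[pole_id] = (x, y)

    def add_transformer(self, t_id, pole_id):
        if pole_id not in self.poles:
            raise KeyError("a transformer must sit on an existing pole")
        self.transformers[t_id] = pole_id

    def transformer_location(self, t_id):
        # No coordinates of its own: a transformer is wherever its pole is.
        return self.poles[self.transformers[t_id]]

    def move_pole(self, pole_id, x, y):
        self.poles[pole_id] = (x, y)  # A changes, so B follows automatically

    def delete_pole(self, pole_id):
        del self.poles[pole_id]
        # Cascade: remove every transformer that sat on this pole.
        self.transformers = {t: p for t, p in self.transformers.items() if p != pole_id}

    def delete_transformer(self, t_id):
        del self.transformers[t_id]  # B changes, but A is left untouched

net = PoleNetwork()
net.add_pole("P1", 0.0, 0.0)
net.add_transformer("T1", "P1")
net.move_pole("P1", 3.0, 4.0)
print(net.transformer_location("T1"))
```

Because there is no method for moving a transformer directly, the rule "a transformer must always be on a pole" is enforced by the structure itself rather than by checking after the fact.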
Question 11: Come up with an example of two simple
geographic or geologic features that you might want to represent in a geodatabase as having a relationship. Come up with some rules for
the relationship describing directionality and data entry
constraints. This is just a conceptual exercise, so you do not
have to actually create the relationship rules in the computer.
Creativity is fine for this question as long as you show that you
understand the concept of relationships between
features.
Conclusion
In this lab, you have gained a basic
understanding of geographic data models and data modeling, and the primary
data models used in ESRI's ArcGIS software. You have seen how the ESRI data models are similar to and different from each other, and how each
has advantages and disadvantages for certain purposes. You have
gained further experience with some basic ArcGIS 9.0 skills, such as
changing properties and using the Help functions. Finally, you have
learned about the important concept of relationships in GIS.
Additional Reading
Zeiler, Michael. Modeling Our World:
The ESRI Guide to Geodatabase Design. Redlands, CA: ESRI Press,
1999, pp. 1-199. In the Online Books network class folder. Online Sources:
2.7 To turn in
- The question sheet, with typed answers (Word
document)
- One map of Santa Barbara County
This is a modified version of a lab created by Nicholas Matzke, Sarah Battersby
and Jeff Hemphill, UC Santa
Barbara, Department of Geography. © 2000, Regents of the University of California.
Used by permission of the authors.
Modified by M. Helper, A. Baldwin, T. Pierce, and T. Hedayati, UT Austin; 2004, 2005, 2007