GEO327G/386G, Fall 2004 Lab 3

Lab 3: GIS Data Models

N.B. This lab is a modified version of an original created by Sarah Battersby and Nicholas Matzke for Geography 176B at the UC Santa Barbara Department of Geography. © 2000, Regents of the University of California. Used by permission of the authors.

Outline

Objectives
Introduction and Background
- Data Models and Data Modeling
Data
Procedures
Conclusion
Additional reading
What to turn in
- Tables, questions, and map.

2.1 Objectives

To gain a clear understanding of what a data model is and why data models are important
To learn the data models ESRI supports in ArcGIS, and the similarities and differences between them
To reinforce basic ArcGIS skills

Note - I have attempted to edit the text to be consistent with the data model terminology and hierarchy introduced in class. Please excuse any inconsistencies and realize that others use different nomenclature for the same concepts, or the same nomenclature differently. M.H.

2.2 Introduction and background

There are two widely used logical data models in GIS: vector and raster.

Vector:

ESRI vector models recognize three basic types of vector data: points, lines (polylines, arcs and others), and polygons. Vector data are used to model features having discrete location(s) in space.

Raster:

Raster datasets are data sources that use a grid structure to store geographic information. Satellite images, for example, are composed of a grid of square pixels arranged in rows and columns. The raster model is best suited to describing continuous phenomena; things that aren't discrete.
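To make the grid idea concrete, here is a minimal Python sketch of a raster as rows and columns of cell values plus the information needed to place the grid in space. The class name, field names, and elevation values are assumptions made up for illustration; this is not how ESRI grids are actually stored on disk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Raster:
    """A minimal raster: a grid of cell values plus georeferencing."""
    values: List[List[float]]  # values[row][col]; row 0 is the top row
    x_min: float               # x coordinate of the grid's left edge
    y_max: float               # y coordinate of the grid's top edge
    cell_size: float           # width and height of each square cell

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Return the map coordinates of the center of a cell."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return (x, y)

    def value_at(self, x: float, y: float) -> float:
        """Return the value of the cell that contains map location (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        n_rows, n_cols = len(self.values), len(self.values[0])
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError("location is outside the raster")
        return self.values[row][col]


# Hypothetical 3 x 3 elevation grid with 30 m cells (values are invented).
dem = Raster(
    values=[[120.0, 125.0, 131.0],
            [118.0, 122.0, 128.0],
            [115.0, 119.0, 124.0]],
    x_min=1000.0, y_max=2090.0, cell_size=30.0,
)

print(dem.cell_center(0, 0))
print(dem.value_at(1040.0, 2050.0))
```

Every location inside a cell returns the same value, which is why cell size controls how much detail a raster can show.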

Attribute data:
In a GIS, both raster and vector data are linked to attribute information: descriptive information about features and how they are coded. For example, a GIS database of well locations may be linked to tables of down hole pressures and water chemistries for the wells. How location information and tabular attribute data are linked in a GIS differs among software vendors and within ESRI products of different ages; the physical data models and software are continually evolving. Much of what is discussed below describes these differences and the data models that were developed under different schemes of linkage, one aspect of a physical data model.

Question 1:
a) Which type of vector data would be most appropriate for representing a lake? a river? a spring?
b) Give the name of a raster data file that you used in Lab 2.

Geographic Data Modeling: An Introduction

Data Model - An abstraction of the real world that incorporates only those properties thought to be relevant to the application at hand; defines specific groups of entities, their attributes and the relationships between these entities. A data model is independent of a computer system.

Data models are a crucial concept for GIS users to understand. Data models describe how geographic data will be represented and stored. The choice of a data model will yield benefits in terms of simplifying aspects of the real world, but can also incur costs by oversimplification or misrepresentation.

A traditional paper map is an example of an analog data model - the cartographer has abstracted/generalized the real world with a set of conventions to represent important aspects of the landscape. In a computer, all information must be reduced to numbers (1010000110...). Abstractions of a real world model must be formalized in a data model that defines how the computer will store the geographic information (its "geometry" and its attributes).

Bernhardsen (1999) diagrams the data model formalization process along these lines:

Figure 1: The modeling process (after Bernhardsen 1999, p.39. Map graphics from http://www.gis.com/)

In order for geographic data to be represented digitally, a geographic data model has to be adopted or created. Most of the confusion about data models arises from the diversity of geographic data models. Geographic data models have evolved under the influences of technology (e.g., increasing digital storage space and processing power, networking, and software evolution) and history (e.g. ESRI introduced the "coverage" data model in 1980, the shapefile model in the 1990s and the geodatabase model in the first years of this decade).

Every GIS software package will be capable of supporting a number of data models. The capabilities of the data models may change with new versions of the software, and compatibility issues may arise between different GIS software and even between different versions of the same software. Certain functions may be accessible with data in the form of one data model but not another.

Data Structures vs. Data Models

A data model is a conceptual idea - how do we represent the real world in a GIS? How this conceptual representation is actually stored in the computer is the data structure (or the physical model, c.f. the lecture notes). A vector data model could be implemented in a computer in a number of ways. A vector consists of a start node, vertices in between and an end node. This is stored in the computer as a table of the locations of each node and vertex. The software reads this information and draws a line on the screen. The format that the coordinates are stored in depends on the data structure. A GIS that consists of points, lines and polygons relies on the logical vector data model, whereas the way that the data are physically stored and organized, whether it be an ESRI coverage or shapefile, is the data structure, comprising a physical model for the data.
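The "table of locations" idea can be sketched in a few lines of Python. The coordinates and helper names below are hypothetical and chosen only for illustration; real ESRI formats store the same kind of information in binary files.

```python
from math import hypot
from typing import List, Tuple

Coordinate = Tuple[float, float]

# A hypothetical stream stored as an ordered table of x, y locations.
# The first row is the start node, the last row is the end node,
# and the rows in between are vertices.
stream: List[Coordinate] = [
    (0.0, 0.0),   # start node
    (3.0, 4.0),   # vertex
    (6.0, 8.0),   # vertex
    (6.0, 10.0),  # end node
]


def start_node(line: List[Coordinate]) -> Coordinate:
    return line[0]


def end_node(line: List[Coordinate]) -> Coordinate:
    return line[-1]


def vertices(line: List[Coordinate]) -> List[Coordinate]:
    """Return the intermediate points between the two nodes."""
    return line[1:-1]


def length(line: List[Coordinate]) -> float:
    """Sum the straight-line distances between consecutive points."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(line, line[1:])
    )


print(start_node(stream), end_node(stream), length(stream))
```

The software "draws a line" simply by connecting these points in order. A different data structure could store the same points in a different layout without changing the logical vector model.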

In Figure 1 above, the lower left box titled "DATABASE (relational tables)" represents the data structure. In it you can see numbered rows and columns with names. This is the 'structure' of the data. Some columns have only numbers, some have only text and some have both.

The confusion surrounding what a data structure is can be reduced if one thinks of the geographic data models as fitting within a general hierarchy, as discussed in class. Below is a figure showing the hierarchy of ArcGIS's data models. (Note that this hierarchy is slightly at odds with the one discussed in class but is similar in many regards.)

Figure 2: Schematic hierarchy of ESRI's ArcGIS data models. The three top levels are logical data models in the terminology of the hierarchy discussed in class. The georelational and geodatabase models are physical models, implementations of particular schemes of data storage and organization. The figure is somewhat confusing in how it shows geodatabases as a branch of the Vector model, because they can, in fact, contain coverages, shapefiles, rasters and TINs.

GIS Information Resources

The National Center for Geographic Information and Analysis (NCGIA) has its core GIScience curriculum online. Some resources relevant to Data Structures and Data Models: Fundamentals of Data Storage, Information Organization and Data Structure, and Non-spatial Database Models.

Question 2:
a) A linear feature is represented by a line in a vector data model, yet the feature’s position is ultimately defined by points. Explain.
b) Name a continuous, numeric variable (other than those already mentioned) that could be modeled by raster data.
c) What is the difference between a data structure and a data model?

Data Models, Datasets, and Feature Classes in ArcGIS

In ArcCatalog the type of physical data model for every spatial dataset is identified by a small picture or icon. Only file formats recognized by ArcCatalog as geographic in nature are displayed.

Life will be much easier if you learn ArcCatalog's icons. There are a lot of them and they can be initially confusing, so here is the handy table from Lab 1 that you can refer to. More complete listings can be found in "Modeling Our World" (hereafter MOW) in the Digital Books class network folder. Below is a display from ArcCatalog showing how data model Types (right-most column of the graphic) are identified by icons.

Figure 3. ArcCatalog icons representing different data model types.

The folder and file display of shapefiles, coverages, geodatabase feature classes, rasters, and TINs in ArcCatalog is arranged in a hierarchy. Data related in specific, logical ways to one another (shared spatial reference, storage location, etc.) are organized by the inset, branching hierarchy familiar from Windows Explorer. This is a powerful conceptual way of displaying the physical data models implemented within ArcGIS and their relationships to one another. However, this is not the way ESRI or other geographic data appear in Windows Explorer, which instead shows a literal tree of the folders and files, hiding some, displaying others, without any indication of the logical relationships among the data. Get used to this duality. It can be a source of much confusion to new users. The ArcCatalog tree represents a higher level of organization than what is visible in Windows Explorer. A single icon in the tree may represent many files and/or more than one folder.

Figure 4, below, shows an ArcCatalog hierarchy of folders and icons, annotated with names. Feature classes are the lowest level of the hierarchy. Examine and read this diagram carefully.

Figure 4: Icons and hierarchy in ArcCatalog

For Shapefiles, the shapefile itself is the feature class. Each class of geographic feature (donut shops, streets, etc.) will be contained in its own shapefile and pertains to a map feature. This style of organization is different from coverages, discussed below. For example, if we wanted to map surface water features using shapefiles, we would have a line shapefile for streams and another for shorelines, a polygon shapefile for lakes, and point shapefiles for springs and water wells - a total of five separate shapefiles. Geometric data (i.e. coordinates) are stored in hidden binary tables that cannot be directly viewed, but are represented by a field called "Shape" in a feature class's attribute table. The attribute information (stored in dBASE tables) can be displayed with the Preview tab of ArcCatalog. This linkage of geometric files to separate attribute tables is intrinsic to shapefiles and coverages and is called the georelational data model by ESRI. Unlike coverages, shapefiles do not explicitly store topology, but build it on-the-fly each time a shapefile is loaded.
For Coverages, each feature class does not correspond to a map feature. Coverage feature classes are standard categories like arc, label, polygon, tic, etc. that together comprise a common group of map elements. A common map element like "hydrography", for example, might be stored in a coverage that contains a point feature class for springs and wells, an arc feature class for the streams and shorelines, a polygon feature class for lakes, and an annotation feature class for the stream/lake names (see fig. 4). Additional feature classes within the same coverage will contain tics (see below), links, etc.; see MOW for a complete list. Within ArcCatalog, coverage feature classes are found in a folder. This folder is the coverage. All feature classes within a coverage share a common spatial reference, as they must if they together represent a map element. The primary feature classes of a coverage store feature coordinates in hidden, separate, binary "Arc" tables that cannot be directly viewed in ArcCatalog, but are represented by a field called "Shape" in the feature class's attribute table. Certain features of the topology are visible in feature class attribute tables, with field names such as FNODE#, TNODE# and LPOLY#, RPOLY# (examined further below). These terms should be familiar from lecture. Feature classes are linked to attribute tables (INFO tables). Like shapefiles, coverages employ a georelational data model.
The organization and structure of Geodatabases, the latest ESRI data structure, incorporates the best aspects of shapefiles and coverages and greatly extends them. Two types of geodatabases are recognized: Personal geodatabases and multi-user geodatabases. Personal geodatabases permit access by one user at a time and store data in a Microsoft Access database. A multi-user geodatabase permits access and editing by multiple users at the same time, as might be required in a business environment ("Enterprise GIS"), and is compatible with business database software such as Oracle, Informix, DB2 and others. Like shapefiles, each feature class in a geodatabase corresponds to a map feature, such as roads, counties, etc. Feature classes can be grouped into feature datasets, a group of feature classes that might contain data about a region or topic (in Figure 4, the "USA container" feature dataset contains US capitals, counties etc.). Many feature datasets can be stored within a geodatabase. Each feature dataset can have its own spatial reference; in that sense a feature dataset is somewhat like a coverage. Existing shapefiles and coverages can be imported into a geodatabase using tools available in ArcToolbox and ArcCatalog. New geodatabase feature classes can be created in ArcCatalog and ArcMap. Unlike shapefiles and coverages, geodatabases employ a geodatabase data model that stores each feature as a row in a relational database table. Because geodatabases can explicitly store relationships ("relationship classes") among objects (information tables) and feature classes (groups of things that have x, y coordinates) or between different feature classes, feature behaviors can be codified (e.g. a river ends upon entering a lake; contour lines should break where labels are present). Geodatabases are explored further below.
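As a reminder of what the coverage topology fields mentioned above (FNODE#, TNODE#, LPOLY#, RPOLY#) make possible, here is a minimal Python sketch of an arc table for two adjacent polygons. The record layout, the ids, and the use of id 1 for the area outside all polygons are assumptions for illustration only; the actual coverage files are binary and are managed by the software.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set

OUTSIDE = 1  # assumption for this sketch: id of the area outside all polygons


class ArcRecord(NamedTuple):
    arc_id: int
    fnode: int  # FNODE#: node where the arc starts
    tnode: int  # TNODE#: node where the arc ends
    lpoly: int  # LPOLY#: polygon on the left when travelling fnode -> tnode
    rpoly: int  # RPOLY#: polygon on the right


# Hypothetical arc table: polygons 2 and 3 sit side by side and share arc 2.
aat: List[ArcRecord] = [
    ArcRecord(arc_id=1, fnode=1, tnode=2, lpoly=OUTSIDE, rpoly=2),
    ArcRecord(arc_id=2, fnode=1, tnode=2, lpoly=2, rpoly=3),
    ArcRecord(arc_id=3, fnode=1, tnode=2, lpoly=3, rpoly=OUTSIDE),
]


def polygon_neighbors(arcs: List[ArcRecord]) -> Dict[int, Set[int]]:
    """Find adjacent polygons from the table alone, without coordinates."""
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for arc in arcs:
        if OUTSIDE in (arc.lpoly, arc.rpoly) or arc.lpoly == arc.rpoly:
            continue  # arc is on the outer edge or lies inside one polygon
        neighbors[arc.lpoly].add(arc.rpoly)
        neighbors[arc.rpoly].add(arc.lpoly)
    return dict(neighbors)


def arcs_bounding(arcs: List[ArcRecord], poly_id: int) -> List[int]:
    """List the arcs that form the boundary of one polygon."""
    return [a.arc_id for a in arcs if poly_id in (a.lpoly, a.rpoly)]


print(polygon_neighbors(aat))
print(arcs_bounding(aat, 2))
```

Because the shared boundary is stored once (arc 2) and referenced by both polygons, questions such as "which polygons touch?" can be answered by reading the table rather than comparing coordinates.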

Look again at Figure 4. Notice that the geodatabase, the coverages, and the shapefiles are all contained within the folder named "Some-Data". The little blue symbol on the folder indicates that it contains recognizable geographic data in the first level beneath "Some-Data". In the context of coverages, this folder is referred to as a workspace.

Additional Note: Notice that none of the file/folder names in Figure 4 contain spaces. Spaces within names are instead represented by underscores and hyphens. ArcGIS software is generally tolerant of spaces in names in some situations, but not others. ArcToolbox, in particular, needs uninterrupted paths to files and folders; spaces are interpreted as separate words. It likewise is sometimes intolerant of file and folder names that exceed 13 characters. If you violate these rules you will get an error message of the sort "spaces are not permitted in the path name" or something even more obscure. Save yourself some grief and don't tempt fate - don't use spaces in file/folder names and keep them under 14 characters in length.
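If you want to check a name before using it, the two rules above (no spaces, fewer than 14 characters) are easy to test. The following Python sketch is only an illustration; the function names are made up, and it assumes the length limit applies to each whole file or folder name, extension included.

```python
import re
from typing import Dict, List

MAX_NAME_LENGTH = 13  # the lab's rule of thumb: keep names under 14 characters


def name_problems(name: str) -> List[str]:
    """Return a list of rule violations for one file or folder name."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"is {len(name)} characters long (limit is {MAX_NAME_LENGTH})"
        )
    return problems


def path_problems(path: str) -> Dict[str, List[str]]:
    """Check every folder and file name in a path; report only the bad ones."""
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    report = {}
    for part in parts:
        problems = name_problems(part)
        if problems:
            report[part] = problems
    return report


print(path_problems(r"y:\Lab_3\sb\roads"))
print(path_problems("y:/Lab 3/Santa_Barbara_roads"))
```

An empty result means every name in the path follows both rules.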

Question 3:
a) We’ve identified a feature class as the lowest level in the organizational hierarchy of spatial data files. But what exactly is a feature class? Use the glossary in the ArcGIS Desktop Help menu to define feature class.

b) Using the definition you retrieved for “feature class,” what type of vector data model is “National” in Figure 4? Landusecov in Figure 4?

c) How is a coverage different from a shapefile?

d) Explain two ways the geodatabase data model differs from the coverage data model or shapefile data model.

2.3 Data

Open My Computer and go to your y: drive and create a folder (right-click New -> Folder) and name it Lab_3.
Copy the entire Lab_3_data folder to the folder you just created. The folder contains the following files and folders:

/mystery -- Contains 8 data layers of several features in different data models. You will be figuring out what these are in the lab.
/sb
roads -- Santa Barbara county roads coverage, clipped to the Goleta-Santa Barbara region
SB_CO_all_roads - Shapefile of all roads in Santa Barbara County, clipped from a state-wide roads shapefile.
sbdem -- digital elevation model of Santa Barbara County
sbtin -- TIN derived from sbdem
sbcontour -- Contour coverage derived from sbdem
cacounties -- counties of California, from the GDT dataset

The Santa Barbara street data we are using were provided by GDT.

2.4 Procedures

ArcGIS Help

ArcInfo Help works like any Windows program help section. This is an EXTREMELY valuable resource for this class and in the future. Read it and learn how to use it. Go to Menu Bar -> Help -> ArcGIS Help.

When you're looking for something in ArcGIS Help, make sure to search both the Index and the Search tab. Trying the search with different terms (e.g., data models, or coverage, or geodatabase) increases the odds of finding something useful. ArcOnline is also an excellent resource (see below).

Question 4:
Use ArcGIS Help to find "coverages" to answer the following questions.
a) List the feature classes that a coverage can contain.
b) What is the purpose of an INFO table? (Use Help on "Info tables")
c) What are tic points?

Use ArcInfo Help to find "shapefiles" to answer the following question.
d) How many feature classes can a shapefile use?

2.4.2 Mystery Models

Examine the layers in the folder mystery using ArcCatalog and/or ArcMap.

Question 5:
What are the data models for each of the layers? What feature does each layer represent?
(Be as specific as possible for both questions.)

mystery1 --
mystery2 --
mystery3 --
mystery4 --
mystery5 --
mystery6 --
mystery7 --
mystery8 --

Once you have identified the layers and their data models, convert mystery5 into the same data model as mystery2. You will have to figure out how to do this yourself, but here are some hints:

Converting Between Data Models

You will have to use ArcToolbox to accomplish this task. Recall that you can open it by clicking on the ArcToolbox button in ArcCatalog.
- We are doing a conversion, so navigate to the toolbox menu that would contain the appropriate tools.
  - Find the appropriate sub menu for converting data in mystery5's data model.
  - Find the tool that will let you convert to mystery2's data structure.
- You should be able to figure out which layer to use as input. Recall that you can drag-and-drop from ArcCatalog instead of typing or browsing. Use the defaults for everything else unless you are in an experimental mood.

Give the output a name you will remember and run the conversion. Take your resulting layer and display it in ArcMap, along with mystery5 and mystery2.

Question 6:
a) How does mystery2 compare with your converted layer? You should examine the data at various scales before answering this question.

b) Considering the type of data represented, which is a more appropriate data model, the one before or after the conversion?

Delete mystery5 and the converted layer from your map document.
Go to the directory sb.
Now, add sbcontour, sbdem, and sbtin to your ArcMap document. Display just sbcontour and sbtin, and overlay sbcontour on top of sbtin. To make the display intelligible, you will have to change the properties for the two layers.

Changing Layer Properties in ArcMap

To change the Properties of a layer in ArcMap, right-click on sbtin in the TOC and go to Properties. Double-clicking on sbtin will also work.

You get a large window with many tabs, like this:

Go to the Display tab.
Change the transparency of sbtin so that the DEM raster can be seen underneath it, and click OK.
Make sure the TIN layer displays on top of the DEM layer.

If you're curious about making better use of Properties, the main methods are the creation of Layers in ArcCatalog, and ArcMap's Style Manager, found in the Menu Bar under Tools -> Styles -> Style Manager.

You will be repeating these steps to change a layer's properties many, many times throughout the semester. You will find the Properties functions very useful. ArcMap's Style Manager is an easier way to manipulate layer properties that we may learn about later in the semester, but feel free to experiment with it.

Question 7:
a) Where are contour values stored, and what is the contour interval for the data being displayed? (Consult the Arc feature class attribute table for sbcontour)

b) How would you change the display to show contours by different line widths, e.g. heavier lines for the 0 and 1200 contours? List the steps.

c) The DEM is composed of cells, each with a single elevation value. What is the range of elevation values and what is the x, y dimension (with units) of each cell?

2.4.3 Data Structures and ArcToolbox

Coverages are the vector data structures long used in the old Unix workstation version of ARC/INFO. Therefore, many of the ArcToolbox tools simply use a wizard to create a command line that runs an ARC process in the background. As a result, many of the tools only support coverages, although some of the newer tools are designed for geodatabases or shapefiles. The older tools designed only to support coverages are in a Toolbox called "Coverage Tools". Many of the same tools, generalized to support other feature class types, can be found in the "Analysis Tools" Toolbox, a new feature in ArcGIS 9.0. To familiarize yourself with the Toolbox and the input formats required, find each tool listed below and figure out what kind of input file(s) it supports (e.g., coverage, geodatabase feature class, grid, TIN, etc.).

Finding and Examining Tools

Again, recall that you can open ArcToolbox by clicking on the ArcToolbox button in ArcCatalog.
If you can't find a particular tool in ArcToolbox, try the Index and Search tabs at the bottom of the ArcToolbox window and search by name and/or description.
For more information on a tool, open it and click "Show Help", or simply right click on the tool name and select "Help" from the menu.

Question 8:
Find each of these tools and determine what physical data model type(s) (e.g. shapefile, coverage, grid, or perhaps other file types) it takes as input:

a) Clip, Select, Intersect, Buffer, & most other tools in the Analysis Tools toolbox (all the same answer)
b) Darcy Flow tool
c) Export to Interchange File
d) Join Info Tables
e) Create a TIN from a raster

2.4.4 AATs & PATs

As discussed above, coverages have been the standard vector data model for previous releases of Arc/INFO. With the release of ArcInfo 8 (ArcGIS), all of the modules of Arc/INFO (Arc, ArcEdit, Grid, Tables, ArcPlot, INFO etc.) have been integrated, and a new geodatabase data model has been promoted. A large amount of legacy data exists as Coverages so we need to know something about their structure.

Recall from above that coverages employ the georelational database model and that they store geometric (i.e. coordinate) and attribute information in separate tables. The attribute tables reside in files that are stored in what is called an INFO folder, whereas the geometric tables (including Arc tables) are stored directly within the coverage folder itself (these relationships are visible with Windows Explorer but not ArcCatalog). INFO files contain tables of attributes, including topological information and feature descriptions, for example parcel number and land use codes. These INFO attribute tables store features in rows (rows are database "records") and attributes by columns (database "fields"). Attribute and geometry tables are linked ("related" in relational database lingo) through a common attribute (field), which is the so-called Primary KEY (more on this in lectures to come). The use of relational databases is the origin of the "relational" part of the georelational database name. In a standard relational database, the KEY can be any of several attributes. In a georelational database, such as that of a coverage, the KEY is an ID field that specifies a geographic location.

The polygon coverage illustrated in Figure 5 serves as a simple example of the above concepts. The primary key is the polygon identifier (A, B, C). The polygon attribute table has attributes that include parcel number and land use.

Figure 5. (from ESRI)
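The linkage between a geometry table and an attribute table through a shared key can be sketched in Python as follows. The polygon identifiers A, B, and C follow the example above, but the coordinates, field names (POLY_ID, PARCEL_NO, LAND_USE), and attribute values are invented for illustration and are not taken from Figure 5.

```python
from typing import Any, Dict, List, Tuple

Ring = List[Tuple[float, float]]

# Geometry table: one closed ring of coordinates per polygon, stored by key.
geometry_table: Dict[str, Ring] = {
    "A": [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    "B": [(10, 0), (10, 10), (20, 10), (20, 0), (10, 0)],
    "C": [(0, 10), (0, 20), (20, 20), (20, 10), (0, 10)],
}

# Attribute table: one row (record) per polygon, one column (field) per attribute.
attribute_table: List[Dict[str, Any]] = [
    {"POLY_ID": "A", "PARCEL_NO": 101, "LAND_USE": "Residential"},
    {"POLY_ID": "B", "PARCEL_NO": 102, "LAND_USE": "Commercial"},
    {"POLY_ID": "C", "PARCEL_NO": 103, "LAND_USE": "Open space"},
]


def relate(
    geometry: Dict[str, Ring],
    attributes: List[Dict[str, Any]],
    key_field: str,
) -> Dict[str, Dict[str, Any]]:
    """Join each shape to its attribute record through the shared key."""
    rows_by_key = {row[key_field]: row for row in attributes}
    features = {}
    for key, shape in geometry.items():
        feature = {"Shape": shape}
        feature.update(rows_by_key.get(key, {}))
        features[key] = feature
    return features


features = relate(geometry_table, attribute_table, key_field="POLY_ID")
print(features["B"]["LAND_USE"])
```

The two tables are stored separately and are only brought together when needed, which is the essence of the georelational approach.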

Let's explore an attribute table that is part of the roads coverage. Go to ArcCatalog and Preview the data.

Previewing Tables

Below the preview map, locate the Preview box:
Change the preview option from Geography to Table.
You are now looking at the arc attribute table (AAT).

Answer the question below.

Question 9:
a) How many records are there?
b) What do FNODE# and TNODE# mean?
c) What other attribute information can you recognize or guess at in the table (pick 3 columns)?

For a look at polygons and Polygon Attribute Tables (PATs), open cacounty. Explore the tables for the tic, arc, polygon, and region.cty coverage feature classes.

Sorting a Column in Table Preview, and Searching for a Text String

To sort an attribute table (e.g., polygon), click on the column heading you wish to sort.
This should highlight the column.
Then, right-click and choose Sort Ascending or Sort Descending.

Now, open the cacounty coverage, examine the coverage feature classes and note the differences. What is the region.cty feature class? Now answer the questions below.

Question 10:
a) How many counties are there in California? HINT: The bottom part of the Table preview in ArcCatalog or the layer's attribute table in ArcMap may help you.
b) Why do the AAT and PAT have different numbers of records?
c) Explain the relationship between arc, polygon, and region.cty in this coverage.
d) What are the label and tic feature classes for?
Hints: To figure out the answers, you will need to examine the tables. In addition, you might want to use the Identify Tool in the Geography Preview to query a few of the features. Also use ArcGIS Help, as described above.

Map for Lab 3:
Make an 8.5x11" map of mainland Santa Barbara County showing the county outline, roads and contours for the entire county and an expanded inset that shows roads in the area of the city of Santa Barbara. An additional file is available, SB_CO_all_roads, that shows roads throughout Santa Barbara County. Use it for the county map. Use the roads coverage for the city inset, which should use most of the page. Symbolize the roads by TYPE or another field to show only the major types (e.g. don't show neighborhood roads, circle drives, etc.). You may use only four colors: black (or shades of gray), white, red and green. You will have to choose appropriate symbols for the themes so that they are not confused. Be sure you follow the basic principles of cartography outlined in Lab 1 and Tim's layout tips.

2.4.5 Relationships in GIS

So far we have focused on digital models for geographic features. Now we are going to look at models for the relationships between features. These relationships can have specific behaviors and can follow rules. A primary advantage of the new geodatabase model is that it gives you the ability to build structured relationships between features. One important advantage of building relationships and behaviors for features is that it can improve data integrity - someone entering data can only enter permitted values, and values of one attribute can be constrained by another.

To get a handle on this, consider the classic example of a power pole and transformers. Perhaps you want to describe the location of the transformer on the pole -- e.g., height in feet and the side of the pole the transformer is on (North, West, etc.). The geodatabase designer could constrain the possible entries in the "location" field for the transformer to only North, South, East, or West. Then, a person doing data entry would simply select the appropriate direction from the available options. Similarly, the designer could constrain the "height" field for the transformer to between 10 and 20 feet.

The designer could also limit the number of relationships a particular pole can have with transformers. In the real world, several transformers can reside on a pole. However, an unlimited number of transformers will not fit -- we might imagine that four transformers is the maximum. The geodatabase designer could constrain the number of relationships the pole has with transformers to between 0 and 4. After four transformers have been assigned to that pole, a transformer would have to be deleted before another could be added.

The relationship between poles and transformers is directional as well. In a directional relationship, changing A will change B, but changing B will not change A. If you move a pole (in real life and in the GIS), you want the transformers on the pole to move as well. But you don't want to be able to move a transformer in the database by itself, as it must always be on a pole. If you delete a pole from the data layer, you will want the records for the transformers on that pole to be deleted from the database as well. But if you delete a transformer, the pole should remain.
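The pole and transformer rules described above can be written out as a small Python sketch. This is a conceptual illustration only: the class and method names are hypothetical, and a real geodatabase enforces such rules through its own design tools rather than through code like this.

```python
from dataclasses import dataclass, field
from typing import List

SIDE_DOMAIN = ("North", "South", "East", "West")  # permitted "location" values
HEIGHT_RANGE_FT = (10.0, 20.0)                    # permitted "height" range
MAX_TRANSFORMERS_PER_POLE = 4                     # cardinality limit (0 to 4)


@dataclass
class Transformer:
    side: str
    height_ft: float

    def __post_init__(self) -> None:
        # Data entry constraints: reject values outside the permitted sets.
        if self.side not in SIDE_DOMAIN:
            raise ValueError(f"side must be one of {SIDE_DOMAIN}")
        low, high = HEIGHT_RANGE_FT
        if not low <= self.height_ft <= high:
            raise ValueError(f"height must be between {low} and {high} feet")


@dataclass
class Pole:
    x: float
    y: float
    transformers: List[Transformer] = field(default_factory=list)

    def add_transformer(self, transformer: Transformer) -> None:
        if len(self.transformers) >= MAX_TRANSFORMERS_PER_POLE:
            raise ValueError("this pole already carries the maximum of 4")
        self.transformers.append(transformer)

    def remove_transformer(self, transformer: Transformer) -> None:
        self.transformers.remove(transformer)  # the pole itself remains

    def move(self, x: float, y: float) -> None:
        # Transformers have no coordinates of their own, so they move with
        # the pole and cannot be moved independently.
        self.x, self.y = x, y


@dataclass
class PoleLayer:
    poles: List[Pole] = field(default_factory=list)

    def add_pole(self, pole: Pole) -> Pole:
        self.poles.append(pole)
        return pole

    def delete_pole(self, pole: Pole) -> None:
        # Deleting a pole also deletes the transformer records it holds.
        self.poles.remove(pole)

    def transformer_count(self) -> int:
        return sum(len(p.transformers) for p in self.poles)


layer = PoleLayer()
pole = layer.add_pole(Pole(x=100.0, y=200.0))
pole.add_transformer(Transformer(side="North", height_ft=15.0))
pole.move(105.0, 200.0)
print(layer.transformer_count())
```

The directionality shows up in the design: the pole owns its transformers, so actions on the pole carry through to them, while removing a transformer leaves the pole untouched.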

Question 11:
Come up with an example of two simple geographic or geologic features that you might want to represent in a geodatabase as having a relationship. Come up with some rules for the relationship describing directionality and data entry constraints. This is just a conceptual exercise, so you do not have to actually create the relationship rules in the computer. Creativity is fine for this question as long as you show that you understand the concept of relationships between features.

Conclusion

In this lab, you have gained a basic understanding of geographic data models and data modeling, and the primary data models used in ESRI's ArcGIS software. You have seen how the ESRI data models are similar and different from each other, and how each has advantages and disadvantages for certain purposes. You have gained further experience with some basic ArcGIS 9.0 skills, such as changing properties and using the Help functions. Finally, you have learned about the important concept of relationships in GIS.

Additional Reading

Zeiler, Michael. Modeling Our World: The ESRI Guide to Geodatabase Design. Redlands, CA: ESRI Press, 1999, pp. 1-199. In the Online Books network class folder.

Online Sources:

Geo327G/386G class notes on Data Models, and ESRI Data Models
AGI dictionary Definition of "Data Model"
FOLDOC definition of datamodel

2.7 To turn in

The question sheet, with typed answers (Word document)
One map of Santa Barbara County

This is a modified version of a lab created by Nicholas Matzke, Sarah Battersby and Jeff Hemphill, UC Santa Barbara, Department of Geography. © 2000, Regents of the University of California. Used by permission of the authors.
Modified by M. Helper, A. Baldwin, T. Pierce, and T. Hedayati, UT Austin; 2004, 2005, 2007

Spring 2007
GEO327G/386G: GIS & GPS Applications in Earth Sciences
