Chapter 1 Introduction

As in any other technical area, there can be a fair amount of technical vocabulary to learn before a student can be comfortable with the subject. Since this text is introductory by design, we will try to be consistent in our use of language. Much of the vocabulary of geographic information systems overlaps that of computer science and mathematics in general, and computer graphics applications in particular. We provide a glossary of technical terms at the end of this text, as a reference for the student.

1.1 Geography

Geography has been facetiously defined as that discipline which, when some use is found for it, is called something else. Slightly more serious scholars have defined geography as "what geographers do". The German philosopher Immanuel Kant set geography in the context of the sciences by stating that knowledge could be subdivided into three general areas:

1. those disciplines that study particular objects or sets of objects and phenomena (such as biology, botany, forestry, and geology);
2. those disciplines that look at things through time (in particular, history); and
3. those disciplines that look at features within their spatial context (specifically, geographic disciplines).

In a more classical sense, the word geography may be defined in terms of its constituent parts: geo and graphy. Geo refers to the Earth, and graphy indicates a process of writing; thus geography (in this literal interpretation) means writing about the Earth. Another definition of geography focuses on man's relationship with the land.

In their writings, geographers deal with spatial relationships. A key tool in studying these spatial relationships is the map. Maps present a graphic portrait of spatial relationships and phenomena over the Earth, whether a small segment of it or the entire globe.
It is interesting that in a survey conducted to determine what factors influenced people to adopt the profession of geography, an early interest in maps rated at the top of the list.

There are many skills that people possess to a greater or lesser degree. If a person speaks well, he or she possesses fluency. If a person understands writing well, he or she possesses literacy. If a person understands numbers and quantitative concepts well, he or she possesses (at least in Great Britain) numeracy. Similarly, there is a special skill in the analysis of spatial patterns in two and three dimensions. This skill can be referred to as graphicacy. Although many individuals take this skill for granted, we all know those who have difficulty reading maps or interpreting aerial photographs. What these two activities have in common is the use of an essentially two-dimensional view of geographic space, a view that helps the adept map-reader or photointerpreter to understand spatial relationships.

1.2 Information Systems

The function of an information system is to improve one's ability to make decisions. An information system is that chain of operations that takes us from planning the observation and collection of data, to storage and analysis of the data, to the use of the derived information in some decision-making process (Calkins and Tomlinson, 1977). This brings us to an important concept: a map is a kind of information system. A map is a collection of stored, analyzed data, and information derived from this collection is used in making decisions. To be useful, a map must be able to convey information in a clear, unambiguous fashion to its intended users.

A geographic information system (GIS) is an information system that is designed to work with data referenced by spatial or geographic coordinates. In other words, a GIS is both a database system with specific capabilities for spatially referenced data and a set of operations for working with the data (see Figure 1.1).
In a sense, a GIS may be thought of as a higher-order map. As we shall see later, a modern GIS also stores and manipulates non-spatial data.

Just as we have maps designed for specific tasks and users (road maps, weather maps, vegetation maps, and so forth), we can have GISs designed for specific users. The better we are able to understand the range of needs of a user, the better we will be able to provide the correct data and tools to that user.

A geographic information system can, of course, be either manual (sometimes called analog) or automated (that is, based on a digital computer). Manual geographic information systems usually comprise several data elements including maps, sheets of transparent materials used as overlays, aerial and ground photographs, statistical reports, and field survey reports. These sets of data are compiled and analyzed with such instruments as stereo viewers, transfer scopes of various kinds, and mechanical and electronic planimeters. Calkins and Tomlinson (1977) point out that manual techniques could provide the same information as computer-aided techniques, and that the same processing sequences may occur. While this may no longer be entirely true, manual GISs have played an extremely important role in resource management and planning activities. Furthermore, there are still applications where a manual GIS approach is entirely appropriate. Although this text focuses on the technology, instrumentation, and use of automated geographic information systems, it is still helpful to examine a manual GIS first.

1.3 A Manual Geographic Information System

To introduce some of the language of geographic information systems, let's examine a first example: an application of a simple manual GIS. This GIS arises during the early steps in developing a site for a golf course. We assume for this discussion that a specific site is already under consideration.
A planner has sought out and gathered together a group of existing datasets for the site. This group might include a topographic map, a blue-line map of parcel boundaries from the local municipal planning agency, and an aerial photograph of the site (Figure 1.2). We refer to these three datasets (two maps and a photograph) as data layers or data planes.

The topographic map depicts several kinds of information. Elevation on the site is portrayed as a series of contour lines. These contour lines provide us with a limited amount of information about the shape of the terrain. Certain kinds of land cover are indicated by colors (often blue for water, green for vegetation) and textures or patterns (such as repeated patterns denoting wetlands). A number of kinds of man-made features are indicated, including structures and roadways, typically by lines and shapes printed in black. In many cases, the information on this map is five to fifteen years out of date, a common situation resulting from the rate of change of land cover in the area and the cycle of map updates. Each of these different kinds of information, which we may decide to store in various ways, is called a theme.

The map from the local planning agency provides us with additional and different kinds of information about the area. This map focuses principally on the infrastructure: legal descriptions of the proposed golf course property boundaries, existing and planned roadways, easements of different kinds, and the locations of existing and planned utilities such as potable water, electric and gas supplies, and the sanitary sewer system. The planning map is probably not at the same scale as the topographic map; the former is probably drawn at a larger scale than the latter. Furthermore, the two aren't necessarily based on the same map projection (see section 6.6.1). For a small area like our golf course, the approximate scale of the data is probably more important than the details of the map projection.
The aerial photograph is a rich source of data, particularly for an analyst with some background in image interpretation. A skilled interpreter may be able to detect patterns in soils, vegetation, topography, and drainage, based on the content of the photograph. Unfortunately, this photograph is probably of a different scale than either of the two maps, and may have significant geometric distortions. The two maps attempt to be planimetric, that is, the horizontal spatial relationships between objects on the ground are correctly represented on the maps. The photograph, on the other hand, probably suffers from both the perspective distortion inherent in all photographs and from a non-vertical point of view.

A second step in developing plans for this site is to manipulate the three datasets so they can be used simultaneously. A cartographer or draftsperson is given the task of redrawing the municipal planning map and the topographic map onto plastic film, in such a way that the features on the new film-maps overlay their counterparts on the aerial photograph. This process, called registration, in effect causes the objects (buildings, roadways, and so forth) to move from their original locations in the planning map, so that they fall at the positions where they are found in the photograph. Alternatively, the photograph could be manipulated in such a way that the visible features overlay the corresponding elements on the planning map, and then a transparency could be made. In any case, this spatially registered set of data planes is now a useful geographic database. Since the three sets of information have now been converted to overlay each other, further manipulations are much easier. Note that if the original aerial photograph were chosen as the base data layer, the resulting database may have no simple relationship to a well-known geodetic coordinate system, such as latitude and longitude. However, for applications that cover a small area, this may not be a serious problem.
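In an automated setting, one common stand-in for the draftsperson's redrawing is to fit a coordinate transformation from a few control points that can be identified in both data layers. The sketch below is only an illustration of that idea, not the manual procedure described above: it assumes an affine transformation (which can absorb differences in scale, rotation, and offset, but not the perspective distortion of a photograph), and the function names and control-point coordinates are hypothetical.

```python
# Minimal sketch: register map coordinates to photograph coordinates with an
# affine transformation fitted to exactly three control points.
# Assumption: an affine model is adequate; it cannot correct perspective distortion.

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; cannot fit a transformation")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return tuple(solution)

def fit_affine(map_points, photo_points):
    """Return ((a, b, c), (d, e, f)) so that u = a*x + b*y + c and v = d*x + e*y + f."""
    m = [(x, y, 1.0) for x, y in map_points]
    row_u = solve3(m, [u for u, _ in photo_points])
    row_v = solve3(m, [v for _, v in photo_points])
    return row_u, row_v

def apply_affine(transform, point):
    """Move one map location to its position in the photograph."""
    (a, b, c), (d, e, f) = transform
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: the same three road intersections located
# on the planning map and on the aerial photograph.
map_controls = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
photo_controls = [(100.0, 200.0), (120.0, 200.0), (100.0, 220.0)]

transform = fit_affine(map_controls, photo_controls)
building_on_photo = apply_affine(transform, (5.0, 5.0))
print(building_on_photo)
```

With more than three control points, a least-squares fit would normally replace the exact solution used here, and the residuals would give a rough indication of how well the layers register.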
Once the individual data layers have been adjusted to a common view of the Earth's surface, there are a number of analytic operations we might perform with this manual geographic information system. The analyst begins by drawing some new features on another sheet of plastic that overlays the other data layers. For example, we might generate 25-meter-wide corridors at the edges of the property, and 10-meter-wide corridors around the existing roads and proposed utility locations. These newly derived regions might suggest some places that are unsuitable for development of large new facilities, and others that are particularly desirable due to proximity to needed utilities. As such, we might now be in a position to make preliminary decisions on the location of the club house, storage yards, access roads, and parking facilities.

Next, the planner lays a coarse grid over the database, and starts marking the conjunction of topographic and hydrologic features and vegetation that are most suitable for fairways and tee-off areas. Existing waterways could be used as boundaries between areas on the golf course, while the locations of wooded areas could be considered as part of the course plan. Based on these preliminary decisions, we prepare a new data layer, which is a draft of the proposed course layout.

By combining the tentative orientations of tees, fairways, and greens with the original topographic map, we could start to make calculations to estimate volumes of earth that must be moved to create the course. (Such cut-and-fill calculations are often considered the domain of civil engineering.) And once we have a tentative course layout, we can use a planimeter or map wheel to determine the length of each hole, which then provides us with the total course length (an important consideration for any golfer). Furthermore, based on a determination of the area of the holes, we can begin to calculate our needs for grass seed and fertilizer.
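The length and area measurements described above, made by hand with a map wheel and a planimeter, have simple digital counterparts once a hole's outline has been recorded as coordinates. The following is a minimal sketch under stated assumptions: coordinates are already in meters on the ground, the fairway is a simple (non-self-intersecting) polygon, and the seeding rate is a made-up figure used only to show the arithmetic.

```python
import math

def polyline_length(points):
    """Length of a path through the given points (the map-wheel measurement)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula (the planimeter measurement)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical hole: centerline from tee to green, and a rectangular fairway.
centerline = [(0.0, 0.0), (300.0, 0.0), (300.0, 100.0)]
fairway = [(0.0, 0.0), (300.0, 0.0), (300.0, 40.0), (0.0, 40.0)]

hole_length_m = polyline_length(centerline)   # 400.0 meters
fairway_area_m2 = polygon_area(fairway)       # 12000.0 square meters

# Assumed seeding rate, for illustration only: 3 kg of seed per 100 square meters.
SEED_KG_PER_100_M2 = 3.0
seed_needed_kg = fairway_area_m2 / 100.0 * SEED_KG_PER_100_M2

print(hole_length_m, fairway_area_m2, seed_needed_kg)
```

Summing `polyline_length` over every hole gives the total course length, in the same way that the manual measurements are totaled.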
Overall, this process has involved a number of key steps. Several different kinds of spatial data were located, and then manipulated so that the important features in each were found at the same locations. Once these data were brought into a common geographic or spatial referencing system, it was possible to use them together, to develop a variety of types of derived information: the determination of potential corridors on the site, proposed locations for constructing facilities, and eventually, engineering estimates for earth moving equipment operators. As we will see, this is a very typical flow of data and information through a spatial data processing and analysis problem.

1.4 Applications

The number of data layers one needs to consider varies greatly from one application to another. Consider a more complex problem: deciding on the location of an airport. Some of the data layers or themes that a planner might require to site an airport include:

Administrative: Land Ownership, Government Jurisdiction, Rights-of-Way, Mining Claims
Infrastructure: Transportation Network, Utility Corridors, Zoning Restrictions, Existing Land Use
Biotic: Endangered Species, Vegetation Cover
Abiotic: Surface Geology, Subsurface Geology, Surface Water, Subsurface Water, Flood Plains, Archaeological Sites, Elevation
Climatic: Temperature, Precipitation, Fog, Wind, Photoperiod

Geographic information systems are used in a wide variety of settings. Landscape architects have embraced the concepts behind GISs for many years, analyzing site suitability and developing capabilities of planning for a specified use (McHarg, 1969). Civil engineers and architects involved in developing large sites have comparable interests and techniques, including considerations of environmental impacts such as noise perception and obscuring or changing views. Forestry professionals use this technology for site mapping and management, and for pest and disease monitoring.
City planners are using geographic information systems to help automate tax assessment, emergency vehicle routing, and maintenance of transportation facilities and public lands. Environmental managers and scientists use these systems for such applications as maintaining an inventory of rare and endangered species and their habitats, and monitoring hazardous waste sites. In addition to these kinds of applications, military planners add several more: gauging the ability of heavy vehicles to traverse different kinds of terrain, and determining which sites on military bases are suitable for various kinds of training exercises. We discuss a few of these varied kinds of applications in more detail in Chapter 12.

1.5 Geographical Concepts

Before proceeding further, we will introduce a number of terms in common usage (based in part on the brief discussion in Van Roessel, 1987). We will return to some of these in more detail in Chapter 3.

Spatial objects are delimited geographic areas, with a number of different kinds of associated attributes or characteristics. The golf course discussed above is a spatial object: it is a specific area on the ground, with many distinct characteristics (such as land use, tax rate, types of vegetation, number of parking spaces, etc.). On the golf course are a number of other spatial objects, such as the greens and fairways.

A point is a spatial object with no area. The holes on our golf course represent points, even though they do in actuality cover a finite area. One of the key attributes of a point is its geodetic location, often represented as a pair of numbers (such as latitude-longitude, or northing-easting). There may be a range of data associated with a point, depending on the application. In our example, we may wish to record the number of the hole, as well as the date when a given hole on our golf course was placed on the green.
The latter is useful so that we may remember to move the hole periodically to minimize wear on the green.

A line is a spatial object, made up of a connected sequence of points. Lines have no width, and thus, a specified location must be on one side of the line or the other, but never on the line itself. One important line in our example might indicate the out-of-bounds line between holes. Attributes we could attach to that line include the numbers of the holes that the line separates, and whether the line is indicated on the course by markers of a certain color. Nodes are special kinds of points, usually indicating the junction between lines or the ends of line segments.

A polygon is a closed area. Simple polygons are undivided areas, while complex polygons are divided into areas of different characteristics. Since our example golf course hole has interior objects, such as the sand trap and the green, it is a complex polygon; since the sand trap is homogeneous (according to the available information in the figure), it is a simple polygon. Attached to the polygons on our golf course might be information about the length and area of each hole, and the kind and amount of seed and fertilizer used to maintain the fairways. Chains are special kinds of line segments, which correspond to a portion of the bounding edge of a polygon.

Figure 1.5 illustrates some of these different kinds of spatial objects, by focusing on one hole in our golf course. The boundary around the entire hole represents the boundary of a complex polygon. The location of the hole (or more specifically, its center) is a point. The 100-yard markers on either side of the fairway are certainly points, but since they form the ends of a line segment, we call them nodes. The portion of the out-of-bounds line that corresponds to the eastern edge of this hole would be considered a chain, since it corresponds to a portion of the polygon surrounding the entire hole.
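One way to see how these terms relate is to write them down as simple data structures. The sketch below is an illustration only, not the storage scheme of any particular GIS: the class names, field names, and coordinates are hypothetical, and each object carries its attributes in a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # e.g. (easting, northing)

@dataclass
class Point:
    """A spatial object with no area: a location plus attributes."""
    location: Coordinate
    attributes: Dict[str, object] = field(default_factory=dict)

@dataclass
class Line:
    """A connected sequence of points; its two end points are its nodes."""
    points: List[Coordinate]
    attributes: Dict[str, object] = field(default_factory=dict)

    def nodes(self) -> Tuple[Coordinate, Coordinate]:
        return (self.points[0], self.points[-1])

@dataclass
class Polygon:
    """A closed area; it is complex if it is divided into interior polygons."""
    boundary: List[Coordinate]
    interior: List["Polygon"] = field(default_factory=list)
    attributes: Dict[str, object] = field(default_factory=dict)

    def is_complex(self) -> bool:
        return bool(self.interior)

# Hypothetical objects for one hole of the golf course.
cup = Point((310.0, 95.0), {"hole_number": 7, "date_placed": "1989-06-01"})
marker_line = Line([(150.0, -20.0), (150.0, 60.0)], {"kind": "100-yard markers"})
sand_trap = Polygon([(280.0, 60.0), (295.0, 60.0), (295.0, 75.0), (280.0, 75.0)],
                    attributes={"cover": "sand"})
green = Polygon([(300.0, 80.0), (320.0, 80.0), (320.0, 110.0), (300.0, 110.0)],
                attributes={"cover": "bent grass"})
hole = Polygon([(0.0, -30.0), (330.0, -30.0), (330.0, 120.0), (0.0, 120.0)],
               interior=[sand_trap, green],
               attributes={"hole_number": 7})

print(hole.is_complex(), sand_trap.is_complex(), marker_line.nodes())
```

A chain could be represented in the same spirit as a `Line` whose points are a portion of some polygon's boundary; that link is left out here to keep the sketch short.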
We have already used the word scale in our discussions. By scale we mean the ratio of distances represented on a map or photograph to their true lengths on the Earth's surface. Scale values are normally written as dimensionless numbers, indicating that the measurements on the map and the Earth are in the same units. A scale of 1:25,000, pronounced one to twenty-five thousand, indicates that one unit of distance on a map corresponds to 25,000 of the same units on the ground. Thus, one centimeter on the map refers to 25,000 centimeters (or 250 meters) on the Earth. This is exactly the same as one inch on the map corresponding to 25,000 inches (or approximately 2,080 feet) on the Earth. Note that scale always refers to linear horizontal distances and not measurements of area or elevation.

The terms small scale and large scale are in common use. A simple example helps to illustrate the difference. Consider a field 100 meters on a side. On a map of 1:10,000 scale, the field is drawn 1 centimeter on a side. On a map of 1:1,000,000 scale, the field is drawn 0.1 millimeter on a side. The field appears larger on the 1:10,000 scale map; we call this a large-scale map. Conversely, the field appears smaller on the 1:1,000,000 scale map, and we call this a small-scale map. Said in another way, if we have a small area of the Earth's surface on a page, we have a large-scale map; if we have a large area of the Earth's surface on a page, we have a small-scale map.

An important concept when working with spatial data is that of resolution. Most dictionaries define resolution in terms such as "distinguishing the individual parts of an object." For our purposes, however, we need a more specific definition. Tobler (1987) defines spatial resolution for geographic data as the content of the geometric domain divided by the number of observations, normalized by the spatial dimension. The domain, for two-dimensional datasets like maps and photographs, is the area covered by the observations.
Thus, for two-dimensional data, take the square root of the ratio to normalize the value. For example, if the area of the United States is approximately 6 million square kilometers, and there are 50 states, then the mean resolution element of a map portraying the states would be:

$$\text{mean resolution element} = \sqrt{\frac{6{,}000{,}000\ \mathrm{km}^2}{50}} = \sqrt{120{,}000\ \mathrm{km}^2} \approx 346\ \mathrm{km}$$

This gives us a way of examining some spatial data, and calculating a representative value for the spatial resolution of the dataset. If we increase the number of observations, the mean resolution element decreases in size. Consider a map of the United States that indicates each of the 3141 counties:

$$\text{mean resolution element} = \sqrt{\frac{6{,}000{,}000\ \mathrm{km}^2}{3141}} \approx \sqrt{1910\ \mathrm{km}^2} \approx 43.7\ \mathrm{km}$$

When we have more information, the mean resolution element gets smaller; we often call this a higher resolution dataset. Conversely, a lower resolution dataset will have fewer observations in an area, and thus, a larger mean resolution element. As we discuss in section 6.1.1, the size of the resolution element (sometimes abbreviated resel) is related to the size of the objects we can distinguish in a dataset.

For interested readers, a good discussion of other important concepts, including geometrical operations and relationships, may be found in Nagy and Wagle (1979).

1.6 The Four Ms

Our understanding of this planet has always been limited by our lack of information, as well as our lack of wisdom and knowledge. For things too small to see, we have developed microscopes that can image down to the molecular level. At the other end of the continuum, for things that are (in a very real sense) too large to see, we have geostationary satellites that can take an image of an entire hemisphere. Geographic information systems are a means of integrating spatial data acquired at different scales and times, and in different formats.

Basically, urban planners, scientists, resource managers, and others who use geographic information work in several main areas.
They observe and measure environmental parameters. They develop maps which portray characteristics of the Earth. They monitor changes in our surroundings in space and time. In addition, they model alternatives of actions and processes operating in the environment. These, then, are the four Ms: measurement, mapping, monitoring, and modeling (Figure 1.4). These key activities can be enhanced through the use of information systems technologies, and in particular, through the use of a GIS.

Geographic information systems have the potential for improving our understanding of the world around us. Yet these systems do not lessen the need for quality data, nor will these systems do the work for us. The work we can do with a GIS is clearly dependent on the quality of data it contains. Thus, care must be taken to understand the potential sources and relative magnitudes of errors which may occur when gathering and processing spatial data. In addition, one must be cautious of the potential for misinterpretation of the information output from a GIS. In interacting with a geographic information system, the user must understand not only the application, but also the characteristics of the tool and the system itself. Like all advanced technologies, the kinds of spatial data processing systems we will discuss must be employed wisely, to keep us from fooling ourselves. The following chapters discuss the wise use of geographic information systems.
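As a closing worked example, the scale and resolution arithmetic of section 1.5 can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, the 6 million square kilometer area is the approximate figure used in the text, and the `dimension` parameter reflects the definition attributed to Tobler (1987), in which the ratio is normalized by the spatial dimension (a square root for two-dimensional data).

```python
def ground_distance(map_distance, scale_denominator):
    """Ground distance, in the same units as the map distance, for a 1:N scale."""
    return map_distance * scale_denominator

def mean_resolution_element(domain, n_observations, dimension=2):
    """Domain content divided by observation count, normalized by dimension."""
    return (domain / n_observations) ** (1.0 / dimension)

# Scale: one centimeter on a 1:25,000 map, expressed in meters on the ground.
one_cm_in_m = ground_distance(1.0, 25_000) / 100.0           # 250.0 m

# A field 100 m (10,000 cm) on a side, drawn at two different scales.
field_cm_large_scale = 10_000 / 10_000                        # 1.0 cm on the map
field_cm_small_scale = 10_000 / 1_000_000                     # 0.01 cm = 0.1 mm

# Resolution: the two examples from the text, in kilometers.
us_area_km2 = 6_000_000                                        # approximate, as in the text
states_resel_km = mean_resolution_element(us_area_km2, 50)     # about 346 km
counties_resel_km = mean_resolution_element(us_area_km2, 3141) # about 43.7 km

print(one_cm_in_m, field_cm_large_scale, field_cm_small_scale)
print(round(states_resel_km, 1), round(counties_resel_km, 1))
```

The counties example shows the general pattern: more observations over the same domain give a smaller mean resolution element, that is, a higher resolution dataset.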