To understand how we get from analog to digital maps, let us begin with the building blocks and foundations of the geographic information system (GIS): data and information. A geographic information system stores, edits, processes, and presents data and information. But what exactly is data, and what exactly is information? For many, the terms “data” and “information” refer to the same thing. For our purposes, it is useful to make a distinction between the two. Generally, data refer to facts, measurements, characteristics, or traits of an object of interest. (For you grammar sticklers out there, note that “data” is the plural form of “datum.”) We can collect data about all kinds of things, like the length of rainbow trout in a Colorado stream, the number of vegetarians in Alaska, the diameter of mahogany tree trunks in the Brazilian rainforest, student scores on the last GIS midterm, the altitude of mountain peaks in Nepal, the depth of snow in the Austrian Alps, or the number of people who use public transportation to get to work in London.
Once data are put into context, used to answer questions, situated within analytical frameworks, or used to obtain insights, they become information. Information, then, refers to the knowledge of value obtained through the collection, interpretation, and analysis of data. Although a computer is not necessary to collect, record, manipulate, process, or visualize data, or to turn data into information, information technology can be of great help. For instance, computers can automate repetitive tasks, store data efficiently in terms of space and cost, and provide a range of tools for analyzing data, from spreadsheets to GIS. An enormous amount of data is collected daily by satellites, grocery store product scanners, traffic sensors, temperature gauges, smartphone apps, and much more. Collecting and storing data on this scale would not be possible without the aid and innovation of information technology.
Geographic or spatial data refer to geographic facts, measurements, or characteristics of an object that permit us to define its location on the surface of the earth. Such data include, but are not restricted to, the latitude and longitude coordinates of points of interest, street addresses, postal codes, political boundaries, and even the names of places of interest. It is also important to note the difference between geographic data and attribute data: geographic data define the location of an object of interest, whereas attribute data describe its nongeographic traits and characteristics.

“Spatial data is information about the locations and shapes of geographic features and the relationship between them, usually stored as coordinates and topology.” – Esri
To illustrate the distinction between geographic and attribute data, think about the home where you grew up or where you currently live. We can associate both geographic and attribute data with it. We can define the location of your home in many ways, such as with a street address, the names of the streets at the nearest intersection, the zip code or Census block in which your home is located, or latitude and longitude coordinates. What is essential is that geographic data permit us to define the location of an object (i.e., your home) on the surface of the earth.
In addition to the geographic data that define the location of your home, there are attribute data that describe its various qualities. Such data could include the number of bedrooms and bathrooms, whether or not your home has central air, the year it was built, the number of occupants, or whether or not there is a swimming pool. These attribute data tell us a lot about your home but relatively little about where it is.
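The home example can be expressed as a tiny data structure. The following is a minimal Python sketch, not a GIS data model: the `Home` class, its field names, and every value in it (including the coordinates and the address) are hypothetical, chosen only to show geographic data and attribute data stored side by side but kept distinct.

```python
from dataclasses import dataclass, field


@dataclass
class Home:
    """A hypothetical record that keeps geographic and attribute data apart."""

    # Geographic data: where the home is on the surface of the earth.
    latitude: float
    longitude: float
    street_address: str
    # Attribute data: nongeographic traits and characteristics of the home.
    attributes: dict = field(default_factory=dict)

    def location(self) -> tuple:
        """Return only the geographic part of the record."""
        return (self.latitude, self.longitude)


# All values below are made up for illustration.
home = Home(
    latitude=40.0,
    longitude=-105.0,
    street_address="123 Example St",
    attributes={
        "bedrooms": 3,
        "bathrooms": 2,
        "central_air": True,
        "year_built": 1995,
        "occupants": 4,
        "pool": False,
    },
)

print(home.location())              # (40.0, -105.0)
print(home.attributes["bedrooms"])  # 3
```

Asking the record for its location returns only coordinates; asking for its attributes returns descriptive traits that, on their own, say nothing about where the home is.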
Recognizing how geographic and attribute data differ and complement each other is not only useful; it is of central importance when learning about and using GIS. Because a GIS requires and integrates these two distinct types of data, being able to differentiate between them is the first step in organizing your GIS. Furthermore, being able to determine which kinds of data you need will ultimately aid in your implementation and use of a GIS. More often than not, in the age and context of information technology, the data and information discussed thus far are the stuff of computer files, which are the focus of the next section.
Files and Formats
When we collect data about your home, rainforests, or anything, really, we usually need to put them somewhere. Though we may scribble numbers and measures on the back of an envelope or write them down on a pad of paper, if we want to update, share, analyze, or map them in the future, it is often useful to record them in digital form so that a computer can read them. Though we will not concern ourselves with the bits and bytes of computing, we do need to discuss some fundamental elements of computing that are relevant to, and required for, learning and working with a GIS.
One of the most common elements of working with computers is the file. A computer file can contain anything from a complex set of instructions (e.g., a computer program) to a list of numbers and letters (e.g., an address book). Computer files also come in many different sizes and types. One of the clues we can use to distinguish one file type from another is the file extension: the letters that follow the last period (“.”) in the file name. The table below lists some of the most common file extensions and the types of files with which they are associated.
| File name | File type |
|---|---|
| filename.txt | Simple text file |
| filename.doc | Microsoft Word document |
| filename.pdf | Adobe Portable Document Format |
| filename.jpg | Compressed image file (JPEG) |
| filename.tif | Tagged Image File Format (TIFF) |
| filename.html | Hypertext Markup Language (used to create websites) |
| filename.xml | Extensible Markup Language |
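Reading an extension is simple enough to automate. The sketch below, assuming Python’s standard `os.path.splitext` function, looks up a file type from the extension; the `FILE_TYPES` dictionary and the `describe_file` function are hypothetical names, and the mapping covers only the extensions listed in the table above.

```python
import os

# Based on the table above; keys are lowercase extensions including the period.
FILE_TYPES = {
    ".txt": "Simple text file",
    ".doc": "Microsoft Word document",
    ".pdf": "Adobe Portable Document Format",
    ".jpg": "Compressed image file",
    ".tif": "Tagged Image File Format",
    ".html": "Hypertext Markup Language",
    ".xml": "Extensible Markup Language",
}


def describe_file(filename: str) -> str:
    """Guess a file's type from the letters after its last period."""
    _, ext = os.path.splitext(filename)  # "homes.txt" -> ("homes", ".txt")
    return FILE_TYPES.get(ext.lower(), "Unknown file type")


for name in ["homes.txt", "Map.PDF", "trout_lengths.csv", "README"]:
    print(name, "->", describe_file(name))
```

Keep in mind that an extension is only a clue: nothing forces a file named `homes.txt` to actually contain plain text, which is why it helps to know what a file should look like when you open it.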
Some computer programs can read or work with only specific file types, while others can read many file formats. As you begin to work more with information technology, and GIS in particular, you will find that familiarity with different file types is essential. Learning how to convert or export one file type to another is also a valuable skill. Being able to recognize familiar file types, and knowing how to identify unfamiliar ones, will undoubtedly increase your proficiency with computers and GISs.
Of the numerous file types that exist, one of the most common and widely accessible is the simple text, plain text, or just text file. Simple text files can be read by word processing programs, spreadsheet and database programs, and web browsers. Often ending with the extension “.txt” (i.e., filename.txt), text files contain no special formatting (e.g., bold, italic, underlining); they contain only characters such as letters, numbers, punctuation, and spaces. In other words, text files are not suited to storing images or sophisticated graphics. They are, however, ideal for recording, sharing, and exchanging data because virtually all computers and operating systems can read simple text files with programs called text editors.
When a text file contains data that are organized or structured in some fashion, it is sometimes called a flat file (the file extension often remains the same, i.e., .txt). Generally, flat files are organized in a tabular format, line by line: each line or row of the file contains one and only one record. So if we collected height measurements on three people, Tim, Jake, and Harry, the file might look something like this:
Each row corresponds to one and only one record, observation, or case. There are two other essential elements to know about this file. First, note that the first row does not contain any data; instead, it describes the data contained in each column. When the first row of a file contains such descriptors, it is referred to as a header row or just a header. Columns in a flat file are also called fields, variables, or attributes. “Height” is the attribute, field, or variable that we are interested in, and the observations or cases in our data set are “Tim,” “Jake,” and “Harry.” In short, rows are for records; columns are for fields.
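The header-and-records structure can be sketched in Python with the standard library’s `csv` module. This is a minimal illustration, not the only way to read a flat file: the file contents are held in a string rather than on disk, and the names and heights are borrowed from the delimiter examples later in this section.

```python
import csv
import io

# A small tab-delimited flat file held in memory; a real file would be
# opened with open(). Each "\t" is a tab and each "\n" ends a row.
FLAT_FILE = "Name\tHeight\nTim\t6’1″\nSarah\t5’7″\nMaria\t5’5″\n"

reader = csv.reader(io.StringIO(FLAT_FILE), delimiter="\t")
header = next(reader)   # first row: field names, not data
records = list(reader)  # every remaining row: one record

print("Fields:", header)                   # Fields: ['Name', 'Height']
print("Number of records:", len(records))  # Number of records: 3
for name, height in records:               # one value per field in each record
    print(f"{name} is {height} tall")
```

Separating the header from the records first means every later row can be treated the same way: one record, with one value per field named in the header.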
The second element, unseen but critical, is the whitespace between each column or field. In the example, it appears as though a space separates the “name” column from the “height” column. Upon closer inspection, however, note how the initial values of the “height” column are aligned. If a single space were being used to separate the columns, the height values would not line up, because the names differ in length. In this case, a tab is being used to separate the columns of each row. The character used to separate columns within a flat file is called the delimiter or separator. Though any character can be used as a delimiter, the most common delimiters are the tab, the comma, and a single space. The following are examples of each.
| Tab-delimited | Space-delimited | Comma-delimited |
|---|---|---|
| Name	Height | Name Height | Name, Height |
| Tim	6’1″ | Tim 6’1″ | Tim, 6’1″ |
| Sarah	5’7″ | Sarah 5’7″ | Sarah, 5’7″ |
| Maria	5’5″ | Maria 5’5″ | Maria, 5’5″ |
Knowing the delimiter of a flat file is essential because it enables us to separate the columns efficiently and without error. Such files are sometimes referred to by their delimiters, as in a “comma-separated values” (CSV) file or a “tab-delimited” file.
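The importance of knowing the delimiter shows up as soon as a program tries to split the columns. In this minimal Python sketch, built on the standard `csv` module, the three example files above are parsed with their correct delimiters and then one of them with the wrong delimiter. The `SAMPLES` dictionary and the `parse` helper are hypothetical names used only for illustration.

```python
import csv
import io

# The delimiter examples above, one string per delimiter.
SAMPLES = {
    "\t": "Name\tHeight\nTim\t6’1″\nSarah\t5’7″\nMaria\t5’5″\n",
    " ": "Name Height\nTim 6’1″\nSarah 5’7″\nMaria 5’5″\n",
    ",": "Name, Height\nTim, 6’1″\nSarah, 5’7″\nMaria, 5’5″\n",
}


def parse(text: str, delimiter: str) -> list:
    """Split a flat file into rows of fields, trimming spaces around each field."""
    return [
        [value.strip() for value in row]
        for row in csv.reader(io.StringIO(text), delimiter=delimiter)
    ]


# With the correct delimiter, every file yields the same two columns.
for delim, text in SAMPLES.items():
    print(repr(delim), parse(text, delim))

# With the wrong delimiter, nothing is split: each row stays a single field.
print(parse(SAMPLES["\t"], ","))
```

The trimming step handles the space after each comma in the comma-delimited example; without it, every height in that file would start with a stray space.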
When recording and working with geographic data, the same general format applies. Rows are reserved for records, which in the case of geographic data are locations, and columns or fields are used for the attributes or variables associated with each location. For example, the following tab-delimited flat file contains data for three places (i.e., countries) and three attributes or characteristics of each country (i.e., population, language, continent), as noted by the header.
Files like those presented here are the building blocks of the various tables, charts, reports, graphs, and other visualizations that we see every day online, in print, and on television. They are also vital components of the maps and geographic representations created by GISs. Rarely, if ever, however, will you work with one and only one file or file type. More often than not, and especially when working with GIS, you will work with multiple files. An organized collection of related files or data is called a database. Since the files within a database may differ in size, structure, and even format, we need some type of system that allows us to work with, update, edit, integrate, share, and display the various data within the database. Such a system is generally referred to as a database management system (DBMS). Databases and DBMSs are so crucial to GISs that a later chapter is dedicated to them. A geodatabase, in turn, is a collection of geographic data held within a common file system or database.
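To give a feel for how a DBMS integrates separate sets of data, here is a small sketch using SQLite through Python’s built-in `sqlite3` module. SQLite is just one example of a DBMS, chosen because it needs no setup. The table names, columns, and values are hypothetical, and a real GIS database would add spatial data types and indexes that this sketch omits.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS.
# Table names, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Geographic data: where each home is.
cur.execute(
    "CREATE TABLE locations (home_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL)"
)
# Attribute data: what each home is like.
cur.execute(
    "CREATE TABLE attributes (home_id INTEGER PRIMARY KEY, bedrooms INTEGER, year_built INTEGER)"
)

cur.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)", [(1, 40.0, -105.0), (2, 40.1, -105.2)]
)
cur.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)", [(1, 3, 1995), (2, 2, 1962)]
)
conn.commit()

# The DBMS brings the two tables together through their shared key.
rows = cur.execute(
    """SELECT l.home_id, l.latitude, l.longitude, a.bedrooms, a.year_built
       FROM locations AS l
       JOIN attributes AS a ON l.home_id = a.home_id
       ORDER BY l.home_id"""
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Keeping locations and attributes in separate tables linked by a shared key (`home_id`) mirrors the geographic/attribute distinction made earlier in this section; the join is the step where the DBMS integrates them into a single view.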