2.3 Researching Data

Now that we have a basic understanding of data and information, where can we find such data and information? Though an Internet search will certainly come up with myriad sources and types of data, the hunt for relevant and useful data is often a challenging and iterative process. Therefore, before hopping online and downloading the first thing that appears from a web search, it is useful to frame our search for data with the following questions and considerations:

What exactly is the purpose of the data?

Given that the world is swimming (maybe drowning) in vast amounts of data, articulating why we need, or do not need, a given set of data will streamline the search for useful and relevant data. To this end, the more specific we can be about the purpose of the needed data, the more efficient our search will be. For example, if we are interested in understanding and studying economic growth, it is useful to determine both temporal and geographic scales. In other words, what periods (e.g., 1850–1900) and intervals (e.g., quarterly, annually) are we interested in, and at what level of analysis (e.g., national, regional, state)? Frequently, data availability, or more specifically, the lack of relevant data, will force us to change the purpose or scope of our original question. A clear purpose will yield a more efficient search for data and enable us to quickly accept or discard the various data sets that we come across.

What data already exists and is available?

Before searching for new data, it is always a good idea to take an inventory of the data that we already have. Such data may come from previous projects or analyses, or from colleagues and classmates. The key point is that we can save a lot of time and effort by using data that we already possess. Furthermore, by identifying what we have, we get a better understanding of what we need. For instance, though we may already have census data (i.e., attribute data), we may need updated geographic data that contain the boundaries of US states or counties.

What are the costs associated with data acquisition?

Data acquisition costs go beyond financial costs. Just as important as the financial cost of data is the cost in time. After all, time is money. The time and energy you spend on finding, collecting, cleaning, and formatting data are time and energy taken away from data analysis. Depending on deadlines, time constraints, and deliverables, it is critical to learn how to manage your time when looking for data.

What format does the data need to be in?

Though many programs can read many formats of data, some formats can be read only by certain programs, and some programs require particular data formats. Understanding which data formats you can and cannot use will aid in your search for data. For instance, one of the most common formats of geographic information system (GIS) data is the shapefile. Not all GIS programs can read or use shapefiles, so it may be necessary to convert to or from a shapefile or some other format. The more data formats with which we are familiar, the better off we will be in our search for data, because we will understand not only what we can use but also what format conversions will need to be made.
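To make the idea of a format conversion concrete, the following is a minimal sketch, using only the Python standard library, that converts a small table of point locations from CSV into GeoJSON, a text-based format that many GIS programs can read. The sample rows, the column names (`name`, `lon`, `lat`), and the function name `csv_points_to_geojson` are assumptions made for illustration. They do not come from any particular data set, and the coordinates are approximate.

```python
import csv
import io
import json

# Hypothetical sample data: column names and values are illustrative only.
SAMPLE_CSV = """name,lon,lat
Salt Lake City,-111.89,40.76
Denver,-104.99,39.74
"""


def csv_points_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert CSV rows with longitude/latitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the coordinate columns; whatever remains becomes attribute data.
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates in (longitude, latitude) order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


geojson = csv_points_to_geojson(SAMPLE_CSV)
print(json.dumps(geojson, indent=2))
```

Converting to or from a shapefile generally requires a GIS program or a dedicated library, because a shapefile is a binary format spread across several companion files. The sketch above only illustrates the general pattern of reading one format and writing another.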

All these questions are of equal importance, and being able to answer them will assist in a more efficient and effective search for data. There are several other considerations behind the search for data, and in particular, GIS data, but those listed here provide an initial pathway to a successful search for data.

As information technology evolves, and as more and more data are collected and distributed, the various forms of data that can be used with GIS increase. GIS uses and integrates two types of data: geographic data and attribute data. Sometimes the source of both geographic and attribute data is the same. For instance, the United States Census Bureau distributes geographic boundary files (e.g., census tract level, county level, state level) as well as the associated attribute data (e.g., population, race/ethnicity, income). What is more, such data are available at no charge. In many respects, US census data are exceptional: they are free and comprehensive.
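The way geographic data and attribute data are integrated can be sketched in a few lines of Python. The example below assumes that both data sets share an identifier column (here called `geoid`, a hypothetical name) and joins attribute rows to boundary records on that key. The geometry strings and population figures are placeholders rather than real census values, and the function name `join_attributes` is likewise made up for this illustration.

```python
# Hypothetical geographic records: an identifier plus a placeholder geometry.
boundaries = [
    {"geoid": "49", "name": "Utah", "geometry": "<polygon placeholder>"},
    {"geoid": "08", "name": "Colorado", "geometry": "<polygon placeholder>"},
    {"geoid": "32", "name": "Nevada", "geometry": "<polygon placeholder>"},
]

# Hypothetical attribute table: the population values are made up.
attributes = [
    {"geoid": "49", "population": 1000},
    {"geoid": "08", "population": 2000},
]


def join_attributes(features, attribute_rows, key="geoid"):
    """Attach attribute rows to geographic features that share the same key."""
    lookup = {row[key]: row for row in attribute_rows}
    joined, unmatched = [], []
    for feature in features:
        attrs = lookup.get(feature[key])
        if attrs is None:
            # Keep track of boundaries that have no matching attribute data.
            unmatched.append(feature[key])
            continue
        joined.append({**feature, **attrs})
    return joined, unmatched


joined, unmatched = join_attributes(boundaries, attributes)
for record in joined:
    print(record["name"], record["population"])
print("No attribute data for:", unmatched)
```

Reporting the unmatched identifiers is a deliberate choice: a boundary without attributes often signals that the two data sets use different vintages or identifier formats, which is worth knowing before any analysis begins.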

Every search for data will vary according to the purpose. However, data from governments tend to have good coverage and provide a point of reference from which other data can be added, compared, and evaluated. Whether you need satellite imagery data from the National Aeronautics and Space Administration (NASA) or land use data from the United States Geological Survey (USGS), such government sources tend to be reliable, reputable, and consistent. Another key element of most government data is that they are freely accessible to the public. In other words, there is no charge to use or to acquire the data. Data that are free to use are generally called public data.

In contrast to public data, there are numerous sources of private or proprietary data. The main difference between public and private data is that the former tend to be free, while the latter must be acquired at a cost. Furthermore, there are often restrictions on the redistribution and dissemination of proprietary data sets (i.e., sharing the purchased data is not allowed). Depending on the subject matter, proprietary data may be the only option. Another reason for using proprietary data is that the data may already be formatted and cleaned according to your needs. The trade-off between financial cost and time saved is one that must be seriously considered and evaluated when working with deadlines.

The search for data, and in particular, the data that you need, is often the most time-consuming aspect of any GIS-related project. Therefore, it is essential to define and clarify your data requirements, from the temporal and geographic scales of the data to the formats required, as clearly and as early as possible. Such definition and clarity will pay dividends in your search for the right data, which in turn will yield better analyses and well-informed decisions.



Introduction to Geographic Information Systems by R. Adam Dastrup, MA, GISP is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
