Introduction to Finding Data

Be specific about your topic so you can narrow your search, but be flexible enough to tailor your needs to existing sources.

Identify the Unit of Analysis

Before you start searching, you should be able to define the following:

#1 - Who or What?

Social Unit: This is the population that you want to study.
It can be:

  • People
    For example: individuals, couples, households
  • Organizations and Institutions
    For example: companies, political parties, nation states
  • Commodities and Things
    For example: crops, automobiles, arrests

#2 - When?

Time: This is the period of time you want to study.
Things to think about:

  • Point in time
    A "snapshot" or one-time study
  • Time Series
    Study changes over time
  • Current information
    Keep in mind that there is usually a time lag before data is published. The most current information available may be a couple of years old.
  • Historical information

#3 - Where?

Space: Geography or place.
There are two main types of geographic classifications:

  • Political boundaries
    For example: nation, state, county, school district, etc.
  • Statistical/census geography
    For example: metropolitan statistical areas, tracts, block groups, etc.

Remember to define your topic with enough flexibility to adapt to available data!
Data is not available for every conceivable topic. Some data is restricted (behind a paywall, for example), some was never collected, and some is otherwise unavailable. Be prepared to try alternative data sources.

Search Strategies

Search Strategy #1: Search in a Data Repository

Look in a data repository that collects datasets in the general subject area you are searching.

Check out the other tabs in this guide for more examples of disciplinary repositories.


Search Strategy #2: Identify Potential Producers

Ask yourself: Who might collect and publish this type of data?

Then visit the organization’s website and see if you're right! You can also search for the organization as an author in the library catalog.

These are some of the main types of data producers:

Government Agencies

The government collects data to aid in policy decisions and is the largest producer of data overall. For example, the U.S. Census Bureau, Federal Election Commission, Federal Highway Administration, and many other agencies collect and publish data. To better understand the structure of government agencies, read the U.S. Government Manual and browse FedStats. U.S. government data is generally free and publicly available, but some of it may require access through library resources or special requests.

Non-Government Organizations

Many independent nonprofit and intergovernmental organizations collect and publish data that supports their missions. For example, the International Monetary Fund, United Nations, World Health Organization, and many others collect and publish data. Data from these organizations may be free or fee-based.

Academic Institutions

Academic research projects funded by public and private foundations create a wealth of data. For example, the Michigan State of the State Survey, Panel Study of Income Dynamics, American National Election Studies, and many other research projects collect and publish data. Much of this type of data is free and publicly available, but may require access through library resources. Access to data from smaller original research projects may depend on contacting the individual researchers.

Private Sector

Commercial firms collect and publish data as a paid service to clients or to sell broadly. Examples include marketing firms, pollsters, trade organizations, and business information providers. This information is almost always fee-based and may not be available for public release.

 


Search Strategy #3: Turn to the Literature

Search for research studies based on secondary analysis of publicly available data sets.

Unfortunately, citation of research data is often incomplete. Sometimes the best you will get is the title of the dataset used, but check whether the data or a related publication is cited, and follow up on it. Don't repeat this mistake when you publish: cite your data.

Data Archive Bibliographies

  • ICPSR Bibliography of Data-Related Literature
    "A continuously-updated database of thousands of citations of works using data held in the ICPSR archive. The works include journal articles, books, book chapters, government and agency reports, working papers, dissertations, conference papers, meeting presentations, unpublished manuscripts, magazine and newspaper articles, and audiovisual materials."

Library Databases

  • Databases A-Z
    Search the literature in your field. Try related disciplines as well.

Library Catalog

  • Use the library catalog as part of your literature review to find books on your topic that may cite relevant data providers, or to find books of statistical tables that identify sources of data. Try adding keywords such as “data” or “statistics” to your search.

Books on Research Methods


Search Strategy #4: Ask for Help

Knowing when to call in reinforcements is important.

Ask a Librarian

Access to Datasets

Depending on which search strategy you used, you may already have found a download link for the dataset files on a website, or you may have only a reference or citation to a dataset or producer. Here are some common ways to find the dataset files themselves.

  • Government agencies and universities will often post dataset files directly on their websites. 
  • Check to see if the dataset has been archived in ICPSR or another topical data repository. 
  • Contact the data producer directly.
  • Ask a Librarian for assistance.

Evaluate Data

Once you’ve chosen a data set that you believe will work, evaluate it carefully. Is it appropriate? Does it come from an authoritative source? Does it fit your needs? Does it cover your Where, When, and Who or What requirements? Are you willing to compromise your requirements or manipulate the data to fit your needs? Always read the documentation and codebook to make sure that the variables you plan to analyze actually measure what you want them to.