Introduction to Finding Data
Be specific about your topic so you can narrow your search, but be flexible enough to tailor your needs to existing sources.
Identify the Unit of Analysis
This is what you should be able to define:
#1 - Who or What?
Social Unit: This is the population that you want to study.
It can be:
- People
For example: individuals, couples, households
- Organizations and Institutions
For example: companies, political parties, nation states
- Commodities and Things
For example: crops, automobiles, arrests
#2 - When?
Time: This is the period of time you want to study.
Things to think about:
- Point in time
A "snapshot" or one-time study
- Time Series
Study changes over time
- Current information
Keep in mind that there is usually a time lag before data is published. The most current information available may be a couple of years old.
- Historical information
#3 - Where?
Space: Geography or place.
There are two main types of geographic classifications:
- Political boundaries
For example: nation, state, county, school district, etc.
- Statistical/census geography
For example: metropolitan statistical areas, tracts, block groups, etc.
Remember to define your topic with enough flexibility to adapt to available data!
Data is not available for every conceivable topic. Some data is hidden (behind a paywall, for example), some was never collected, and some is otherwise unavailable. Be prepared to try alternative data.
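The Who or What, When, and Where questions above can be written down as a small structured record before you start searching. The following is a minimal sketch, not part of any library tool: the class name `DataNeed` and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNeed:
    """Hypothetical record of a unit of analysis (names are illustrative)."""
    social_unit: str   # Who or What, e.g. "households"
    start_year: int    # When: first year of interest
    end_year: int      # When: last year of interest
    geography: str     # Where, e.g. "county" or "tract"

    @property
    def is_time_series(self) -> bool:
        # More than one year means changes over time, not a snapshot.
        return self.end_year > self.start_year

    def describe(self) -> str:
        if self.is_time_series:
            period = f"{self.start_year}-{self.end_year}"
        else:
            period = str(self.start_year)
        return f"{self.social_unit} by {self.geography}, {period}"


need = DataNeed("households", 2010, 2020, "county")
print(need.describe())
```

Writing the need down this way makes it easier to see which part you could relax (a coarser geography, a shorter period) if no existing source matches exactly.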
Search Strategy #1: Search in a Data Repository
Look within a data repository that collects datasets within the general subject area that you are searching for.
- Inter-University Consortium for Political and Social Research (ICPSR)
The world's largest social science data archive. It is one of the best places to start looking for a data set related to the social sciences.
- Data Repositories (Open Access Directory)
A list of open data repositories from multiple academic disciplines.
- re3data.org: Registry of Research Data Repositories
re3data.org is a global registry of research data repositories from different academic disciplines. Search by discipline or broad topic to find repositories relevant to your data need.
Check out the other tabs in this guide for more disciplinary repository examples.
Search Strategy #2: Identify Potential Producers
Ask yourself: Who might collect and publish this type of data?
Then visit the organization’s website and see if you're right! Or, search for them as an author in the library catalog.
These are some of the main types of data producers:
The government collects data to aid in policy decisions and is the largest producer of data overall. For example, the U.S. Census Bureau, Federal Election Commission, Federal Highway Administration and many other agencies collect and publish data. To better understand the structure of government agencies, read the U.S. Government Manual and browse FedStats. Most United States government data is free and publicly available, though some of it may require access through library resources or a special request.
Many independent non-commercial and nonprofit organizations collect and publish data that supports their social platform. For example, the International Monetary Fund, United Nations, World Health Organization, and many others collect and publish data. Data from NGOs may be free or fee-based.
Academic research projects funded by public and private foundations create a wealth of data. For example, the Michigan State of the State Survey, Panel Study of Income Dynamics, American National Election Studies, and many other research projects collect and publish data. Much of this type of data is free and publicly available, but may require access through library resources. Access to smaller original research projects may be dependent upon contacting individual researchers.
Commercial firms collect and publish data as a paid service to clients or to sell broadly. Examples include marketing firms, pollsters, trade organizations, and business information providers. This information is almost always fee-based and may not be available for public release.
Search Strategy #3: Turn to the literature
Search for research studies based on secondary analysis of publicly available data sets.
Unfortunately, citation of research data is often incomplete. Sometimes the best you will get is the title of the data set used, but check whether the data or a related publication is cited and follow it up. Don't make the same mistake when you publish: cite your data.
Data Archive Bibliographies
- ICPSR Bibliography of Data-Related Literature
"A continuously-updated database of thousands of citations of works using data held in the ICPSR archive. The works include journal articles, books, book chapters, government and agency reports, working papers, dissertations, conference papers, meeting presentations, unpublished manuscripts, magazine and newspaper articles, and audiovisual materials."
- Databases A-Z
Search the literature from your field. Try related disciplines as well.
- Use the library catalog as part of your literature review to find books on your topic that may cite relevant data providers or for books of statistical tables to identify sources of data. Try adding keywords such as “data” or “statistics” to your search.
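The keyword tip above can be sketched as a small helper that generates the search strings to try. This is only an illustration: the function name `keyword_variants` is hypothetical, and it produces plain text queries rather than calling any catalog's search interface.

```python
def keyword_variants(topic, extras=("data", "statistics")):
    """Return the topic combined with each extra keyword, one query per keyword."""
    return [f"{topic} {extra}" for extra in extras]


queries = keyword_variants("household income")
for query in queries:
    print(query)
```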
Books on Research Methods
- There are a number of books written about conducting secondary analyses of published data sets. These books will often list relevant studies. General books on research methods within your discipline may also include a chapter that talks about secondary analysis and data sources. Here are a few selected titles:
- Secondary data analysis (Pocket guides to social work research methods) / Thomas P. Vartanian
- Secondary data analysis: An introduction for psychologists / edited by Kali H. Trzesniewski, M. Brent Donnellan, and Richard E. Lucas
- Secondary data sources for public health: A practical guide / Sarah Boslaugh
- Using secondary data in educational and social research / Emma Smith
- Reworking qualitative data / Janet Heaton
Search Strategy #4: Ask for help
Knowing when to call in reinforcements is important.
Access to Datasets
Depending on which search strategy you used, you may have already found the dataset file download link directly on a website. Or, you may have just a reference/citation to a dataset or producer. Here are some common ways to find the dataset files themselves.
Once you’ve chosen a data set that you believe will work, evaluate it carefully. Is it appropriate? Does it come from an authoritative source? Does it fit your needs? Does it cover your Where, When, and Who or What requirements? Are you willing to compromise your requirements or manipulate the data to fit your needs? Always read the documentation and codebook to ensure that the analysis you are planning really measures what you want it to.
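The coverage questions above (Where, When, and Who or What) can be turned into a simple checklist. The sketch below assumes you have already read the coverage details from the codebook and typed them in by hand. The `Coverage` class and `coverage_gaps` function are hypothetical names, and the check does not replace reading the documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    """Hypothetical summary of what a candidate data set covers."""
    units: frozenset        # social units available, e.g. {"households"}
    geographies: frozenset  # geographic levels available, e.g. {"state"}
    first_year: int
    last_year: int


def coverage_gaps(unit, geography, start_year, end_year, cov):
    """List the requirements that the candidate data set does not meet."""
    gaps = []
    if unit not in cov.units:
        gaps.append(f"unit of analysis '{unit}' not covered")
    if geography not in cov.geographies:
        gaps.append(f"geography '{geography}' not covered")
    if start_year < cov.first_year or end_year > cov.last_year:
        gaps.append(
            f"years {start_year}-{end_year} fall outside "
            f"{cov.first_year}-{cov.last_year}"
        )
    return gaps


candidate = Coverage(
    units=frozenset({"households", "individuals"}),
    geographies=frozenset({"state", "nation"}),
    first_year=2005,
    last_year=2019,
)
gaps = coverage_gaps("households", "county", 2010, 2020, candidate)
for gap in gaps:
    print(gap)
```

An empty list means the basic Who or What, When, and Where requirements are met. Any reported gap is a prompt to decide whether you can compromise on that requirement or need a different source.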