Chemical & Biological Engineering Research: Datasets
Finding Datasets
Below is a non-exhaustive list of free and public sources for datasets.
- Data.gov, Home of the U.S. Government's Open Data
- IEEE DataPort (create a free account to access open datasets)
- Google Dataset Search
- GitHub Awesome Public Datasets
- Kaggle Datasets
- University of California, Irvine (UCI) Machine Learning Repository
- Web of Science: Select Data Citation Index from the drop-down database menu.
Data Citation Index (1900-present)
Discover research data sets and data studies from a wide range of international data repositories in the sciences, social sciences, and arts and humanities.
Discover research data connected to articles published in journals, books, and conference proceedings.
Link directly to data repositories for easy access to the deposited data sets.
Selected National Institutes of Health Datasets & Data Repositories
- PubChem
PubChem provides information on the biological activities and properties of over 92 million small molecules. It includes substance information, compound structures, and bioactivity data in three primary databases, Substance, Compound, and BioAssay, respectively. The Substance database contains more than 223 million records; the Compound database contains more than 92 million unique structures; and the BioAssay database contains more than 1.2 million bioassays. The databases can be searched by chemical name, Chemical Abstracts Service (CAS) Registry Number, keywords, and structure. PubChem is an initiative of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). For more information, see the PubChem FAQ page.
- Mouse Phenome Database
A collaborative standardized collection of measured data on laboratory mouse strains and populations. Its purpose is to characterize mouse strains and populations to facilitate translational discoveries and to assist in the selection of strains for experimental studies. Includes baseline phenotype data sets as well as studies of drug, diet, disease, and aging effects. Also includes protocols, projects and publications, and SNP, variation and gene expression studies.
- All of Us Data
The National Institutes of Health’s All of Us Research Program is building one of the largest biomedical data resources of its kind. The All of Us Research Hub stores health data from a diverse group of participants from across the United States. Registered users can use the Researcher Workbench to dive deeper into the data; conduct rapid, hypothesis-driven research; and build new methods for the future, using a variety of tools. The diverse data may facilitate new studies that could lead to new insights, treatments, and disease-prevention strategies tailored to individuals.
Note: A complete listing of NIH Data Sharing Repositories is available at: https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html.
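Example: The PubChem entry above notes that the databases can be searched by chemical name. Besides the web interface, PubChem offers a programmatic interface called PUG REST, and the sketch below shows one way a name-based lookup might be scripted with the Python standard library. The function names, the chosen properties, and the example compound "aspirin" are illustrative assumptions rather than part of this guide; check the PubChem documentation for current usage policies and available properties before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that looks up a compound by chemical name.

    Hypothetical helper for illustration; `properties` lists the
    compound properties to request.
    """
    encoded_name = quote(name, safe="")  # escape spaces and other special characters
    property_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{property_list}/JSON"


def fetch_compound_properties(name, properties=("MolecularFormula", "MolecularWeight"), timeout=30):
    """Request the properties from PubChem and return a list of records (dicts).

    Requires network access; raises an HTTP error if the name is not found.
    """
    with urlopen(compound_property_url(name, properties), timeout=timeout) as response:
        payload = json.load(response)
    return payload["PropertyTable"]["Properties"]


if __name__ == "__main__":
    # Example compound name chosen for illustration only.
    for record in fetch_compound_properties("aspirin"):
        print(record)
```

Building the URL in a separate function makes it easy to inspect the request before sending it, for example by pasting the result of compound_property_url("acetic acid") into a browser.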
Engineering Librarian
Erin Rowley
Contact:
119 Lockwood Library
University at Buffalo
Buffalo, NY 14260
epautler@buffalo.edu
716-645-1369