Chemical & Biological Engineering Research: Datasets

Last Updated: Jun 16, 2022 8:58 AM

Selected National Institutes of Health Datasets & Data Repositories

  • PubChem
    PubChem provides information on the biological activities and properties of more than 92 million small molecules. It organizes substance information, compound structures, and bioactivity data into three primary databases: Substance, Compound, and BioAssay, respectively. The Substance database contains more than 223 million records, the Compound database contains more than 92 million unique structures, and the BioAssay database contains more than 1.2 million bioassays. The databases can be searched by chemical name, Chemical Abstracts Service (CAS) Registry Number, keyword, or structure. PubChem is an initiative of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). For more information, see the PubChem FAQ page.
  • Mouse Phenome Database
    A collaborative, standardized collection of measured data on laboratory mouse strains and populations. Its purpose is to characterize mouse strains and populations in order to facilitate translational discoveries and to assist in the selection of strains for experimental studies. Includes baseline phenotype data sets as well as studies of drug, diet, disease, and aging effects. Also includes protocols, projects, and publications, along with SNP, variation, and gene expression studies.
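
For researchers who prefer scripted access, the PubChem name search described above can also be run programmatically. The following is a minimal sketch, assuming PubChem's public PUG REST web service and its compound-by-name property lookup; the helper names `compound_property_url` and `fetch_compound_properties`, and the choice of properties, are illustrative and not part of this guide or of PubChem itself.

```python
import json
import urllib.parse
import urllib.request

# Base address of PubChem's PUG REST service (assumed here; confirm against
# the current PubChem documentation before relying on it).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that looks up a compound by chemical name.

    Hypothetical helper: the name is percent-encoded so that spaces and
    punctuation in chemical names are safe to place in the URL path.
    """
    quoted = urllib.parse.quote(name, safe="")
    prop_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{quoted}/property/{prop_list}/JSON"


def fetch_compound_properties(name):
    """Request the selected properties for a compound and return parsed JSON.

    Requires network access; raises urllib.error.HTTPError if PubChem
    does not recognize the name.
    """
    with urllib.request.urlopen(compound_property_url(name), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_compound_properties("aspirin")` should return a parsed JSON document with the requested properties for the matching compound record. Only the URL construction is done offline, so it can be checked without contacting PubChem.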

Note: A complete listing of NIH Data Sharing Repositories is available at:

Engineering Librarian

Erin Rowley
119 Lockwood Library
University at Buffalo
Buffalo, NY 14260