Research Guides: Biochemistry: Chemical & Genomic Data

Chemical Data Sources

NIST Chemistry WebBook
Search by chemical formula, name, CAS Registry Number (CAS RN), and other identifiers. Available data include IR spectra, gas chromatography data, and other resources.
Organic Syntheses
Online version of the Organic Syntheses series; searchable by keyword or structure.
PubChem
PubChem provides information on the biological activities of small molecules. It is organized as three linked databases: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool.
Sigma-Aldrich Fine Chemicals
Searchable database of chemicals; contains compound information and MSDS (material safety data sheets), as well as other basic information.
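PubChem's structure similarity search, mentioned above, scores compounds by comparing binary structural fingerprints, commonly with the Tanimoto coefficient. The sketch below is a toy illustration of that scoring idea only: the fingerprints are hypothetical sets of "on" bit positions rather than real PubChem fingerprints, and the compound names and 0.5 threshold are made up for the example.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient between two binary fingerprints, each given
    as the set of bit positions that are switched on:
    shared bits / (bits in A + bits in B - shared bits)."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set[int], library: dict[str, set[int]],
                       threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, for illustration only.
query = {1, 4, 9, 15, 22}
library = {
    "cand_a": {1, 4, 9, 15, 30},
    "cand_b": {1, 4, 9, 15, 22, 40},
    "cand_c": {2, 3},
}
for name, score in rank_by_similarity(query, library):
    print(f"{name}: {score:.3f}")
```

A set-based representation keeps the arithmetic readable; production tools typically store fingerprints as fixed-length bit vectors and count bits with fast bitwise operations.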

Genome Data Sources

There are multiple informatics databases from the National Center for Biotechnology Information (NCBI), accessible from the same NCBI website that hosts PubMed. Some resources of interest include:

BLAST (Basic Local Alignment Search Tool)
BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
Gene
Contains names, reference sequences, and linked data on the expression, structure, and function of genes from thousands of species, ranging from viruses to bacteria to eukaryotes.
Genome
Contains sequence and map data from the whole genomes of over 1,000 species or strains spanning bacteria, archaea, eukaryotes, viruses, phages, plasmids, and organelles. The collection includes both completely sequenced genomes and genomes whose sequencing is still in progress.
Nucleotide
Contains nucleotide sequence data from several sources, including GenBank, EMBL, RefSeq, and PDB.
Protein
Contains amino acid sequences from multiple sources: translations of coding regions described in nucleotide records from GenBank, EMBL, and DDBJ, as well as records from protein-specific resources including UniProt, the Protein Research Foundation (PRF), and the Protein Data Bank (PDB).
Structure
Contains experimental data from crystallographic and NMR structure determinations deposited in the Protein Data Bank (PDB).
Taxonomy
The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases, which currently represent about 10% of the described species of life on the planet. If you are looking for a sequence from a particular species, that species' Taxonomy record links to the relevant sequences in the Nucleotide, Protein, and Gene databases.
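BLAST's statistical significance is usually reported as an E-value: the number of alignments scoring at least S that would be expected by chance. Under the Karlin-Altschul statistics that BLAST builds on, E = K·m·n·e^(−λS), where m is the query length and n is the total database length; the equivalent bit score S′ = (λS − ln K)/ln 2 gives E = m·n·2^(−S′). The sketch below only evaluates these two formulas. The λ and K values are illustrative placeholders, and real BLAST additionally derives them from its scoring matrix and gap costs and corrects m and n for edge effects.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float, k: float) -> float:
    """Expected chance hits scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Same quantity expressed through the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder parameters and lengths, for illustration only.
LAM, K = 0.267, 0.041
m, n = 300, 100_000_000
for s in (40, 60, 80):
    print(f"S={s}: bits={bit_score(s, LAM, K):.1f}, E={e_value(s, m, n, LAM, K):.2e}")
```

Because E grows linearly with database size, the same alignment score becomes less significant when searched against a larger database.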
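Many Protein records, as described above, are translations of coding regions annotated in nucleotide records. The sketch below translates a coding sequence with the standard genetic code (codons enumerated in T, C, A, G order). It is a simplified illustration: NCBI records for some organisms and organelles use alternative genetic codes, and this function does not handle those, alternative start codons, or reading-frame offsets.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; one amino acid per codon in TTT, TTC, TTA, ... GGG order.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a DNA or RNA coding sequence read from its first base.

    Trailing bases that do not form a full codon are ignored. Codons
    containing ambiguous bases (e.g. N) become "X". If to_stop is True,
    translation halts at the first stop codon; otherwise stops appear as "*".
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
print(translate("AUGUUUNNN", to_stop=False))
```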
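The Taxonomy-to-sequence links described above can also be followed programmatically through NCBI's Entrez E-utilities: ESearch finds record IDs in one database, and ELink retrieves the IDs of linked records in another. The sketch below only builds the request URLs and makes no network call; it assumes the public E-utilities base URL, uses "nuccore" (the E-utilities name for the Nucleotide database), and uses taxonomy ID 562 (Escherichia coli) purely as an example. To send the requests you could pass the URLs to urllib.request.urlopen, keeping to NCBI's published usage guidelines on request rates.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that searches one Entrez database and returns matching IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """URL that lists records in `db` linked to record `uid` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS + "elink.fcgi?" + urlencode(params)


# Step 1: look up the species in Taxonomy.
print(esearch_url("taxonomy", "Escherichia coli"))
# Step 2: follow links from a Taxonomy ID to Nucleotide records.
print(elink_url("taxonomy", "nuccore", "562"))
```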