List of biological databases
Biological databases are stores of biological information.[1] The journal Nucleic Acids Research regularly publishes special issues on biological databases and has a list of such databases. For instance, the 2016 issue has a list of about 180 such databases and updates to previously described databases.[2]
Meta Databases
These databases of databases collect data from different sources and make them available in a new and more convenient form, or with an emphasis on a particular disease or organism.
- BioGraph (University of Antwerp, Vlaams Instituut voor Biotechnologie) A knowledge discovery service based on the integration of more than 20 heterogeneous databases
- Bioinformatic Harvester (Karlsruhe Institute of Technology) - Integrating 26 major protein/gene resources.
- Neuroscience Information Framework (University of California, San Diego) - Integrates hundreds of neuroscience-relevant resources, many of which are listed below.
- ConsensusPathDB - A molecular functional interaction database, integrating information from 12 other databases.
- Entrez (National Center for Biotechnology Information)
- Enzyme Portal Integrates enzyme information such as small-molecule chemistry, biochemical pathways and drug compounds. (European Bioinformatics Institute)
- euGenes (Indiana University)
- GeneCards (Weizmann Inst.)
- Genome Aggregation Database (gnomAD) (Broad Institute)
- MetaBase (KOBIC) - A user contributed database of biological databases.
- mGen - Contains four of the world's largest databases (GenBank, RefSeq, EMBL and DDBJ) and offers simple, program-friendly gene extraction.
- MOPED (Seattle Children's Research Institute) - A multi-omics expression profiling database providing integrated proteomics and transcriptomics data from human, mouse, worm, and yeast.
- PathogenPortal A repository linking to the Bioinformatics Resource Centers (BRCs) sponsored by the National Institute of Allergy and Infectious Diseases (NIAID)
- SOURCE (Stanford University) encapsulates the genetics and molecular biology of genes from the genomes of Homo sapiens, Mus musculus, and Rattus norvegicus into easy-to-navigate GeneReports
- iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID.
- Pathway Commons (Memorial Sloan-Kettering Cancer Center and University of Toronto)
- Nowomics Tracks changes in several biological databases; users 'follow' genes and keywords to see a news feed of new data and papers.
- BioGPS (The Scripps Research Institute) An extensible Gene Portal System. Plugin library extends BioGPS beyond the Gene Expression Visualizer and the links to Gene Wiki to a huge number of other databases and services
- The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups to build a comprehensive parts list of functional elements in the human genome. The corresponding data is available for download and analysis from UCSC Genome Browser.
- Human Epigenome Atlas, a collection of normal epigenomes of different tissues produced by Roadmap Epigenomics Project. Data types include histone modifications, DNA methylation, chromatin accessibility, gene expression, and small RNA expression.
- Metascape provides click-to-extract access to gene-centric function annotations compiled from dozens of databases including NCBI (Entrez, OMIM, ClinVar), GO, KEGG, MSigDB, UniProt, Protein Atlas, Ensembl, JAX, DrugBank, NHGRI-EBI, DDG2P.
- International Aging Research Portfolio (IARP) - a curated database of biomedical grants from many sources including NIH, NSF, CORDIS, Australian Research Council linked to publications.
Nucleic Acid Databases
DNA Databases
Primary Databases
The International Nucleotide Sequence Database (INSD) consists of the following databases.
- DNA Data Bank of Japan (National Institute of Genetics)
- EMBL (European Bioinformatics Institute)
- GenBank (National Center for Biotechnology Information)
The three databases, DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe), are repositories for nucleotide sequence data from all organisms. All three databases accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
Secondary Databases
- SNP / Disease Databases
- OMIM (Online Mendelian Inheritance in Man) Inherited Diseases
- HapMap
Gene Expression Databases (mostly Microarray data)
- ArrayExpress (European Bioinformatics Institute)
- Gene Expression Omnibus (GEO, National Center for Biotechnology Information)
- GPX (Scottish Centre for Genomic Technology and Informatics)
- maxd (Univ. of Manchester)
- Stanford Microarray Database (SMD) (Stanford University)
- Genevestigator - Expression Search Engine (Nebion AG)
- Bgee A database to retrieve and compare gene expression patterns between species. It contains only wild-type and manually curated microarray, RNA-Seq and in situ experiments.
- BioGPS (The Scripps Research Institute) A Gene Portal System with a Gene Expression Visualizer
- The European Genome-phenome Archive (EGA)
Genome Databases
These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species, or the genome of a single model organism.
- Bioinformatic Harvester
- Gene Disease Database
- SNPedia
- CAMERA Resource for microbial genomics and metagenomics
- Corn, the Maize Genetics and Genomics Database
- EcoCyc a database that describes the genome and the biochemical machinery of the model organism E. coli K-12
- Ensembl provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes.
- Ensembl Genomes provides genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces (using the Ensembl software platform).
- Exome Aggregation Consortium (ExAC) - exome sequencing data from a wide variety of large-scale sequencing projects. (Broad Institute)
- PATRIC, the PathoSystems Resource Integration Center
- FlyBase, genome of the model organism Drosophila melanogaster
- MGI Mouse Genome (Jackson Lab.)
- JGI Genomes of the DOE-Joint Genome Institute provides databases of many eukaryote and microbial genomes.
- National Microbial Pathogen Data Resource. A manually curated database of annotated genome data for the pathogens Campylobacter, Chlamydia, Chlamydophila, Haemophilus, Listeria, Mycoplasma, Neisseria, Staphylococcus, Streptococcus, Treponema, Ureaplasma, and Vibrio.
- RegulonDB A model of the complex regulation of transcription initiation, or regulatory network, of the E. coli K-12 cell.
- Repbase The most commonly used database for repetitive elements (transposons).
- Saccharomyces Genome Database, genome of the yeast model organism.
- Viral Bioinformatics Resource Center Curated database containing annotated genome data for eleven virus families.
- The SEED platform for microbial genome analysis includes all complete microbial genomes, and most partial genomes. The platform is used to annotate microbial genomes using subsystems.
- Xenbase, genomes of the model organisms Xenopus tropicalis and Xenopus laevis
- WormBase, genome of the model organism Caenorhabditis elegans, and WormBase ParaSite for parasitic species
- Zebrafish Information Network, genome of this fish model organism.
- TAIR, The Arabidopsis Information Resource.
- UCSC Malaria Genome Browser, genome of malaria causing species (Plasmodium falciparum and others)
- RGD Rat Genome Database: Genomic and phenotype data for Rattus norvegicus
- INTEGRALL: Database dedicated to integrons, bacterial genetic elements involved in antibiotic resistance
- Fourmidable ant genome database provides ant genome blast search and sequence download.
- VectorBase The NIAID Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens
- EzGenome, comprehensive information about manually curated genome projects of prokaryotes (archaea and bacteria)[3]
- Banana Genome Hub, The Banana Genome database.
- GeneDB for Apicomplexan Protozoa, Kinetoplastid Protozoa, Parasitic Helminths, Parasite Vectors + several bacteria and viruses
- EuPathDB Eukaryotic pathogen database resources, including amoeba, fungi, Plasmodium, trypanosomatids, etc.
- SNiPhunter SNP search engine: search for SNPs in PubMed open access literature using SNP IDs.
- The 1000 Genomes Project was launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
- Personal Genome Project: human genomes
- Legume Information System (LIS): genomic database for the legume family.[4]
- PeanutBase: genetic and genomic data to enable more rapid crop improvement in peanut.[5]
- Legume Federation: A consortium of scientists working to support robust agriculture for a substantially legume-fed world.[4]
Phenotype Databases
- PhenCode linking human mutations with phenotype
- PhenomicDB multi-organism database linking genotype to phenotype
- PHI-base Pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer reviewed literature.
- RGD Rat Genome Database: Genomic and phenotype data for Rattus norvegicus
- Planform: planarian formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published planarian regeneration experiments.
- Limbform: limb formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published multi-organism limb regeneration experiments.
- Ontology of microbial phenotypes
RNA Databases
- C-It-Loci – A database of RNA expression and conserved loci for studying lncRNAs across species.
- LncRNAWiki, a wiki-based database for community curation of known human long non-coding RNAs
- Rfam, a database of RNA families
- miRBase, the microRNA database
- snoRNAdb, a database of snoRNAs
- lncRNAdb, a database of lncRNAs
- DASHR The DAtabase of Small Human non-coding RNAs: integrated annotation and sequencing-based expression data for all major classes of human small non-coding RNAs (sncRNAs) for both full sncRNA transcripts and mature sncRNA products derived from these larger RNAs.
- MONOCLdb The MOuse NOnCode Lung database: Annotations and expression profiles of mouse long non-coding RNAs (lncRNAs) involved in Influenza and SARS-CoV infections.
- piRNAbank, a database of piRNAs
- GtRNAdb, a database of genomic tRNAs
- MINTbase, a framework for the interactive exploration of mitochondrial and nuclear tRNA fragments
- SILVA, a database of ribosomal RNAs
- RDP, the Ribosomal Database Project
- tmRDB, a database of tmRNAs
- SRPDB, a database of signal recognition particle RNAs
- yeast snoRNA database
- Sno/scaRNAbase, a database of snoRNA and scaRNAs
- snoRNA-LBME-db, a snoRNA database
Amino Acid / Protein Databases
Protein Sequence Databases
- UniProt Universal Protein Resource (EBI, Swiss Institute of Bioinformatics, PIR)
- Protein Information Resource (Georgetown University Medical Center (GUMC))
- Swiss-Prot Protein Knowledgebase (Swiss Institute of Bioinformatics)
- PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
- PROSITE Database of Protein Families and Domains
- Database of Interacting Proteins (Univ. of California)
- Pfam Protein families database of alignments and HMMs (Sanger Institute)
- PRINTS a compendium of protein fingerprints (Manchester University)
- ProDom Comprehensive set of Protein Domain Families (INRA/CNRS)
- SignalP 3.0 Server for signal peptide prediction (including cleavage site prediction), based on artificial neural networks and HMMs
- SUPERFAMILY Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
- neXtProt - a human protein centric knowledge resource
- Annotation Clearing House a project from the National Microbial Pathogen Data Resource
- InterPro Classifies proteins into families and predicts the presence of domains and sites.
- ProteomeScout - Includes graphical exports of protein annotations including domains, secondary structure, and post-translational modifications
Protein Structure Databases
Primary databases
- Protein Data Bank (PDB) comprising:
- Protein Data Bank in Europe (PDBe)
- Protein Data Bank Japan (PDBj)
- Research Collaboratory for Structural Bioinformatics (RCSB)
Secondary databases
- SCOP Structural Classification of Proteins
- CATH Protein Structure Classification
- PDBsum
For more protein structure databases, see also Protein structure database
Protein Model Databases
- SWISS-MODEL Server and Repository for Protein Structure Models
- ModBase Database of Comparative Protein Structure Models (Sali Lab, UCSF)
- Protein Model Portal (PMP) Meta database that combines several databases of protein structure models (Biozentrum, Basel, Switzerland)
- Similarity Matrix of Proteins (SIMAP) is a database of protein similarities computed using FASTA.
Protein-Protein and Other Molecular Interactions
- BIND Biomolecular Interaction Network Database
- BioGRID A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
- CCSB Interactome
- DIP Database of Interacting Proteins
- IntAct molecular interaction database: a central, standards-compliant repository of molecular interactions, including protein–protein, protein–small molecule and protein–nucleic acid interactions.
- NetPro
- STRING: A database of known and predicted protein-protein interactions. (EMBL)
- The Cell Collective
- MINT: Molecular INTeraction database
- iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID.
- RNA-binding protein database
- BioLiP: Protein-ligand binding database
- IID - Integrated Interactions Database
Proteomics Databases
- Proteomics Identifications Database (PRIDE) A public repository for proteomics data, containing protein and peptide identifications and their associated supporting evidence as well as details of post-translational modifications. (European Bioinformatics Institute)
- ProteomeScout - A public repository of processed proteomics datasets concerning post-translational modifications, including quantification across conditions (if applicable). Also includes graphical exports of protein annotations.
- MitoMiner - A mitochondrial proteomics database integrating large-scale experimental datasets from mass spectrometry and GFP studies for 12 species. (MRC Mitochondrial Biology Unit)
- GelMap - A public database of proteins identified on 2D gels (University of Hanover Proteomics Department)
- OWL - A public non-redundant database for protein search, derived from SWISS-PROT, PIR, GenBank (translation) and NRL-3D
- ProteomeXchange provides a coordinated submission of mass spectrometry proteomics data to the main existing proteomics repositories. It includes repositories such as PRIDE, Tranche, and PeptideAtlas.
Additional Databases
Carbohydrate Structure Databases
- EuroCarbDB, A repository for both carbohydrate sequences/structures and experimental data.
Signal Transduction Pathway Databases
- Cancer Cell Map
- Netpath - A curated resource of signal transduction pathways in humans
- NCI-Nature Pathway Interaction Database
- Reactome - Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling.
- SignaLink Database
- WikiPathways
- The Cell Collective
- Literature-curated human signaling network, the largest human signaling network database
Metabolic Pathway and Protein Function Databases
- BioCyc Database Collection including EcoCyc and MetaCyc
- BRENDA The Comprehensive Enzyme Information System, including FRENDA, AMENDA, DRENDA, and KENDA
- KEGG PATHWAY Database (Univ. of Kyoto)
- MANET database (University of Illinois)
- MetaboLights Metabolomics experiments and derived information: metabolite structures, reference spectra, biological roles, locations and concentrations. (European Bioinformatics Institute)
- MetaNetX Automated Model Construction and Genome Annotation for Large-Scale Metabolic Networks
- Reactome Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling. (Cold Spring Harbor Laboratory, European Bioinformatics Institute, Gene Ontology Consortium)
- Small Molecule Pathway Database (SMPDB)
- WikiPathways
Metabolomic Databases
- NIH Common Funds Metabolomics Database
- MetaboLights
- Human Metabolome Database (HMDB)
- Yeast Metabolome Database (YMDB)
- E. coli Metabolome Database (ECMDB)
- DrugBank
- ChEBI
- BioMagResBank
- Golm Metabolome Database
- MassBank
Exosomal Databases
Mathematical Model Databases
- Biomodels Database: published mathematical models describing biological processes.
- CellML
- The Cell Collective: build and simulate large-scale models in real-time and in a highly collaborative fashion
PCR and Quantitative PCR Primer Databases
- PathoOligoDB: A free QPCR oligo database for pathogens
- RTPrimerDB - a public primers and probes database for real-time PCR reactions
Taxonomic Databases
- Catalogue of Life source databases (a list of taxonomic databases that contribute to the Catalogue of Life)
- Encyclopedia of Life
- Integrated Taxonomic Information System
- EzTaxon-e, database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
- BacDive is a bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information.
Radiologic Databases
- The Cancer Imaging Archive (TCIA)
- Neuroimaging Informatics Tools and Resources Clearinghouse
- XNAT Central
Specialized Databases (Alphabetically Ordered)
- Antibody Central Antibody information database and search resource.
- AntibodyRegistry.org assigns unique identifiers used to track antibody reagents in published literature.
- Bgee A database to retrieve and compare gene expression patterns between species.
- BIOMOVIE (ETH Zurich) movies related to biology and biotechnology
- BioNumbers a database of useful biological numbers
- Barcode of Life Data Systems, a database of DNA barcodes
- Cellosaurus, a knowledge resource on cell lines
- CGAP Cancer Genes (National Cancer Institute)
- Clone Registry Clone Collections (National Center for Biotechnology Information)
- Colorectal Cancer Atlas catalogs multiple genomic and proteomic data types from 13,711 tissue samples to identify sequence variants in more than 165 colorectal cancer cell lines.
- Connectivity map Transcriptional expression data and correlation tools for drugs
- CTD The Comparative Toxicogenomics Database describes chemical-gene-disease interactions
- DBGET H.sapiens (Univ. of Kyoto)
- DisGeNET A database that integrates information on gene-disease associations
- DiProDB A database to collect and analyse thermodynamic, structural and other dinucleotide properties.
- Drug2Gene Provides integrated information for identified and reported relations between genes/proteins and drugs/compounds
- Dryad a repository of data underlying scientific publications in the basic and applied biosciences.
- Edinburgh Mouse Atlas
- EPD Eukaryotic Promoter Database
- Eukaryotic Linear Motif Database (ELM) Database of short linear motifs.
- EpimiRBase A comprehensive database of microRNA-epilepsy associations.
- FunSecKB The fungal secretome knowledgebase.
- FunSecKB2 The fungal secretome and subcellular proteome knowledgebase (version 2)
- GreenPhylDB (A phylogenomic database for plant comparative genomics)
- GDB Hum. Genome Db (Human Genome Organisation)
- HGMD disease-causing mutations (HGMD Human Gene Mutation Database)
- HUGO (Official Human Genome Database: HUGO Gene Nomenclature Committee)
- HvrBase++ Human and primate mitochondrial DNA
- IEDB Immune Epitope Database
- IMGT The international ImMunoGeneTics information system
- INTERFEROME The Database of Interferon Regulated Genes
- List with SNP-Databases
- MetazSecKB The metazoa [human/animal] secretome and subcellular proteome knowledgebase
- MethBase Database of DNA methylation data visualized on the UCSC Genome Browser.
- Minimotif Miner - Database of short contiguous functional peptide motifs
- NCBI-UniGene (National Center for Biotechnology Information)
- Oncogenomic databases A compilation of databases that serve for cancer research.
- OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
- OrthoMaM (A database of Orthologous Mammalian Markers)
- OrthoMCL Ortholog Groups of Protein Sequences from Multiple Genomes including Archaea, Bacteria and Eukaryotes.
- p53 The p53 Knowledgebase
- PASD The plant alternative splicing database
- PlantSecKB The plant secretome and subcellular proteome knowledgebase
- Plasma Proteome Database Human plasma proteins along with their isoforms
- SABIO-RK A curated database that contains information about biochemical reactions, their kinetic rate equations with parameters and experimental conditions.
- SciClyc An open-access database to share antibodies, cell cultures, and documents for biomedical research.
- Selectome A database of positive selection based on a rigorous branch-site specific likelihood test. Positive selection is detected using CODEML on all branches of animal gene trees.
- SHMPD The Singapore Human Mutation and Polymorphism Database
- SNPSTR database A database of SNPSTRs - compound genetic markers consisting of a microsatellite (STR) and one tightly linked SNP - in human, mouse, rat, dog and chicken.
- The Cancer Genome Atlas (TCGA) provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes.
- TDR Targets A chemogenomics database focused on drug discovery in tropical diseases.
- TRANSFAC A database about eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles.
- TreeBASE An open-access database of phylogenetic trees and the data behind them
- TreeFam (Tree families database) A database of phylogenetic trees of animal genes
- XTractor Discovering Newer Scientific Relations Across PubMed Abstracts. A tool to obtain manually annotated relationships for Proteins, Diseases, Drugs and Biological Processes as they get published in PubMed.
Wiki-Style Databases
- CHDwiki
- EcoliWiki
- Gene Wiki
- GyDB
- NeuroLex
- OpenWetWare
- PDBWiki
- Proteopedia
- RiceWiki
- LncRNAWiki
- Topsan
- WikiGenes
- WikiPathways
- WikiProfessional
- YTPdb
Unsorted
- PubMed (references and abstracts on life sciences and biomedical topics)
- FINDbase (the Frequency of INherited Disorders database)
- RIKEN integrated database of mammals
References
- [1] Wren JD, Bateman A (2008). "Databases, data tombs and dust in the wind". Bioinformatics. 24 (19): 2127–8. doi:10.1093/bioinformatics/btn464. PMID 18819940.
- [2] "Nucleic Acids Research Database issue 2016". Nucleic Acids Research. Oxford University Press. Retrieved 26 Oct 2016.
- [3] http://ezgenome.ezbiocloud.net/
- [4] Dash, Sudhansu; Campbell, Jacqueline D.; Cannon, Ethalinda K. S.; Cleary, Alan M.; Huang, Wei; Kalberer, Scott R.; Karingula, Vijay; Rice, Alex G.; Singh, Jugpreet (2016-01-04). "Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family". Nucleic Acids Research. 44 (D1): D1181–D1188. doi:10.1093/nar/gkv1159. ISSN 0305-1048. PMC 4702835. PMID 26546515.
- [5] Dash, Sudhansu; Cannon, Ethalinda K. S.; Kalberer, Scott R.; Farmer, Andrew D.; Cannon, Steven B. (2016-01-01). "Chapter 8 - PeanutBase and Other Bioinformatic Resources for Peanut". In Stalker, H. Thomas; Wilson, Richard F. (eds.). AOCS Press. pp. 241–252. doi:10.1016/b978-1-63067-038-2.00008-3. ISBN 9781630670382.