ELIXIR Core Data Resources

ELIXIR Core Data Resources are a set of European data resources of fundamental importance to the wider life-science community and the long-term preservation of biological data.

Identification of the ELIXIR Core Data Resources involves a careful evaluation of the multiple facets of the data resources. Indicators used in the evaluation are grouped into five categories:

Scientific focus and quality of science
Community served by the resource
Quality of service
Legal and funding infrastructure, and governance
Impact and translational stories

The details of the selection criteria are described in the F1000R ELIXIR track article Identifying ELIXIR Core Data Resources. The initial Core Data Resource list was defined in July 2017 and is reviewed regularly. For an introduction to the Core Data Resources list, please view this summary.

ELIXIR is committed to Open Access as a core principle for publicly funded research. ELIXIR Core Data Resources should reflect this commitment and have terms of use or a licence that enables the reuse and remixing of data. The Creative Commons licenses CC0, CC-BY and CC-BY-SA are all conformant with the Open Definition (http://opendefinition.org/licenses/), as are equivalent open terms of use. ¹

The timeline of ongoing Core Data Resource (CDR) activities can be found here.

ELIXIR Core Data Resource list

Core Data Resource	Data type
ArrayExpress	Functional Genomics Data from high-throughput functional genomics experiments.
BacDive	Bacterial Diversity - information on taxonomy, morphology, physiology, and ecology of bacterial strains with genome and other molecular data. BacDive is a bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity.
Bgee	Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question “where is a gene expressed?” and supports research in cancer and agriculture, as well as evolutionary biology.
BioImage Archive	The BioImage Archive is a public repository for biological images, supporting the deposition and reuse of reference imaging data that underpin published research across the life sciences.
BioStudies	The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database can accept a wide range of types of studies described via a simple format. It also enables manuscript authors to submit supplementary information and link to it from the publication.
BRENDA	Database of enzyme and enzyme-ligand information, across all taxonomic groups, manually extracted from primary literature and extended by text mining procedures, integration of external data and prediction algorithms.
CATH	A hierarchical domain classification of protein structures in the Protein Data Bank.
Cellosaurus	A knowledge resource on cell lines. It attempts to describe all cell lines used in biomedical research.
ChEBI	Dictionary of molecular entities focused on ‘small’ chemical compounds.
ChEMBL	Database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties and abstracted bioactivities.
EGA	Personally identifiable genetic and phenotypic data resulting from biomedical research projects.
EMDB	The Electron Microscopy Data Bank (EMDB) is a public repository for electron cryo-microscopy maps and tomograms of macromolecular complexes and subcellular structures. It covers a variety of techniques, including single-particle analysis, electron tomography, sub-tomogram averaging, fibre diffraction and electron crystallography.
ENA	The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
Ensembl	Genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
Ensembl Genomes	Comparative analysis, data mining and visualisation for the genomes of non-vertebrate species.
Europe PMC	Europe PMC is a repository, providing access to worldwide life sciences articles, books, patents and clinical guidelines.
GWAS Catalog	The NHGRI-EBI GWAS Catalog: a curated collection of all human genome-wide association studies, produced by a collaboration between EMBL-EBI and NHGRI.
HGNC	The HGNC (HUGO Gene Nomenclature Committee) is a resource for approved human gene nomenclature containing ~42000 gene symbols and names and 1300+ gene families and sets.
Human Protein Atlas	The Human Protein Atlas (HPA) aims to map all the human proteins in cells, tissues and organs using integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. These data allow exploration of the human proteome.
The IMEx Consortium: represented by IntAct and MINT	IntAct provides a freely available, open source database system and analysis tools for molecular interaction data. MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.
InterPro	Functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. Note: This is an umbrella resource to which many collaborating databases contribute. In naming InterPro as a Core Data Resource, the critical role of the constituent databases is recognised.
JASPAR	JASPAR is the largest open-access database of curated and non-redundant transcription factor (TF) binding profiles from six different taxonomic groups.
MGnify	The MGnify platform facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments.
LIPID MAPS®	LIPID MAPS is designed to be an open, systematic and standardised lipidomics resource, providing information on lipids and their structures, properties and functions in biological processes.
LPSN	LPSN (List of Prokaryotic names with Standing in Nomenclature) provides authoritative information on the nomenclature of prokaryotes.
Orphadata Science	The Orphadata Science platform provides the scientific community with comprehensive, high-quality datasets related to rare diseases and orphan drugs, in a reusable and computable format.
OMA	OMA is a method and database for the inference of orthologs among complete genomes, supporting comparative genomics analyses.
OrthoDB	OrthoDB is a catalog of orthologous protein-coding genes across a wide range of species, supporting evolutionary and functional genomics studies.
PDBe	Biological macromolecular structures.
PomBase	PomBase is a comprehensive database for the fission yeast Schizosaccharomyces pombe, providing structural and functional annotation, literature curation and access to large-scale data sets.
PRIDE	Mass spectrometry-based proteomics data, including peptide and protein expression information (identifications and quantification values) and the supporting mass spectra evidence.
Reactome	Reactome is an open-source, open-access, manually curated and peer-reviewed pathway database.
Rhea	Rhea is an expert-curated knowledgebase of chemical and transport reactions of biological interest, and the standard for enzyme and transporter annotation in UniProtKB.
SILVA	SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
STRING	Known and predicted protein-protein interactions.
SWISS-MODEL	SWISS-MODEL is a fully automated protein structure homology-modelling server, accessible via the Expasy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make protein modelling accessible to all life science researchers worldwide.
UniProt	Comprehensive resource for protein sequence and annotation data.
VEuPathDB	VEuPathDB provides access to diverse genomic and other large scale datasets related to eukaryotic pathogens and invertebrate vectors of disease. Organisms supported by this resource include (but are not limited to) the US-based NIAID list of emerging and re-emerging infectious diseases.

In addition to the Core Data Resources, ELIXIR has compiled a list of recommended repositories for experimental data, the ELIXIR Deposition Databases.

Further information

ELIXIR Hub contact: Fabio Liberante at core-resources@elixir-europe.org
Papers:
- Drysdale R, Repo S, Roman Garcia P et al. Implementing a Process for the Selection of Core Data Resources [version 1; not peer reviewed]. F1000Research 2018, 7(ELIXIR):1711 (document) (https://doi.org/10.7490/f1000research.1116247.1)
- Drysdale R, McEntyre J, Durinx C and Blomberg N. The Process for the Selection of ELIXIR Core Data Resources [version 1; not peer reviewed]. F1000Research 2018, 7(ELIXIR):1712 (document) (https://doi.org/10.7490/f1000research.1116248.1)
- Drysdale R, McEntyre J, Durinx C et al. The Annual Indicator Monitoring and Periodic Review Processes: ELIXIR Core Data Resources and Deposition Databases [version 1; not peer reviewed]. F1000Research 2020, 9(ELIXIR):114 (document) (https://doi.org/10.7490/f1000research.1117816.1)

¹ Should a Core Data Resource lose its funding and the only option to remain viable requires a reversion to a non-open license, the license status and exception (e.g. requirement for industry co-funding) may be revisited by the ELIXIR Heads of Nodes Committee.