A distributed infrastructure for life-science information
The data revolution
Ten years ago, a researcher looking for a gene involved in a disease might have needed to invest three years of laboratory work. Today, thanks to genomic information stored in large public databases, the same task may take half an hour. The result, over a relatively short time, is a deluge of complex data that needs to be analysed.
Because of public data resources, researchers can now tackle society's serious challenges faster than ever.
By identifying patterns of genes that are active in different tumours, researchers can predict how aggressive a tumour is and decide which medicines to treat it with.
Linking catalogues that detail the millions of life forms that make up our environment enables applications ranging from the protection of endangered species and the sustaining of natural resources to the control of agricultural pests.
An essential commodity
As data has become an essential commodity, the importance of making both the narrative and the data from publicly funded research openly available is broadly recognised. The importance of long-term stewardship is highlighted by the observation that the odds of retrieving the data from a publication decline by 17% per year. This is in sharp contrast to biomolecular data in a major public resource such as the Protein Data Bank (PDB), which has safeguarded the high-resolution structures of proteins, nucleic acids and complex assemblies since 1971. Indeed, storing all structural data and, importantly, making it available for broad reuse costs less than 1% of what it would cost to regenerate one year’s new depositions. Advanced services such as SWISS-MODEL (with over 280,000 registered users globally) are built on top of these core data resources.
ELIXIR is identifying core data resources that are essential to the larger international community and is developing a robust framework to secure their long-term sustainability.
ELIXIR is identifying a set of Core Data Resources that are globally competitive and of critical importance to the life science community and will actively promote their integration and sustainability. These Core Resources will be delivered under their own brands, as services from the Nodes, and will form the backbone of the ELIXIR data infrastructure;
ELIXIR will identify ELIXIR Named Services from the Nodes that, based on clear eligibility criteria and best practice in service delivery, are visibly branded and provide the bioinformatics user community with a toolbox of stable and well-maintained services;
ELIXIR will support Emerging Services that do not yet meet the full criteria of ELIXIR Named Services through best practice, the ELIXIR Training Programme and the Technical Coordinator network.
Safe and secure data
ELIXIR Nodes will handle sensitive, personal data through the continued development of secure archives. Research in human subjects requires platforms with authentication services and effective governance processes for secure access and data exchange. ELIXIR will work to provide comprehensive, end-to-end solutions for data privacy that go beyond simple download protection.
Data growth curves of five major EMBL-EBI resources (European Genome-phenome Archive (EGA); European Nucleotide Archive (ENA); Proteomics data repository (PRIDE); Metabolomics resource (MetaboLights); and Functional genomics database (ArrayExpress)) over the years 2005-2013. Source: EMBL-EBI.
The Data Platform is led by Jo McEntyre (EMBL-EBI) and Christine Durinx (SIB Swiss Institute of Bioinformatics, ELIXIR Switzerland). For enquiries about the Platform's work please email info[at]elixir-europe[dot]org.