Data

The data revolution

Ten years ago, a researcher who needed to find a gene involved in a disease might have had to invest three years of laboratory work. Today, thanks to genomic information stored in large public databases, the same task may take half an hour. The result, over a relatively short time, is a deluge of complex data that needs to be analysed.

Because of public data resources, researchers can now tackle society's serious challenges faster than ever.

  • Large-scale DNA sequencing projects produce robust datasets that can be used to associate minute genetic differences with susceptibility to diseases.
  • By identifying patterns of genes that are active in different tumours, researchers can predict how aggressive a tumour is and decide which medicines to treat it with.
  • Linking catalogues that detail the millions of life forms that make up our environment enables applications ranging from the protection of endangered species and the sustaining of natural resources to the control of agricultural pests.



An essential commodity

As data has become an essential commodity, the importance of making both the narrative and the data from publicly funded research openly available is broadly recognised.

The importance of long-term stewardship is highlighted by the observation that the odds of retrieving the data from a publication decline by 17% per year. This is in sharp contrast to biomolecular data in a major public resource such as the Protein Data Bank (PDB), which has safeguarded the high-resolution structures of proteins, nucleic acids and complex assemblies since 1971.



Indeed, storing and, importantly, making all structural data available for broad reuse costs less than 1% of the cost of regenerating one year’s new depositions. Advanced services such as SWISS-MODEL (with over 280,000 registered users globally) are built on top of these core data resources.
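
To make the 17% annual decline quoted above more concrete, the short Python sketch below compounds it over several years. It is an illustration only: it assumes the decline is constant from year to year and compounds multiplicatively, and the function names and the starting odds of 1.0 (an even chance) are hypothetical choices, not figures from the underlying study.

```python
import math

# Annual decline in the odds of retrieving data, as quoted in the text.
ANNUAL_DECLINE = 0.17


def odds_remaining(years, initial_odds=1.0, annual_decline=ANNUAL_DECLINE):
    """Odds of retrieving the data after `years`.

    Assumes the decline is constant and compounds every year.
    `initial_odds` is an illustrative starting value, not a measured one.
    """
    return initial_odds * (1.0 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


def odds_half_life(annual_decline=ANNUAL_DECLINE):
    """Number of years for the odds to halve under a constant decline."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        odds = odds_remaining(years)
        print(f"after {years:2d} years: odds x{odds:.3f}, "
              f"probability {odds_to_probability(odds):.3f}")
    print(f"odds halve roughly every {odds_half_life():.1f} years")
```

Under these assumptions the odds halve roughly every 3.7 years, which is the contrast the text draws with a resource that has preserved its holdings since 1971.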



The solution



  • ELIXIR is identifying a set of Core Data Resources that are globally competitive and of critical importance to the life science community, and will actively promote their integration and long-term sustainability. These Core Data Resources will be delivered under their own brands, as services from the Nodes, and will form the backbone of the ELIXIR data infrastructure.
  • Learn more about the Core Data Resources work:
    Durinx C, McEntyre J, Appel R et al. Identifying ELIXIR Core Data Resources. F1000Research 2017, 5(ELIXIR):2422 (doi: 10.12688/f1000research.9656.2)
  • ELIXIR supplies a listing of Database Services provided by the ELIXIR Nodes that, based on clear eligibility criteria and best practice in service delivery, are visibly branded and provide the bioinformatics user community with a toolbox of stable and well-maintained services.
Data growth curves of five major EMBL-EBI resources over the years 2005-2013: European Genome-phenome Archive (EGA); European Nucleotide Archive (ENA); Proteomics data repository (PRIDE); Metabolomics resource (MetaboLights); and Functional genomics database (ArrayExpress). Source: EMBL-EBI.

Leadership

Jo McEntyre (EMBL-EBI) and Christine Durinx (SIB Swiss Institute of Bioinformatics, ELIXIR Switzerland), Platform Leads
Rachel Drysdale (Platform Coordinator, ELIXIR Hub)

For enquiries about the Platform's work please email rachel.drysdale[at]elixir-europe[dot]org.