A key element in long-term, consistent data citation and identification
EMBL-EBI has established Identifiers.org as a stable system for identification and citation of life science data, using Persistent Identifiers (PIDs). Identifiers.org not only enables researchers to easily reference their data, but also provides a variety of useful, supporting web services.
Identifiers.org is built upon a high-quality curated registry containing several hundred life science data collections. Following a recently completed ELIXIR Implementation Study, the registry now includes all relevant databases operated by individual ELIXIR Nodes.
Defending against dead links
Having a dedicated service that checks all reference hyperlinks and keeps them up to date has many advantages. Dr Sarala Wimalaratne, Identifiers.org Project Lead at EMBL-EBI, explains: “From time to time, hyperlinks to life science data records may change, for example due to technical updates or institutional changes. Identifiers.org keeps track of all those changes and provides stable identification of life science data through the latest URLs used to access the data.”
This is very useful to researchers, as they can be sure that their stored reference links will always point to the right data source. It also helps developers of bioinformatics tools and database providers, both of whom need to maintain up-to-date cross-references. In interconnected networks of such cross-referenced systems, a single broken link can compromise the whole network; Identifiers.org effectively helps avoid ‘dead ends’ in networks of linked data.
Nick Juty, from the University of Manchester (ELIXIR UK), says: “All entries in Identifiers.org are carefully curated to a high standard, collating all the necessary information to unambiguously and accurately identify individual data records. This is a continuous process, requiring the addition of new resources, whilst maintaining and updating existing records.”
“The ELIXIR Implementation Study helped integrate resources in ELIXIR Nodes into Identifiers.org. Researchers as well as scientific journals can now use a consistent citation scheme for any resource within the ELIXIR ecosystem,” says Jerry Lanfear, ELIXIR Chief Technical Officer. “This will make collaboration and linking of ELIXIR resources much easier. It also helps establish standards in data identification across the life science community,” adds Lanfear.
Global Standards in Data Citation
Another goal of the ELIXIR Implementation Study was to improve and harmonise existing data citation practices in scientific literature and on the web at large. In a collaboration with the N2T.net team at the California Digital Library (CDL), the Identifiers.org team developed a global approach for the formal citation of research data.
The citation system is based on compact identifiers: an easy-to-read, easy-to-process scheme that combines a unique prefix identifying an individual data collection with a locally assigned identifier (e.g. uniprot:P04150). The same compact identifier resolves to the same record through either EMBL-EBI’s resolver (Identifiers.org) or CDL’s resolver (N2T.net). For this system to work globally, EMBL-EBI and CDL established a shared namespace registry with an easy-to-use form for requesting new prefixes, and clear governance and maintenance rules to ensure that every reference resolves to the right data collection.
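To make the prefix-plus-accession structure concrete, here is a minimal Python sketch that splits a compact identifier and builds links to both resolvers. It assumes that Identifiers.org and N2T.net each accept the compact identifier appended directly to their base URL (as in https://identifiers.org/uniprot:P04150) and that prefixes can safely be normalised to lowercase. The names `RESOLVERS`, `parse_compact_identifier` and `resolver_urls` are hypothetical, and the sketch does not check prefixes against the namespace registry.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical table of resolver base URLs (assumption: both services
# accept "prefix:accession" appended directly to the base URL).
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t.net": "https://n2t.net/",
}


@dataclass(frozen=True)
class CompactIdentifier:
    prefix: str     # namespace prefix naming the data collection, e.g. "uniprot"
    accession: str  # identifier assigned locally by that collection, e.g. "P04150"

    def __str__(self) -> str:
        return f"{self.prefix}:{self.accession}"


def parse_compact_identifier(text: str) -> CompactIdentifier:
    """Split a compact identifier on its first colon only.

    Splitting once keeps any accession that itself contains a colon intact.
    """
    prefix, sep, accession = text.strip().partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {text!r}")
    # Assumption: prefixes are case-insensitive, so normalise them to lowercase.
    return CompactIdentifier(prefix.lower(), accession)


def resolver_urls(text: str) -> dict[str, str]:
    """Return one resolvable URL per resolver for the same compact identifier."""
    cid = parse_compact_identifier(text)
    return {name: base + str(cid) for name, base in RESOLVERS.items()}


if __name__ == "__main__":
    for name, url in resolver_urls("uniprot:P04150").items():
        print(name, url)
```

Because both resolvers receive the identical `prefix:accession` string, a citation written once can be resolved through either service. A real client would also look the prefix up in the shared namespace registry rather than accepting any string that contains a colon.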
This new approach was developed by an international team organized through FORCE11.org, and has been presented in a recent paper in Nature’s journal Scientific Data, with lead authorship by EMBL-EBI and CDL staff. It will benefit not only authors, but also scientific journals and other publishers in the life sciences.
Nature Scientific Data announced today that it will be “taking advantage of the resolver services offered by identifiers.org and N2T.net to provide more standardized and predictable links for biomedical datasets that have accession identifiers”.
Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018). doi:10.1038/sdata.2018.29
The ELIXIR Implementation Study on Data Identification and Interoperability (September 2016 to December 2017) was funded by the ELIXIR Hub and carried out by EMBL-EBI.
This harmonization work was carried out by an international team of experts, organized under the auspices of a FORCE11 working group, and led by Maryann Martone (UCSD), Tim Clark (University of Virginia), and Henning Hermjakob (EMBL-EBI), with funding from the U.S. National Institutes of Health.
Special thanks to Rafael Jimenez (ELIXIR Hub), Nick Juty (ELIXIR UK, University of Manchester), John Kunze (California Digital Library) and the FORCE11 Identifiers Group for their advice and insights throughout the project.
Following the successful ELIXIR Implementation Study, Identifiers.org secured funding as part of the EU Horizon 2020 FREYA project to establish persistent identifiers in European and global research infrastructures across all scientific domains. Additional funding has been made available from the US, as part of the NIH Data Commons Pilot project “Towards a FAIR Digital Ecosystem in the Cloud”.