Database citations in patents and articles indicate impact of biological data on innovation

The extensive reuse of data from bioinformatics resources in research articles and patents demonstrates the long-term value of biological data in life science research and the biotechnology industry. This is the conclusion of a recent paper published by the ELIXIR Hub and EMBL-EBI, which examined patterns of database citations in research articles and patents. It found more than 8,000 patents from 2014 that refer to ten different data repositories.

The article "Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources", published in ELIXIR Reports (the ELIXIR channel on the F1000Research platform), investigated citations of molecular data held in the European Nucleotide Archive (ENA), the Protein Data Bank in Europe (PDBe), and other repositories.

By cross-referencing the open access literature repository Europe PMC with the ENA and PDB archives, the authors separated data citations arising from depositions from those indicating subsequent reuse of the data by the scientific community. The figures show that the number of data citations remains high and constant long after the initial deposition: on average, each PDB deposition article is cited 6.7 times per year. The authors then used SureChEMBL, EMBL-EBI's open chemistry patent database, to extend the analysis to citations of biomolecular resources in patents.

"This is the first time that the usage and citations of bioinformatics data resources in the patent literature have been analysed and quantified. We identified citations of bioinformatics resources in over 8,000 patents from 2014, used as a reference to define biological concepts in the patents", says Niklas Blomberg, ELIXIR Director and one of the authors of the study. "This shows the data resources provide an important framework for new discoveries - a real, tangible value for industry and innovation."

The extensive reuse of data from biomolecular resources in academia and industry demonstrates the critical role that bioinformatics databases play in life science research. It also highlights the need for robust metrics of data usage.

Johanna McEntyre, Head of Literature Services at EMBL-EBI and one of the authors of the paper, says: "Our results also show that simple metrics, such as citation counts in the biomedical literature, do not alone yield a complete picture of the use and value of bioinformatics data repositories. Besides academic papers, the evaluation must take into account the secondary use of data in patents and other technical documents, such as clinical guidelines, standards, and grant applications. It must also consider the extensive data reuse that occurs in value-added data resources and community services such as model-organism databases."

Through a Work Package in the ELIXIR-EXCELERATE grant, ELIXIR is leading the work to develop a coherent framework for assessing the scientific and societal value of bioinformatics resources. Understanding how to establish indicators of data citation across the life sciences is an important part of this effort.
Bousfield D, McEntyre J, Velankar S et al. Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources [version 1; referees: 2 approved]. F1000Research 2016, 5(ELIXIR): 160 (doi: 10.12688/f1000research.7911.1)

Thu 14 April 2016