DNA and RNA sequencing have become increasingly important in medical and translational research. The data generated from these techniques has led to a huge demand for secure means to store, transfer and analyse the human biomedical data that has been consented for research.
The Use Case takes the European Genome-phenome Archive (EGA) as its primary data source, access to which is controlled. The EGA allows an authorised user to search sequenced material, patient samples stored in biobanks, and the metadata around patients (their illnesses, treatments, outcomes). It also queries national search engines on behalf of users. Datasets can then be downloaded into an EGA-compatible cloud or cluster local to the researcher.
The Human Data Use Case extends and generalises the system of access authorisation and secure data transfer developed in the EGA. It aims to provide a framework for the secure submission, archiving, dissemination and analysis of human biomedical data across Europe.
What the Use Case does
Provides a sustainable infrastructure for storing, coordinating and distributing human data
- The infrastructure is based on the European Genome-phenome Archive (EGA), tranSMART and Galaxy. Researchers will use the EGA to store their raw data, tranSMART to collate different data sets for preliminary analysis, and a Galaxy cloud service for further analysis.
- Local-EGA: The Use Case is also developing a portable submission toolkit (Local-EGA). This will allow you to deposit sensitive human data locally (and comply with national guidelines for storing that data) while still enabling data reuse across national boundaries. If you are part of an ELIXIR Node, you can set up a local instance of the EGA with metadata from the main EGA. This will allow people to search both your Local-EGA and the main EGA at once.
- Submission REST API: the Use Case is developing an API that you can use to submit data to a Local-EGA programmatically.
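The idea of searching a Local-EGA and the main EGA at once can be illustrated with a minimal Python sketch. It is a toy model, not the Local-EGA toolkit or its API: the `DatasetRecord` type, its fields, the accession strings and the `federated_search` function are all hypothetical names chosen for illustration, and real metadata records would be far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    accession: str  # made-up dataset identifier
    title: str
    source: str     # "local" or "main"


def search(index, term):
    """Return records whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [record for record in index if term in record.title.lower()]


def federated_search(local_index, main_index, term):
    """Search both indexes and drop duplicates by accession.

    Local hits are listed first, so a record mirrored in both
    indexes is reported once, with its local copy.
    """
    seen = set()
    hits = []
    for record in search(local_index, term) + search(main_index, term):
        if record.accession not in seen:
            seen.add(record.accession)
            hits.append(record)
    return hits


local_index = [
    DatasetRecord("DS-LOCAL-1", "National breast cancer cohort", "local"),
    DatasetRecord("DS-MAIN-7", "Colorectal cancer exomes", "local"),  # mirrored
]
main_index = [
    DatasetRecord("DS-MAIN-7", "Colorectal cancer exomes", "main"),
    DatasetRecord("DS-MAIN-9", "Healthy control genomes", "main"),
]

for record in federated_search(local_index, main_index, "cancer"):
    print(record.accession, record.source)
```

Only metadata is searched in this sketch; the sensitive data itself never leaves the index that holds it, which mirrors the separation described above.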
Provides standardised tools to discover and access human data
- Local-EGAs for metadata sharing: By extending the use of Local-EGAs the Use Case is increasing the amount of human data that is discoverable. Local-EGAs store metadata from the main EGA, which will allow you to use a Local-EGA to search both the main and the local archive. You can also search and retrieve information from the Local-EGA by using the Local-EGA API, so you can build your own services based on the data available. In addition, the main EGA will gather the metadata from all the data submitted to Local-EGAs, so a search at the main EGA will allow you to find data located across all Local-EGAs.
- Beacon project: The Human Data Use Case is working with the Global Alliance for Genomics and Health (GA4GH) to use the beacon discovery service for resources across ELIXIR. The Beacon service provides a simple way to make data discoverable. You can query the lightweight metadata provided by a data resource (a 'beacon') to ask questions like 'Does this dataset have genomes with this allele at that position?' and get a 'Yes' or 'No' answer.
- Regulating access to sensitive data: the Use Case is working with the Compute Platform to use the ELIXIR Authentication and Authorization Infrastructure (AAI) for ELIXIR resources. The AAI is a system that allows you to have a single identity across a range of different services, so you can use the same log-in for each service. The ELIXIR AAI also contains a Resource Entitlement Management System (REMS). This lets you request access online to a restricted data resource, so that the Data Access Committee (DAC) for that resource can review your application. If you are granted access, you can then log in to the resource using the AAI (which verifies your right to access the data).
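The Beacon-style question described above ('Does this dataset have genomes with this allele at that position?') can be sketched in a few lines of Python. This is a toy illustration of the yes/no principle only, not the GA4GH Beacon API: the `Variant` and `ToyBeacon` names, their fields and the example coordinates are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only in this sketch."""
    chromosome: str
    position: int
    allele: str


class ToyBeacon:
    """Answers presence/absence questions without exposing any records."""

    def __init__(self, variants):
        # Normalise alleles to upper case so that lookups are case-insensitive.
        self._variants = frozenset(
            Variant(v.chromosome, v.position, v.allele.upper()) for v in variants
        )

    def has_allele(self, chromosome, position, allele):
        """Return True if any genome in the dataset carries this allele here."""
        return Variant(chromosome, position, allele.upper()) in self._variants


# Made-up example data.
beacon = ToyBeacon([Variant("1", 12345, "A"), Variant("X", 999, "t")])

print(beacon.has_allele("1", 12345, "a"))  # True
print(beacon.has_allele("1", 12345, "G"))  # False
```

The caller only ever receives a boolean, never the underlying genomes, which is what makes this kind of query suitable for making controlled-access data discoverable.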
Develops long-term management policies for human data
- The Use Case is documenting long-term data storage requirements and metadata mappings needed for submitting complex heterogeneous data into the EGA.
Ensures that human data in ELIXIR services is handled in accordance with the appropriate legal framework
- The Use Case ensures that ELIXIR services handling human data comply with the General Data Protection Regulation (GDPR).
Find out more
- Contact Pascal Kahlem (pkahlem[at]gmail[dot]com) to learn how you can get involved with ELIXIR's work with human data, and how this work can help you.
- The work in the Use Case is based on earlier Implementation Studies:
- Data Resource Implementations for the GA4GH Data Schema (2016-17)
- ELIXIR – IMI OncoTrack scoping study on long-term data handling (2016)
- Genomic data management for TraIT using the EGA (2015-16)
- Beacon project (2015-16)
- Genomic data management for TraIT using the EGA: Case study in submission and access integration of controlled access data with tranSMART and Galaxy to serve large European cohort studies (2015)
- Interoperable controlled-access big data transfer for ELIXIR - expanding EGA collaboration (2014)
- ELIXIR and GA4GH Beacon Team Up to Advance Genomic Data Sharing (news story)
- Brochure on ELIXIR Human data activities
- ELIXIR-EXCELERATE WP9 group on the intranet (you must be logged in)