Three ELIXIR Pilot Actions were launched in 2014, extending into the first part of 2015.
Webinars on the outcomes of these pilots will be given as part of the ongoing ELIXIR Webinar series.
Interoperable controlled-access big data transfer for ELIXIR - expanding EGA collaboration
Building on the existing ELIXIR-based collaboration on the European Genome-phenome Archive (EGA) between the European Bioinformatics Institute (EMBL-EBI) in the UK and the Centre for Genomic Regulation (CRG) in Spain, this pilot project addressed limitations in computing, network bandwidth, and storage buffer areas that affected EGA data delivery, and developed a protocol for secure data transfer from the EGA archive to CRG.
The pilot tested a number of data transfer protocols (FTP, UDT, Aspera and Globus), optimised the EBI hardware and network access to the archive in support of large-scale re-encryption processes, and created the monitoring tools needed to validate data integrity at the CRG storage. In total, the project successfully transferred one petabyte of data using Aspera.
The new protocol now allows us to further optimise each step with the aim of automating the entire process by the end of 2015.
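The monitoring tools used to validate data integrity are not described here. As a rough illustration only, a checksum comparison after transfer might look like the following minimal sketch. The function names, the manifest layout (file name mapped to expected digest) and the choice of SHA-256 are assumptions made for this example, not details of the EGA or CRG tooling.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(manifest, destination_dir):
    """Check received files against a manifest (hypothetical layout).

    manifest: dict mapping a relative file name to its expected hex digest,
              as computed at the sending site before transfer.
    Returns a dict mapping each name to "ok", "mismatch", or "missing".
    """
    destination = Path(destination_dir)
    report = {}
    for name, expected in manifest.items():
        target = destination / name
        if not target.is_file():
            report[name] = "missing"
        elif file_checksum(target) == expected.lower():
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

In a setup like this, the sending site would publish the manifest alongside the data, and any file reported as "missing" or "mismatch" at the receiving site would be queued for retransfer.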
While the current data deluge creates a need for distributed data storage and replication, it is essential to enable data access through a single access interface.
This Pilot Action aims to integrate the raw data repositories for mass spectrometry (MS) proteomics data run by BILS (Sweden) and ProteomeXchange (via the PRIDE database, EMBL-EBI, UK), using the European infrastructure EUDAT, and will serve as an example of how to connect national data storage services and international repositories through ELIXIR. It will also show the potential of collaboration between research infrastructures and e-infrastructures to better manage the data deluge, and help to evaluate the requirements of such federated systems.
Marine metagenomics: towards user centric services
Marine genomics and metagenomics (the study of genetic material recovered directly from marine environmental samples) are still in their infancy, but form a rapidly expanding area of life science research.
To prevent the large-scale implementation of such studies from being disruptive (where data are produced faster than users can analyze and interpret them), there is an urgent need to establish dedicated data management e-infrastructure and bioinformatics pipelines specialized for marine research.
Involving ELIXIR Norway, EMBL-EBI and other partners from the ELIXIR Nodes, this Pilot Action aims to harmonize existing pipelines, and to develop new pipeline components or improve established ones, in order to establish long-term sustainable service platforms and to start building a "user community" for marine metagenomics analysis in ELIXIR.