Comparison of environmental sequences to reference sets from curated marker loci is a mainstay of taxonomic analysis of microbial communities. Sequencing of microbial eukaryotes requires many distinct reference sets to cover their diversity adequately. The groups producing these reference sets follow different curation workflows, but share the need to pass their data on to a common set of tools and services, such as EMG, Megan, MetaPIPE and BioMaS.
The current arrangement is inefficient on both sides:
- reference set providers must each build and maintain services to deliver their data to consumer tools and services
- consumers must import reference sets from several sources, each in a different format.
Led by the ITSoneDB team, who provide the leading ITS1 reference set for fungi and other eukaryotes, we will develop a new data type within ENA that systematically captures these reference sets and serves them to dependent resources. This will eliminate the inefficiencies, leverage this core ELIXIR resource and build sustainability into reference set generation workflows.
Currently, taxonomic analysis of microbial communities relies on multiple dispersed reference datasets. The impact of this study will be that ENA is enriched with a new structured data type to accommodate these taxonomic reference datasets, beginning with ITS1 from rRNA, from the ITSoneDB team. Enhancing the connectivity and coordination between the various reference datasets and ENA will yield a stable system that systematically captures their data and serves them to consumer services from one place. This will increase both the sustainability and the exposure of the data, and promote their use and re-use.