We were at BITS 2022
Processing MALDI/ToF spectra
for differential peptidomics in a clinical asset
Iervasi E, Coronel Vargas G, Profumo A, Romano P.
Low molecular weight proteins, or peptides, are identified in biological fluids such as serum, plasma or urine. The peptidome is a mirror of tissue function and can be related to pathophysiological events. MALDI/ToF MS enables rapid acquisition of spectra from samples with complex matrices and is the best choice for peptidome profiling and differential analysis between two or more groups of samples. However, a reliable differential analysis requires pre-processing of the data, such as merging isotope abundances for each molecule, averaging the data from technical replicas, removing noise, aligning spectra from different samples and, possibly, performing a statistical evaluation. An important unsolved problem in differential peptidomics is the integrity of long-term cryopreserved serum samples. Open software offers a wide range of solutions, but these are too complex for a researcher without programming experience to handle. In addition, some essential processing steps are missing. For these reasons, we have developed original software tailored to our needs.
Open software was used for the development and production environments. The LAMP environment, which includes Linux, Apache, MySQL and PHP and is especially fit for frequent software updates, was adopted as reference. In order to reuse software components or libraries of interest, our tools incorporate on the server side some pieces of software written in Perl and R. The tools SeraDeg and Geena 2 are completely written in PHP. Geena 2 uses the neapolis Perl script for aligning spectra. R and its libraries MALDIquant, MALDIquantForeign, MALDIrppa, cluster, scales, ggplot2, ggrepel, dendextend, mixOmics, lsa, pheatmap, and kableExtra are used by GeenaR. The software runs on a virtual machine in the cloud of the GARR e-infrastructure. The VM is based on OpenStack Nova, version 21.2.3, and includes four 2 GHz Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS) CPUs, 8 GB of RAM and 1 TB of disk space. The operating system presently is Ubuntu 10.04.1 LTS. All tools are publicly available through the web site of the lab.
A comprehensive workflow for MALDI/ToF spectra pre-processing and elaboration is presented in Fig. 1. In this workflow, the available tools (SeraDeg, Geena 2, and GeenaR) are shown along with the open software that we are currently using (MSconvert and SAMR). Some scripts remain unconnected and must be launched manually at present. We are working on a common interface that integrates all tools and scripts so that even unskilled researchers may run the workflow. Note that SeraDeg can only be used with serum samples (5, 6). Geena 2 (9) is a web tool able to:
- sum up isotopic replicas of the same molecule,
- align and average technical replicas to produce a single representative spectrum per sample,
- align representative spectra and produce a table of signals and corresponding abundances (7).
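The averaging and alignment steps listed above can be illustrated with a minimal sketch. It is not the algorithm used by Geena 2 or by the neapolis script: it assumes that spectra are already reduced to peak lists of (m/z, abundance) pairs and that peaks closer than a fixed tolerance belong to the same signal. The function names, the `tol` parameter and its default value are hypothetical.

```python
from statistics import mean

def align_peaks(spectra, tol=0.5):
    """Group peaks whose m/z values lie within `tol` of the previous peak.

    spectra: list of peak lists, each a list of (mz, abundance) pairs.
    Returns rows of (mean_mz, [abundance per spectrum]); a spectrum
    without a peak in a group contributes 0.0 to that row.
    """
    # Tag each peak with the index of its spectrum, then sort by m/z.
    tagged = sorted((mz, ab, i) for i, peaks in enumerate(spectra)
                    for mz, ab in peaks)
    groups = []
    for mz, ab, i in tagged:
        # Chain a peak onto the current group if it is close to the last one.
        if groups and mz - groups[-1][-1][0] <= tol:
            groups[-1].append((mz, ab, i))
        else:
            groups.append([(mz, ab, i)])
    table = []
    for group in groups:
        row = [0.0] * len(spectra)
        for mz, ab, i in group:
            row[i] += ab
        table.append((mean([m for m, _, _ in group]), row))
    return table

def average_replicas(replicas, tol=0.5):
    """Average technical replicas into one representative spectrum."""
    return [(mz, mean(row)) for mz, row in align_peaks(replicas, tol)]
```

Calling `average_replicas` on the replicas of each sample and then `align_peaks` on the resulting representative spectra yields a table of signals and abundances of the kind described above. A real tool would also need noise removal and a tolerance that depends on m/z.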
SeraDeg (8) is a web tool aimed at assessing the quality of sera through a comparative analysis of the contents of fragments of fibrinopeptide A (fpA), a molecule which is known to be easily degraded under physico-chemical stress conditions. Quality scores provided by SeraDeg allow users to select the samples of adequate quality (1, 4). GeenaR (10) is a new implementation of Geena 2 based on R modules. The following R packages are about to be incorporated for the analysis on the server side: MALDIquant and MALDIquantForeign for mass spectra pre-processing and analysis, OrgMassSpecR for mass spectra comparison, dendextend and pvclust for clustering, sda and crossval for variable selection (3).
1. Beitia M, Romano P, Larrinaga G, Solano-Iturri JD, Salis A, Damonte G, Bruzzone M, Ceppi M and Profumo A. The Activation of Prothrombin Seems to Play an Earlier Role Than the Complement System in the Progression of Colorectal Cancer: A Mass Spectrometry Evaluation. Diagnostics 2020 Dec 11, 10(12), 1077. doi: 10.3390/diagnostics10121077
2. Boccardo F, Rubagotti A, Nuzzo PV, Argellati F, Savarino G, Romano P, Damonte G, Rocco M, Profumo A. Matrix-assisted laser desorption/ionisation (MALDI) TOF analysis identifies serum Angiotensin II concentrations as a strong predictor of all-cause and breast cancer (Bca)-specific mortality following breast surgery. International Journal of Cancer. 2015 137(10):2394-2402. doi: 10.1002/ijc.29609
3. Del Prete E, Facchiano A, Profumo A, Angelini C, Romano P. GeenaR: a web tool for reproducible MALDI-TOF analysis. Front. Genet. 29 March 2021. 12:635814. doi: 10.3389/fgene.2021.635814
4. Mangerini R, Romano P, Facchiano A, Damonte G, Muselli M, Rocco M, Boccardo F, Profumo A. The application of atmospheric pressure MALDI to the analysis of long-term cryopreserved serum peptidome. Analytical Biochemistry 417 (2011) 174–181. doi: 10.1016/j.ab.2011.06.021
5. Profumo A, Mangerini R, Rubagotti A, Romano P, Damonte G, Guglielmini P, Facchiano A, Ferri F, Ricci F, Rocco M, Boccardo F. Complement C3f serum levels may predict breast cancer risk in women with gross cystic disease of the breast. Journal of Proteomics 2013, 85:44–52. doi: 10.1016/j.jprot.2013.04.029
6. Romano P, Beitia San Vicente M, Profumo A. A mass spectrometry based method and a software tool to assess degradation status of serum samples to be used in proteomics for biomarker discovery. Journal of Proteomics 2018, 173: 99–106. doi: 10.1016/j.jprot.2017.12.004
7. Romano P, Profumo A, Rocco M, Mangerini R, Ferri F, Facchiano A. Geena 2, improved automated analysis of MALDI/TOF mass spectra. BMC Bioinformatics 2016, 17(Suppl 4):61. doi: 10.1186/s12859-016-0911-2
8. SeraDeg: Sera Degradation – http://proteomics.hsanmartino.it/seradeg/
9. Geena2: Spectra Analysis – http://proteomics.hsanmartino.it/geena2/
10. GeenaR: R based Spectra Analysis – http://proteomics.hsanmartino.it/geenar/
Recent achievements of the EOSC-Life project
Romano P, Rosato A, EOSC-Life project Consortium
The EOSC-Life project has been funded by the EU Horizon 2020 programme in the context of initiatives targeted to the empowerment of ESFRI Research Infrastructures (RIs) (2019-2023, grant 824087). It brings together the 13 Life Science RIs to create an open, digital and collaborative space for biological and medical research. The practical aim is enabling interoperability of the RI data and workflows, in line with the FAIR principles, by leveraging technologies such as APIs for ease of access to data, ontologies for sharing data semantics, and workflow management systems for orchestrating access to data and analysis tools and composing complex analysis and curation pipelines.
In this abstract, we present recent achievements of the EOSC-Life project, with emphasis on the activities and some outcomes of its most technological workpackages.
The EOSC-Life project is structured in 13 workpackages, two of which were added in 2020 in order to cope with the pandemic (WP13, WP14) and are working on the implementation of the Covid-19 data portal and a registry for clinical trial data. The most technical workpackages relate to sensitive and non-sensitive data management (WP4, WP1), software tools, workflows and related registries (WP2), access to data and resources (WP5) and cloud storage and compute (WP7).
FAIR-compliant data resources are the starting point for accomplishing the project aims. Most RIs have several FAIR resources serving the needs of their user communities; these are now registered in the FAIRsharing.org archive as a collection that extends to 123 resources and 11 standards. This EOSC-Life collection is accessible within the European Open Science Cloud (EOSC) as well. In addition to FAIRification of Life Science data, the EOSC-Life project has fostered the adoption of technologies for the cloud deployment of workflows for data analysis and integration. The project covers the full software stack required to implement workflows, from operating system to containerization and web front-ends, by addressing three major points: software and tools packaging, workflow composition and execution, and registries. In this context the project developed two new tools: the WorkflowHub (1), a registry for describing, sharing and publishing scientific computational workflows, and the LifeMonitor (2), a service to support the sustainability and reusability of published computational workflows. In addition, the project contributed to the development of the Life Science Authentication and Authorization Infrastructure (LS-AAI, LS Login), an access and user management system to enable multi-RI applications and workflows (3). LS Login builds on existing approaches and supports access to sensitive data with their specific requirements. ELIXIR services have now moved from ELIXIR AAI to LS Login.
Design and implementation of the MBDS-DB federated database
for the Sardinian Microbial Culture Collections
Romano P, Budroni M, Cosentino S, Daga E, Deplano M, Multineddu C, Comunian R.
Many strategies, laws and regulations have been enacted in the last decades by governments for biodiversity preservation. The MicroBioDiverSar (MBDS) project, funded by the Italian Ministry of Agricultural, Food and Forestry Policies (Mipaaf), was aimed at surveying, cataloguing, and managing the microorganisms and the related information of three Sardinian collections (1).
Microbial resources were reordered and inventoried, and a federated database was designed according to both international standards and laboratory needs. The resulting MBDS collection, which was recorded in the associated database, includes over 21,000 isolates belonging to over 200 species of bacteria, yeasts, and filamentous fungi, isolated from different matrices, mainly foods of animal and vegetable origin. Currently, about 2000 isolates, belonging to 150 species, are available online for both the scientific community and agri-food producers.
In this abstract, we present the architectural choices, the implementation methods and the results achieved in the development and creation of the MBDS database (MBDS-DB).
A single database schema and software were designed to be adopted by all collections, and three distinct databases sharing some common reference tables were implemented, thus allowing collections to manage their own data autonomously. A centralized and integrated copy of the databases provides a common query interface for all catalogues. The overall architecture of the system therefore includes three distinct databases, implemented and hosted by the collections, and one integrated database that keeps the reference tables consistent. This simple architecture allows the easy and efficient inclusion of further databases. The key issue for this common, but distributed, database architecture was synchronization. Reference tables must be synchronized so that all databases share the same values at any time. Moreover, updates carried out by each collection on its database must be synchronized with the central repository to allow end users to query a constantly up-to-date integrated database.
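The two synchronization directions described above can be sketched as follows. This is only an illustration under stated assumptions, not the MBDS-DB implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a single hypothetical reference table (`species`) and a hypothetical `strains` table, and it treats the central database as authoritative for reference values.

```python
import sqlite3

def make_db():
    """Create an in-memory database with the shared (hypothetical) schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE strains (collection TEXT, strain_no TEXT, species_id INTEGER, "
        "PRIMARY KEY (collection, strain_no))"
    )
    return db

def push_reference_table(central, local):
    """Copy the central reference table into a collection's local database."""
    rows = central.execute("SELECT id, name FROM species").fetchall()
    local.executemany(
        "INSERT OR REPLACE INTO species (id, name) VALUES (?, ?)", rows
    )
    local.commit()

def pull_strains(local, central, collection):
    """Upload one collection's strains to the central, integrated database."""
    rows = local.execute(
        "SELECT collection, strain_no, species_id FROM strains WHERE collection = ?",
        (collection,),
    ).fetchall()
    # Keyed on (collection, strain_no), so repeating the upload is harmless.
    central.executemany("INSERT OR REPLACE INTO strains VALUES (?, ?, ?)", rows)
    central.commit()
```

Because both operations replace rows by primary key, they can be repeated safely, which suits a setup where uploads are triggered by each collection at its own pace. Handling deletions and conflicting edits would need additional logic that this sketch omits.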
Open software was used for both development and production environments. The LAMP environment, including Linux, Apache, MySQL, and PHP, was adopted. This environment is especially fit for frequent updates of the software. The data set and the format of single data fields were defined on the basis of the specifications for microorganism data of MIRRI-IS, the information system of the Microbial Resource Research Infrastructure (MIRRI), and by comparing them with the needs of the partner collections. The final data set meets the requirements for the submission of strain data to MIRRI-IS.
Access to the integrated database is granted to interested researchers through the website of the MicroBioDiverSar project (http://mbds.it/). A simple form is available for querying the catalogues by resource type, species, strain number and name, and substrate. An advanced query form is under development. The output list of strains can be printed or exported. When one of the strains is selected, the system returns its associated information, grouped in distinct sections for strain identification, origin, properties, literature, and notes. Only a subset of information is included in the output. Access to database management scripts requires user authentication. Three account types with different access rights have been defined: users, curators, and administrators. The main functions are described below.
- Query: Allows users to query the local database.
- Insert: Allows admins and curators to insert new strains in the local database. Only a small data subset is included in the database in synchronization with the centralized database, the resource type and strain name being the only mandatory data. The species can be specified later, which is useful when the identification has not yet been done.
- Update: Allows admins and curators to complete the insertion of data on a strain or update it locally. Information is organized in distinct sections for the sake of clarity and simplicity of use.
- Reference lists: Allows admins and curators to see and manage reference lists in synchronization with the centralized database. Lists are available for many kinds of information, including species, substrates, geographic locations, culture media, literature and researchers.
- Tools: Available to admins only. It allows them to create and restore database backups and to update the centralized database by uploading data from the local database.
- Options: Allows admins to create and delete users and to define and change their passwords.
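The access rights and the mandatory fields listed above can be summarized in a short sketch. It is an illustration only: the role and function identifiers, the field names, and the assumption that curators and admins can also use the Query function are hypothetical and not taken from the MBDS-DB code.

```python
# Which account types may use each function (identifiers are hypothetical).
PERMISSIONS = {
    "query": {"user", "curator", "admin"},  # assumption: open to all accounts
    "insert": {"curator", "admin"},
    "update": {"curator", "admin"},
    "reference_lists": {"curator", "admin"},
    "tools": {"admin"},
    "options": {"admin"},
}

# Only these two fields are mandatory when a strain is inserted.
MANDATORY_FIELDS = ("resource_type", "strain_name")

def can(role, function):
    """Return True if the given account type may use the given function."""
    return role in PERMISSIONS.get(function, set())

def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a new strain."""
    return [f for f in MANDATORY_FIELDS if not record.get(f)]
```

With this layout, a record that lacks a species is still accepted, which reflects the case where identification has not yet been done.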
The MBDS database has been proposed as a model for other Italian collections that, like the MBDS partners, are part of the Joint Research Unit MIRRI-IT Italian collections network (2), with the aim of overcoming fragmentation, facing sustainability challenges, and improving the quality of the management of the collections.
1. Daga E, Budroni M, Multineddu C, Cosentino S, Deplano M, Romano P, Comunian R. The MicroBioDiverSar project: exploring the microbial biodiversity in ex situ collections of Sardinia. Sustainability 2021, 13(15):8494. doi: 10.3390/su13158494
2. De Vero L, Boniotti MB, Budroni M, Buzzini P, Cassanelli S, Comunian R, Gullo M, Logrieco AF, Mannazzu I, Musumeci R, Perugini I, Perrone G, Pulvirenti A, Romano P, Turchetti B and Varese GC. Preservation, characterization and exploitation of microbial biodiversity: the perspective of the Italian network of culture collections. Microorganisms. 2019, 7:685. doi: 10.3390/microorganisms7120685