SO-CovSel: A novel method for variable selection in a multiblock framework

Alessandra Biancolillo
2020-01-01

Abstract

With the development of technology and the growing availability of new instrumentation, multiblock data sets (e.g., a set of samples analyzed by different analytical techniques) are becoming increasingly common and, as a consequence, how to handle such data is a widely discussed topic. In this context, where the number of variables involved is relatively high, selecting the most significant features is particularly relevant. For this reason, the possibility of combining a multiblock regression method, sequential and orthogonalized partial least-squares (SO-PLS), with a variable selection approach called covariance selection (CovSel) has been investigated. The resulting method, sequential and orthogonalized covariance selection (SO-CovSel), is similar to SO-PLS, but the feature reduction that PLS provides in SO-PLS is performed by CovSel instead. Finally, predictions are made by applying multiple linear regression to the subset of selected variables. The novel approach has been tested on different multiblock data sets in both regression and classification (the latter by combination with linear discriminant analysis, LDA), and it has been compared with another state-of-the-art multiblock method. SO-CovSel has proved suitable for its purpose: it provided good predictions (in both regression and classification) and, from the interpretation point of view, it led to a meaningful selection of the original variables.
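The procedure outlined in the abstract can be illustrated with a minimal two-block sketch. It is not the authors' implementation: the function names (`covsel`, `so_covsel_fit`, `so_covsel_predict`), the fixed numbers of selected variables `n1` and `n2`, the restriction to two blocks, and the synthetic data are all assumptions made for illustration. CovSel is written here as a greedy selection of the column with the largest squared covariance with the response, followed by deflation; the second block and the response are orthogonalized with respect to the variables selected from the first block, and the final model is an ordinary least-squares fit on the selected original variables.

```python
import numpy as np


def covsel(X, Y, n_select):
    """Greedy covariance selection (sketch): repeatedly pick the column of X
    with the largest summed squared covariance with Y, then deflate X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    selected = []
    for _ in range(n_select):
        score = ((X.T @ Y) ** 2).sum(axis=1)
        if selected:
            score[selected] = -np.inf  # never pick the same column twice
        j = int(np.argmax(score))
        x = X[:, [j]]
        norm2 = float(np.sum(x ** 2))
        if norm2 < 1e-12:  # nothing left to explain
            break
        selected.append(j)
        # Remove the part of X and Y explained by the selected column.
        X = X - x @ (x.T @ X) / norm2
        Y = Y - x @ (x.T @ Y) / norm2
    return selected


def so_covsel_fit(X1, X2, y, n1, n2):
    """Two-block sketch: select from X1, orthogonalize X2 and y with respect
    to the selected X1 variables, select from X2, then fit MLR."""
    Y = np.asarray(y, dtype=float).reshape(len(y), -1)
    mu1, mu2, muY = X1.mean(axis=0), X2.mean(axis=0), Y.mean(axis=0)
    X1c, X2c, Yc = X1 - mu1, X2 - mu2, Y - muY

    idx1 = covsel(X1c, Yc, n1)
    S1 = X1c[:, idx1]

    def residual(A):
        # Part of A orthogonal to the selected X1 variables.
        return A - S1 @ np.linalg.lstsq(S1, A, rcond=None)[0]

    idx2 = covsel(residual(X2c), residual(Yc), n2)

    # Multiple linear regression on the concatenated selected variables.
    Z = np.hstack([S1, X2c[:, idx2]])
    B = np.linalg.lstsq(Z, Yc, rcond=None)[0]
    return {"idx1": idx1, "idx2": idx2, "B": B,
            "mu1": mu1, "mu2": mu2, "muY": muY}


def so_covsel_predict(model, X1, X2):
    Z = np.hstack([X1[:, model["idx1"]] - model["mu1"][model["idx1"]],
                   X2[:, model["idx2"]] - model["mu2"][model["idx2"]]])
    return (model["muY"] + Z @ model["B"]).squeeze()


# Synthetic demonstration: y depends on one variable from each block.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 20))
X2 = rng.normal(size=(200, 15))
y = 3.0 * X1[:, 3] + 2.0 * X2[:, 5] + 0.01 * rng.normal(size=200)

model = so_covsel_fit(X1, X2, y, n1=1, n2=1)
pred = so_covsel_predict(model, X1, X2)
```

In practice the number of variables retained per block would be chosen by cross-validation rather than fixed in advance, and for classification the selected variables would be passed to LDA instead of used for a regression prediction.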
Type: Publisher's version
License: Creative Commons
Use this identifier to cite or link to this document: https://hdl.handle.net/11697/139243