N-CovSel, a new strategy for feature selection in N-way data

Biancolillo, Alessandra
2022

Abstract

In data analysis, the selection of meaningful variables is a widely debated topic, and several variable selection (or feature reduction) approaches have been proposed in the literature. Although feature selection methods are numerous, most of them are suited to data matrices but not to higher-order structures. This is mainly because the assessment of variable relevance in a multi-way context has not been extensively discussed. To the best of our knowledge, among the variable selection approaches developed for standard 2-way data arrays, only VIP analysis and the selectivity ratio have been extended to higher-order structures. This gap does not reflect a lack of relevance of the topic; on the contrary, the ability to select information in a complex data set such as a multi-way structure is crucial. In light of these considerations, the present paper discusses a feature selection strategy for N-way data based on the Covariance Selection (CovSel) approach, hence called N-CovSel. The method allows the selection of features of different dimensionality (from 1-way up to (N-1)-way), depending on the nature of the original data array. The novel method has been applied to a simulated data set, to inspect its ability to select features compatible with the ground truth of the system, and to a real data set. In both cases, N-CovSel proved able to select meaningful features. Finally, different strategies for the further analysis of the selected features are proposed: some, based on sequential multi-block methods, provide a further data reduction, while others, based on N-PLS, respect the multi-way nature of the data.
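
As a rough illustration of the covariance-selection idea mentioned in the abstract, the following Python sketch greedily picks variables from a 3-way array by maximum squared covariance with a response, deflating after each pick. It is not the authors' implementation: the function name `covsel_3way`, the restriction to a 3-way array with a single response, and the choice to select individual (j, k) variables from the sample-wise unfolded array are all assumptions made for illustration only.

```python
import numpy as np


def covsel_3way(X, y, n_select):
    """Hypothetical sketch: greedy covariance-based selection on a 3-way array.

    X is an (I, J, K) array with samples on the first mode and y is an (I,)
    response. Returns a list of (j, k) index pairs in order of selection.
    """
    I, J, K = X.shape
    # Unfold sample-wise (C order, so column index = j * K + k) and centre.
    Xu = np.asarray(X, dtype=float).reshape(I, J * K)
    Xu = Xu - Xu.mean(axis=0)
    yr = np.asarray(y, dtype=float)
    yr = yr - yr.mean()

    selected = []
    for _ in range(n_select):
        # Squared covariance (up to a constant factor) of each column with y.
        cov2 = (Xu.T @ yr) ** 2
        best = int(np.argmax(cov2))
        x = Xu[:, best].copy()
        norm2 = x @ x
        if norm2 < 1e-12:
            # Nothing informative is left after deflation.
            break
        selected.append(divmod(best, K))
        # Deflate: remove from X and y the part explained by the chosen column.
        Xu = Xu - np.outer(x, x @ Xu) / norm2
        yr = yr - x * (x @ yr) / norm2
    return selected
```

Calling `covsel_3way(X, y, 2)` on an array of shape `(I, J, K)` would return the two (j, k) positions chosen in sequence. Selecting higher-dimensional features (for example whole slabs) would require scoring and deflating groups of columns instead of single ones, which this sketch does not attempt.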

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/193040