D4Z4 Methylation Levels Combined with a Machine Learning Pipeline Highlight Single CpG Sites as Discriminating Biomarkers for FSHD Patients

Caputo, Valerio; Cascella, Raffaella
2022-01-01

Abstract

The study describes a protocol for methylation analysis integrated with Machine Learning (ML) algorithms developed to classify Facio-Scapulo-Humeral Dystrophy (FSHD) subjects. The DNA methylation levels of two D4Z4 regions (DR1 and DUX4-PAS) were assessed by an in-house protocol based on bisulfite sequencing and capillary electrophoresis, followed by statistical and ML analyses. The study involved two independent cohorts: a training group of 133 patients with clinical signs of FSHD and 150 healthy controls (CTRL), and a testing set of 27 FSHD patients and 25 CTRL. As expected, FSHD patients showed significantly reduced methylation levels compared to CTRL. We used single CpG sites to develop an ML pipeline able to discriminate FSHD subjects. The model identified four CpG sites as the most relevant for the discrimination of FSHD subjects and achieved high performance metrics (accuracy: 0.94, sensitivity: 0.93, specificity: 0.96). Two additional models were developed to differentiate patients with lower D4Z4 size and patients who might carry pathogenic variants in FSHD genes, respectively. Overall, the present model enables an accurate classification of FSHD patients, providing additional evidence for DNA methylation as a powerful disease biomarker that could be employed for prioritizing subjects to be tested for FSHD.
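
For readers less familiar with the reported metrics, the following minimal Python sketch shows how accuracy, sensitivity, and specificity are derived from a confusion matrix. The abstract does not report the confusion matrix itself, so the counts below are hypothetical: they are merely one set of values consistent with the stated testing set size (27 FSHD, 25 CTRL) and the rounded metrics, and the function name `classification_metrics` is illustrative only.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        # Fraction of all subjects classified correctly
        "accuracy": (tp + tn) / total,
        # Fraction of true FSHD subjects classified as FSHD (true positive rate)
        "sensitivity": tp / (tp + fn),
        # Fraction of true controls classified as controls (true negative rate)
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts (NOT taken from the study): 27 FSHD and 25 CTRL subjects,
# chosen only because they are consistent with the rounded values in the abstract.
metrics = classification_metrics(tp=25, fn=2, tn=24, fp=1)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, the script prints an accuracy of 0.94, a sensitivity of 0.93, and a specificity of 0.96.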

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/248629