Comparing HISAT and STAR-based pipelines for RNA-Seq Data Analysis: A real experience

Bianchi A. (Methodology); Di Marco A. (Supervision); Pellegrini C. (Writing – Review & Editing)
2023-01-01

Abstract

One of the first steps in RNA-Sequencing (RNA-Seq) data analysis is aligning Next Generation Sequencing (NGS) reads to a reference genome. The literature describes several alignment tools developed by practitioners and researchers. However, two tools are the de facto standard in bioinformatics pipelines: HISAT (version 2) and STAR (version 2). The aim of this study is to determine the impact of the alignment tool on RNA-Seq analysis in terms of the biological relevance of the results and the computational time. The two implemented pipelines return different results on the biological side. This is due to the assumptions made by the tools and to the specific characteristics of their underlying (statistical) models. The study provides valuable insights for researchers interested in optimizing their RNA-Seq pipelines and making informed decisions about which pipeline to use. As a lesson learned, we suggest that bioinformatics researchers use more than one pipeline when running experiments, to reduce the prediction errors induced by the assumptions of a specific tool or method.
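
The recommendation to run more than one pipeline can be illustrated with a minimal sketch that compares the gene lists reported by two pipelines. Everything here is an assumption made for illustration: the function name `compare_gene_sets`, the idea of summarizing agreement with a Jaccard index, and the placeholder gene identifiers are not taken from the study, which may compare its results differently.

```python
def compare_gene_sets(hisat_genes, star_genes):
    """Summarize the agreement between two lists of reported genes.

    Hypothetical helper: the inputs are assumed to be the gene identifiers
    flagged as relevant by a HISAT-based and a STAR-based pipeline.
    """
    hisat_set = set(hisat_genes)
    star_set = set(star_genes)

    shared = hisat_set & star_set   # genes reported by both pipelines
    union = hisat_set | star_set    # genes reported by at least one pipeline

    # Jaccard index: size of the intersection divided by size of the union.
    # Two empty lists are treated as fully agreeing.
    jaccard = len(shared) / len(union) if union else 1.0

    return {
        "shared": sorted(shared),
        "hisat_only": sorted(hisat_set - star_set),
        "star_only": sorted(star_set - hisat_set),
        "jaccard": jaccard,
    }


# Placeholder identifiers, not real results from the study.
example = compare_gene_sets(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_C", "GENE_D"],
)
print(example)
```

Genes that appear in `shared` are supported by both aligners, while those in `hisat_only` or `star_only` may reflect tool-specific assumptions and deserve closer inspection.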
Year: 2023
ISBN: 979-8-3503-1224-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/223526