Imbalanced Dataset Optimization with New Resampling Techniques

Ivan Letteri; Abeer Dyoub; Giuseppe Della Penna
2022

Abstract

Imbalanced datasets, which are common in many application fields, pose a formidable problem for most machine learning algorithms. At the same time, such algorithms are being extensively applied in many areas, showing promising results and outperforming other approaches. Therefore, many techniques have been developed to re-balance datasets in order to improve machine learning applicability. In this paper we present two algorithms, called G1No (Generative resampling 1-nearest Neighbour) and G1No Gourmet, which compensate for dataset imbalance by generating synthetic samples, and compare them with two state-of-the-art re-balancing algorithms, namely the Synthetic Minority Oversampling Technique (SMOTE) and ADAptive SYNthetic sampling (ADASYN). The experiments, carried out on a realistic malware traffic dataset, MTA-KDD'19, show that G1No outperforms the other algorithms and is even able to improve the quality of the original dataset.
ISBN: 978-3-030-82196-8
Files in this item:
CameraReadyV3-IntelliSys2021_Imbalanced_Dataset_Optimization_with_NewResampling_Techniques.pdf (not available)

Type: Pre-print document
License: Creative Commons
Size: 5.01 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: http://hdl.handle.net/11697/170081