Imbalanced Dataset Optimization with New Resampling Techniques


Imbalanced datasets, which are common in many application fields, pose a serious problem for most machine learning algorithms. At the same time, such algorithms are being applied extensively in many areas, showing promising results and outperforming other approaches. Many techniques have therefore been developed to re-balance datasets and thus improve the applicability of machine learning. In this paper we present two algorithms, G1No (Generative resampling 1-nearest Neighbour) and G1No Gourmet, which compensate for dataset imbalance by generating synthetic samples, and we compare them with two state-of-the-art re-balancing algorithms: the Synthetic Minority Oversampling Technique (SMOTE) and ADAptive SYNthetic sampling (ADASYN). The experiments, carried out on a realistic malware traffic dataset, MTA-KDD'19, show that G1No outperforms the other algorithms and can even improve the quality of the original dataset.
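The abstract does not describe how G1No or G1No Gourmet generate their samples, so the sketch below illustrates only the general idea of re-balancing by synthetic sample generation, in the style of the SMOTE baseline: a new point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, its parameters, and the toy data are assumptions made for illustration; this is neither the authors' method nor a reference SMOTE implementation.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Hypothetical helper: each point lies between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 4 minority samples against 10 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(10)]

# Generate just enough synthetic points to match the majority class size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

After the call, `balanced_minority` has as many samples as `majority`, and every synthetic point stays inside the region spanned by the original minority samples. ADASYN follows the same interpolation idea but, roughly speaking, generates more points around minority samples that are harder to learn.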


Ivan Letteri; Antonio Di Cecco; Abeer Dyoub; Giuseppe Della Penna

2022-01-01



Year: 2022
ISBN: 978-3-030-82196-8
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
CameraReadyV3-IntelliSys2021_Imbalanced_Dataset_Optimization_with_NewResampling_Techniques.pdf (authorized users only; type: pre-print document; license: Creative Commons)	5.01 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/170081
