Imbalanced Dataset Optimization with New Resampling Techniques
Ivan Letteri;Abeer Dyoub;Giuseppe Della Penna
2022-01-01
Abstract
Imbalanced datasets, which are common in many application fields, pose a formidable problem for most machine learning algorithms. At the same time, such algorithms are being applied extensively in many areas, showing promising results and outperforming other approaches. Therefore, many techniques have been developed to re-balance datasets in order to improve machine learning applicability. In this paper we present two algorithms, called G1No (Generative resampling 1-nearest Neighbour) and G1No Gourmet, which compensate for dataset imbalance by generating synthetic samples, and compare them with two state-of-the-art re-balancing algorithms, namely the Synthetic Minority Oversampling Technique (SMOTE) and ADAptive SYNthetic sampling (ADASYN). The experiments, carried out on a realistic malware traffic dataset, namely MTA-KDD'19, show that G1No outperforms the other algorithms and is even able to improve the quality of the original dataset.
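To make the idea of re-balancing by synthetic sample generation concrete, the following is a minimal sketch of SMOTE-style interpolation, the general technique behind one of the baselines named above. It is not G1No or G1No Gourmet, whose details are not given in the abstract. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Illustrative SMOTE-style sketch only; this is not G1No or G1No Gourmet.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        u = rng.random()
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 4 minority points against a majority class of 10.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10
new_points = smote_like_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region those samples already span. That is the basic behaviour that more elaborate resampling schemes refine.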
File: CameraReadyV3-IntelliSys2021_Imbalanced_Dataset_Optimization_with_NewResampling_Techniques.pdf
Access: authorized users only
Type: Pre-print document
License: Creative Commons
Size: 5.01 MB
Format: Adobe PDF