The graph edit distance (GED) is among the most widely used graph similarity measures in practice. It asks for a minimum cost edit path between two given labeled graphs G and H, where the edit path is defined as a sequence of operations (e.g., node and edge insertions, deletions or substitutions) that successively transform the graph G into H. In this work, we suggest a new ILP formulation (FORI) based on orienting the corresponding edge variables. Moreover, we suggest enhancing two state-of-the-art ILP formulations by incorporating additional inequalities. We theoretically compare the strength of the formulations with respect to their Linear Programming relaxations. The result is a hierarchy with (FORI) at the top. Our extensive evaluation on widely used benchmark sets shows that our improved formulations run significantly faster than the previous ones. These allow to solve to proven optimality all the reference instances from common databases, such as the IAM Graph Database, many of which were prohibitive with state-of-the-art methods. Moreover, we are able to compute the GED of a small pattern and a large graph such as CORA and PUBMED, having up to 19,717 nodes and 44,327 edges.

Enhancing Graph Edit Distance Computation: Stronger and Orientation-Based ILP Formulations

D'Ascenzo, Andrea;Rossi, Fabrizio
2025-01-01

Abstract

The graph edit distance (GED) is among the most widely used graph similarity measures in practice. It asks for a minimum cost edit path between two given labeled graphs G and H, where the edit path is defined as a sequence of operations (e.g., node and edge insertions, deletions or substitutions) that successively transform the graph G into H. In this work, we suggest a new ILP formulation (FORI) based on orienting the corresponding edge variables. Moreover, we suggest enhancing two state-of-the-art ILP formulations by incorporating additional inequalities. We theoretically compare the strength of the formulations with respect to their Linear Programming relaxations. The result is a hierarchy with (FORI) at the top. Our extensive evaluation on widely used benchmark sets shows that our improved formulations run significantly faster than the previous ones. These allow to solve to proven optimality all the reference instances from common databases, such as the IAM Graph Database, many of which were prohibitive with state-of-the-art methods. Moreover, we are able to compute the GED of a small pattern and a large graph such as CORA and PUBMED, having up to 19,717 nodes and 44,327 edges.
File in questo prodotto:
File Dimensione Formato  
p4737-d_ascenzo.pdf

solo utenti autorizzati

Tipologia: Documento in Versione Editoriale
Licenza: Creative commons
Dimensione 1.04 MB
Formato Adobe PDF
1.04 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11697/270559
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact