Given a set of data objects, consider that object pairs are assigned two weights expressing the advantage of putting those objects in the same cluster or in separate clusters, respectively. Correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Existing approximation algorithms provide quality guarantees if the weights are bounded in some way. Regardless of the type, the weight bounds that have been so far studied are local bounds, i.e., constraints that are required to hold for every object pair in isolation. In this paper, we discuss global weight bounds in correlation clustering, and in particular, we derive bounds on edge weights' aggregate functions that are sufficient to lead to proved quality guarantees. Our formulation extends the range of applicability of the most prominent existing correlationclustering algorithms thus providing benefits, both theoretical and practical. Also, we showcase our results in a real-world scenario of feature selection for fair clustering.
Correlation Clustering: From Local to Global Constraints
Gullo F.
2022-01-01
Abstract
Given a set of data objects, consider that object pairs are assigned two weights expressing the advantage of putting those objects in the same cluster or in separate clusters, respectively. Correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Existing approximation algorithms provide quality guarantees if the weights are bounded in some way. Regardless of the type, the weight bounds that have been so far studied are local bounds, i.e., constraints that are required to hold for every object pair in isolation. In this paper, we discuss global weight bounds in correlation clustering, and in particular, we derive bounds on edge weights' aggregate functions that are sufficient to lead to proved quality guarantees. Our formulation extends the range of applicability of the most prominent existing correlationclustering algorithms thus providing benefits, both theoretical and practical. Also, we showcase our results in a real-world scenario of feature selection for fair clustering.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.