Background: In the context of software development, GitHub has been at the forefront of platforms to store, analyze and maintain a large number of software repositories. Topics have been introduced by GitHub as an effective method to annotate stored repositories. However, labeling GitHub repositories should be carefully conducted to avoid adverse effects on project popularity and reachability. Aims: We present TopFilter, a novel approach to assist open source software developers in selecting suitable topics for GitHub repositories being created. Method: We built a project-topic matrix and applied a syntactic-based similarity function to recommend missing topics by representing repositories and related topics in a graph. The ten-fold cross-validation methodology has been used to assess the performance of TopFilter by considering different metrics, i.e., success rate, precision, recall, and catalog coverage. Result: The results show that TopFilter recommends good topics depending on different factors, i.e., collaborative filtering settings, considered datasets, and pre-processing activities. Moreover, TopFilter can be combined with a state-of-the-art topic recommender system (i.e., MNB network) to improve the overall prediction performance. Conclusion: Our results confirm that collaborative filtering techniques can successfully be used to provide relevant topics for GitHub repositories. Moreover, TopFilter can gain a significant boost in prediction performances by employing the outcomes obtained by the MNB network as its initial set of topics.

TopFilter: An approach to recommend relevant GitHub topics

Di Rocco Juri;Di Ruscio Davide;Claudio Di Sipio;Phuong Nguyen;Riccardo Rubei
2020-01-01

Abstract

Background: In the context of software development, GitHub has been at the forefront of platforms to store, analyze and maintain a large number of software repositories. Topics have been introduced by GitHub as an effective method to annotate stored repositories. However, labeling GitHub repositories should be carefully conducted to avoid adverse effects on project popularity and reachability. Aims: We present TopFilter, a novel approach to assist open source software developers in selecting suitable topics for GitHub repositories being created. Method: We built a project-topic matrix and applied a syntactic-based similarity function to recommend missing topics by representing repositories and related topics in a graph. The ten-fold cross-validation methodology has been used to assess the performance of TopFilter by considering different metrics, i.e., success rate, precision, recall, and catalog coverage. Result: The results show that TopFilter recommends good topics depending on different factors, i.e., collaborative filtering settings, considered datasets, and pre-processing activities. Moreover, TopFilter can be combined with a state-of-the-art topic recommender system (i.e., MNB network) to improve the overall prediction performance. Conclusion: Our results confirm that collaborative filtering techniques can successfully be used to provide relevant topics for GitHub repositories. Moreover, TopFilter can gain a significant boost in prediction performances by employing the outcomes obtained by the MNB network as its initial set of topics.
2020
9781450375801
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11697/179335
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? ND
social impact