In this paper we analyze how interests are distributed by gender in a large population of Twitter users: the full set of 40 million users in 2009 and a sample of about 100 thousand New York users in 2014. To model interests, we associate the "topical" friends in users' friendship lists (friends that represent an interest rather than a social relation between peers) with Wikipedia categories. A word-sense disambiguation algorithm selects the appropriate wikipage for each topical friend. Starting from the set of wikipages representing the population's interests, we extract the sub-graph of Wikipedia categories connected to these pages and then prune cycles to induce a directed acyclic graph, which we call Twixonomy. We use a novel method to reduce the computational requirements of cycle detection on very large graphs. For any category at any generalization level in the Twixonomy, it is then possible to estimate the gender distribution of the Twitter users interested in that category. We analyze both the population of "celebrities", i.e., male and female Twitter users with an associated wikipage, and the population of "peers", i.e., male and female users who follow celebrities.
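The pipeline described above (prune cycles from the category graph, then estimate a gender distribution for each category) can be illustrated with a minimal sketch. It is not the method used in the paper: cycle pruning here is a plain depth-first search that drops back edges, standing in for the novel cycle-detection method, which the abstract does not specify. All page names, category names, gender labels, and function names (`prune_cycles`, `ancestors`, `gender_distribution`) are hypothetical toy values chosen for illustration.

```python
from collections import defaultdict


def prune_cycles(parents):
    """Return a copy of `parents` with DFS back edges dropped, leaving a DAG.

    Plain recursive DFS, used here as a stand-in (assumption); it is not
    the paper's method and is only suitable for small graphs.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (0)
    dag = {node: [] for node in parents}

    def visit(node):
        color[node] = GREY
        for parent in parents.get(node, []):
            if color[parent] == GREY:
                continue  # back edge: it would close a cycle, so drop it
            dag.setdefault(node, []).append(parent)
            if color[parent] == WHITE:
                visit(parent)
        color[node] = BLACK

    for node in list(parents):
        if color[node] == WHITE:
            visit(node)
    return dag


def ancestors(node, dag, cache):
    """All categories reachable from `node` by following parent edges."""
    if node not in cache:
        found = set()
        for parent in dag.get(node, []):
            found.add(parent)
            found |= ancestors(parent, dag, cache)
        cache[node] = found
    return cache[node]


def gender_distribution(users, dag):
    """Share of each gender among the users interested in each category.

    A user counts once per category, however many followed pages lead there.
    """
    counts = defaultdict(lambda: defaultdict(int))
    cache = {}
    for gender, pages in users:
        interested = set()
        for page in pages:
            interested |= ancestors(page, dag, cache)
        for category in interested:
            counts[category][gender] += 1
    return {
        category: {g: n / sum(by_gender.values()) for g, n in by_gender.items()}
        for category, by_gender in counts.items()
    }


# Hypothetical toy graph: child -> parent categories, with one cycle.
parents = {
    "PageA": ["Singers"],
    "PageB": ["Footballers"],
    "Singers": ["Music"],
    "Footballers": ["Sports"],
    "Music": ["Culture"],
    "Sports": ["Culture"],
    "Culture": ["Music"],  # closes the cycle Music -> Culture -> Music
}
# Hypothetical users: (gender label, pages of the topical friends they follow).
users = [
    ("F", ["PageA"]),
    ("F", ["PageA", "PageB"]),
    ("M", ["PageB"]),
    ("M", ["PageB"]),
]

dag = prune_cycles(parents)
dist = gender_distribution(users, dag)
```

Collecting each user's categories in a set before counting means a user who reaches the same category through several followed pages is counted only once, so the shares at higher generalization levels are not inflated by overlapping paths.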