The importance of unexpectedness: Discovering buzzing stories in anomalous temporal graphs
Stilo G.
2019-01-01
Abstract
The real-time nature and massive volume of social-media data have turned news portals and micro-blogging platforms into social sensors, prompting a surge of research on story or event detection in online user-generated content and social-media text streams. Existing approaches to story identification broadly fall into two categories. Approaches in the first category extract stories as cohesive substructures in a graph representing the strength of association between terms. Approaches in the second category analyze the temporal evolution of individual terms and identify stories by grouping terms with similar anomalous temporal behavior. Both categories have limitations. Approaches in the first category cannot distinguish ever-popular concepts from stories that buzz in a time interval of interest, i.e., that attract an amount of attention deviating significantly from the typical level observed. Approaches in the second category ignore term co-associations and the wealth of information they capture. In this work we advance the literature on story identification by combining the strengths of the two main state-of-the-art approaches. We propose a novel method that characterizes abnormal association between terms in a given time window and leverages the graph structure induced by such anomalous associations to identify stories as subsets of terms that are cohesively associated in this graph. Experiments performed on two datasets, extracted from a real-world web-search query log and a news corpus, respectively, attest to the superiority of the proposed method over the two main existing story-identification approaches.
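The general idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm, and every concrete choice here is an assumption: term association is measured by raw document co-occurrence, "abnormal" is a z-score against past windows with a hypothetical threshold `z_threshold` and standard-deviation floor `min_std`, and connected components serve as a crude stand-in for whatever notion of cohesive subgraph the authors actually use. The function names and the toy data are likewise invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of terms."""
    counts = defaultdict(int)
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def anomalous_edges(history, current, z_threshold=2.0, min_std=1.0):
    """Return term pairs whose co-occurrence in `current` exceeds the baseline.

    `history` is a non-empty list of past windows; each window is a list of
    documents, and each document is a list of terms. The z-score and the
    floor on the standard deviation are illustrative assumptions.
    """
    past = [cooccurrence_counts(window) for window in history]
    now = cooccurrence_counts(current)
    edges = {}
    for pair, count in now.items():
        baseline = [p.get(pair, 0) for p in past]
        mu = mean(baseline)
        sigma = max(pstdev(baseline), min_std)  # avoid division by zero
        z = (count - mu) / sigma
        if z >= z_threshold:
            edges[pair] = z
    return edges


def connected_components(edges):
    """Group terms linked by anomalous edges (a simple proxy for cohesion)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stories = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        stories.append(component)
    return stories


# Toy data: "weather forecast" is ever-popular, the earthquake terms are new.
history = [
    [["weather", "forecast"], ["weather", "forecast"], ["football", "score"]],
    [["weather", "forecast"], ["weather", "forecast"], ["football", "score"]],
]
current = [
    ["weather", "forecast"],
    ["weather", "forecast"],
    ["earthquake", "tsunami", "japan"],
    ["earthquake", "japan"],
    ["earthquake", "tsunami"],
    ["tsunami", "japan"],
]

stories = connected_components(anomalous_edges(history, current))
print(stories)
```

In this toy run the pair "weather"/"forecast" co-occurs just as often as in the past windows, so it produces no edge, while the earthquake-related pairs are new and end up grouped into one candidate story. That mirrors the distinction the abstract draws between ever-popular concepts and buzzing stories.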