Detecting and tracking topics and events from web search logs

Hongyan Liu, Jun He, Yingqin Gu, Hui Xiong, Xiaoyong Du

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Recent years have witnessed increased efforts to detect topics and events from Web search logs, since this kind of data not only captures Web content but also reflects users' activities. However, the majority of existing work focuses on exploiting clustering techniques for topic and event detection. Due to the huge size and the evolving nature of Web data, existing clustering approaches are limited in their ability to meet real-time demands. To that end, in this article, we propose a method called LETD to detect evolving topics in a timely manner. We also design techniques to extract events from topics and to infer the evolving relationships among the events. For topic detection, we first provide a measurement to select the important URLs, which are most likely to describe a real-life topic. Then, starting from these selected URLs, we exploit the local expansion method to find other topic-related URLs. Moreover, in the LETD framework, we design algorithms based on Random Walk and Markov Random Fields (MRF), respectively. Because the LETD method exploits a divide-and-conquer strategy to process the data, it is more efficient than existing methods based on clustering techniques. To better illustrate the LETD framework, we develop a demo system, StoryTeller, which can discover hot topics and events, infer the evolving relationships among events, and visualize information in a storytelling way. This demo system can provide a global view of topic development and help users target interesting events more conveniently. Finally, experimental results on real-world Microsoft click-through data show that StoryTeller can find real-life hot topics and meaningful evolving relationships among events, and also demonstrate the efficiency and effectiveness of the LETD method.
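
The seed-then-expand idea in the abstract can be illustrated with a small sketch. The abstract does not give the URL-importance measurement or the exact expansion rule, so everything below is an assumption made for illustration and not the LETD algorithm itself: importance is approximated by total click count, and expansion is a two-step random walk (URL to query to URL) over a toy click-through log with a hypothetical probability threshold. The function names, the sample log, and the threshold value are all invented for this example.

```python
from collections import defaultdict


def build_click_graph(log):
    """Build both directions of a query-URL click graph from (query, url, clicks) rows."""
    url_to_queries = defaultdict(dict)
    query_to_urls = defaultdict(dict)
    for query, url, clicks in log:
        url_to_queries[url][query] = url_to_queries[url].get(query, 0) + clicks
        query_to_urls[query][url] = query_to_urls[query].get(url, 0) + clicks
    return url_to_queries, query_to_urls


def select_seeds(url_to_queries, k):
    """Pick the k URLs with the most clicks.

    Assumption: total clicks is only a stand-in for the paper's
    importance measurement, which the abstract does not specify.
    """
    totals = {url: sum(q.values()) for url, q in url_to_queries.items()}
    return sorted(totals, key=lambda url: (-totals[url], url))[:k]


def expand_topic(seed, url_to_queries, query_to_urls, threshold):
    """Local expansion from one seed via a two-step random walk.

    Walk URL -> query -> URL with click-proportional transition
    probabilities and keep every URL reached with probability >= threshold.
    Only the seed's neighbourhood is visited, never the whole graph.
    """
    probs = defaultdict(float)
    seed_total = sum(url_to_queries[seed].values())
    for query, clicks in url_to_queries[seed].items():
        p_query = clicks / seed_total
        query_total = sum(query_to_urls[query].values())
        for url, url_clicks in query_to_urls[query].items():
            probs[url] += p_query * url_clicks / query_total
    return {url for url, p in probs.items() if p >= threshold} | {seed}


# Toy click-through log (invented data): (query, clicked URL, click count).
log = [
    ("olympics 2012", "a.com/olympics", 8),
    ("olympics 2012", "b.com/london", 4),
    ("london games", "a.com/olympics", 2),
    ("london games", "b.com/london", 2),
    ("pasta recipe", "c.com/pasta", 3),
]

url_to_queries, query_to_urls = build_click_graph(log)
seeds = select_seeds(url_to_queries, k=1)
topic = expand_topic(seeds[0], url_to_queries, query_to_urls, threshold=0.2)
print(seeds, sorted(topic))
```

Because each expansion touches only the queries and URLs adjacent to its seed, the work per topic is bounded by the size of that neighbourhood, which is one plausible reading of why a local method could scale better than clustering the full log.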

Original language: English (US)
Article number: 21
Journal: ACM Transactions on Information Systems
Volume: 30
Issue number: 4
State: Published - Nov 2012
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications
