Using web structure for classifying and describing web pages

Eric J. Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M. Pennock, Gary W. Flake

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

162 Scopus citations

Abstract

The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
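
The abstract mentions ranking words and phrases by expected entropy loss. As an illustration only, the sketch below scores a binary feature (a term being present or absent) by how much knowing it reduces the entropy of the positive/negative class label. The function names, the document-as-token-set representation, and the toy data are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def entropy(p):
    """Binary entropy (in bits) of a class with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_loss(pos_with, pos_total, neg_with, neg_total):
    """Prior class entropy minus expected entropy after observing the feature."""
    total = pos_total + neg_total
    with_f = pos_with + neg_with
    without_f = total - with_f
    prior = entropy(pos_total / total)
    # Class entropy among documents that contain / do not contain the feature.
    h_with = entropy(pos_with / with_f) if with_f else 0.0
    h_without = entropy((pos_total - pos_with) / without_f) if without_f else 0.0
    posterior = (with_f / total) * h_with + (without_f / total) * h_without
    return prior - posterior


def rank_features(positive_docs, negative_docs):
    """Rank every term by expected entropy loss, highest first.

    Each document is an iterable of tokens (hypothetical representation).
    """
    pos_df = Counter(t for d in positive_docs for t in set(d))
    neg_df = Counter(t for d in negative_docs for t in set(d))
    scores = {
        t: expected_entropy_loss(
            pos_df[t], len(positive_docs), neg_df[t], len(negative_docs)
        )
        for t in set(pos_df) | set(neg_df)
    }
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: tokens drawn from text near citations (made up for illustration).
positive = [{"python", "code"}, {"python", "snake"}]
negative = [{"news", "code"}, {"sports"}]
ranked = rank_features(positive, negative)
```

The score is symmetric: a term that appears only in negative documents ranks as highly as one that appears only in positive documents. Any use for naming a cluster would therefore need an additional check that the term is actually common among the positive examples.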

Original language: English (US)
Title of host publication: Proceedings of the 11th International Conference on World Wide Web, WWW '02
Pages: 562-569
Number of pages: 8
State: Published - 2002
Externally published: Yes
Event: 11th International Conference on World Wide Web, WWW '02 - Honolulu, HI, United States
Duration: May 7 2002 to May 11 2002

Publication series

Name: Proceedings of the 11th International Conference on World Wide Web, WWW '02

Other

Other: 11th International Conference on World Wide Web, WWW '02
Country/Territory: United States
City: Honolulu, HI
Period: 5/7/02 to 5/11/02

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications

Keywords

  • Anchortext
  • Classification
  • Cluster naming
  • Entropy based feature extraction
  • Evaluation
  • SVM
  • Web directory
  • Web structure
