Eye movements while viewing narrated, captioned, and silent videos

Nicholas M. Ross, Eileen Kowler

Research output: Contribution to journal › Article › peer-review

26 Scopus citations


Videos are often accompanied by narration delivered either by an audio stream or by captions, yet little is known about saccadic patterns while viewing narrated video displays. Eye movements were recorded while viewing video clips with (a) audio narration, (b) captions, (c) no narration, or (d) concurrent captions and audio. A surprisingly large proportion of time (>40%) was spent reading captions even in the presence of a redundant audio stream. Redundant audio did not affect the saccadic reading patterns but did lead to skipping of some portions of the captions and to delays of saccades made into the caption region. In the absence of captions, fixations were drawn to regions with a high density of information, such as the central region of the display, and to regions with high levels of temporal change (actions and events), regardless of the presence of narration. The strong attraction to captions, with or without redundant audio, raises the question of what determines how time is apportioned between captions and video regions so as to minimize information loss. The strategies of apportioning time may be based on several factors, including the inherent attraction of the line of sight to any available text, the moment-by-moment impressions of the relative importance of the information in the caption and the video, and the drive to integrate visual text accompanied by audio into a single narrative stream.
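The abstract reports the share of viewing time spent in the caption region but does not say how that share was computed. The sketch below shows one common way to express such a measure: the fraction of total fixation duration that falls inside a rectangular area of interest. Everything in it is a hypothetical illustration, not the authors' method. The `Fixation` and `Region` types, the `dwell_proportion` function, the pixel coordinates, and the assumption that the caption region is a fixed screen rectangle are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    # Hypothetical fixation record: screen position in pixels and duration.
    x: float
    y: float
    duration_ms: float


@dataclass
class Region:
    # Hypothetical rectangular area of interest (e.g. an assumed caption band).
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        """Return True if the point lies inside the rectangle (edges included)."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def dwell_proportion(fixations: list[Fixation], region: Region) -> float:
    """Fraction of total fixation time whose fixation position lies inside `region`."""
    total = sum(f.duration_ms for f in fixations)
    if total == 0:
        return 0.0  # no fixation time recorded, so no proportion to report
    inside = sum(f.duration_ms for f in fixations if region.contains(f.x, f.y))
    return inside / total


# Illustrative, made-up data: a 1024 x 768 display with an assumed caption
# band occupying the bottom 168 pixels.
caption_band = Region(left=0, top=600, right=1024, bottom=768)
fixations = [
    Fixation(512, 300, 250),  # video region
    Fixation(400, 680, 300),  # caption region
    Fixation(600, 690, 200),  # caption region
    Fixation(200, 150, 250),  # video region
]

print(f"{dwell_proportion(fixations, caption_band):.0%}")  # prints "50%"
```

Weighting by fixation duration, rather than counting fixations, is what makes the result a proportion of *time*. A count-based measure would treat a brief glance at a caption the same as a long reading fixation.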

Original language: English (US)
Article number: 1
Journal: Journal of Vision
Issue number: 4
State: Published - 2013

All Science Journal Classification (ASJC) codes

  • Ophthalmology
  • Sensory Systems


  • Captions
  • Cognition
  • Event perception
  • Eye movements
  • Movies
  • Multi-sensory integration
  • Narration
  • Reading
  • Saccades
  • Saccadic eye movements
  • Salience models
  • Videos

