The mathematical analysis of style: A correlation-based approach

Research output: Contribution to journal › Article › peer-review


Mathematical models of style have focused on features which are easily quantifiable and, for computer-aided analysis, easily identifiable by machines. Most such studies are based on frequency of occurrence of word counts, vocabulary items, or grammatical forms. In this paper, we examine not the number of occurrences of a characteristic, but the pattern in which the characteristic occurs. For example, we hypothesize that the lengths of successive sentences are mathematically correlated and that the length of a sentence can be described, quantitatively, in terms of the lengths of previous sentences. Autoregressive integrated moving average (ARIMA) models are traditionally used to describe correlated time series data. Under the assumption that the number of words in one sentence is correlated with the number of words per sentence in prior sentences, we develop ARIMA models for series in different works by the same author and comparable works by different authors (James Joyce: portions of Ulysses, and Dubliners; and Ernest Hemingway: portions of In Our Time). Problems of sampling from a literary text are discussed and results presented. Although the performance of the models in predicting sentence length is only marginally better than using mean sentence length, the potential value of this technique in characterizing stylistic features, especially changes in style from the beginning of a piece to the end, is demonstrated.
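The idea of describing a sentence's length in terms of the preceding sentence can be illustrated with a first-order autoregressive fit, AR(1), which is the simplest special case of an ARIMA model. The following is a minimal sketch under stated assumptions, not the models developed in the paper: the function names are hypothetical, sentence splitting is done naively on end punctuation, and the comparison against the mean mirrors only in spirit the evaluation the abstract mentions.

```python
import re

def sentence_lengths(text):
    """Split text on ., !, ? (a naive assumption) and count words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def fit_ar1(series):
    """Least-squares AR(1) fit: x_t - mean = phi * (x_{t-1} - mean) + noise."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev[:-1])
    phi = sum(a * b for a, b in zip(dev[1:], dev[:-1])) / denom if denom else 0.0
    return mean, phi

def mean_squared_errors(series):
    """In-sample one-step MSE of the AR(1) fit versus always predicting the mean."""
    mean, phi = fit_ar1(series)
    pairs = list(zip(series[:-1], series[1:]))
    ar_mse = sum((cur - (mean + phi * (prev - mean))) ** 2 for prev, cur in pairs) / len(pairs)
    base_mse = sum((cur - mean) ** 2 for _, cur in pairs) / len(pairs)
    return ar_mse, base_mse

if __name__ == "__main__":
    # Made-up sentence lengths for illustration only; not data from the paper.
    lengths = [12, 15, 9, 22, 18, 7, 11, 25, 20, 8]
    print("lag-1 autocorrelation:", autocorrelation(lengths, lag=1))
    print("AR(1) MSE, mean-baseline MSE:", mean_squared_errors(lengths))
```

A lag-1 autocorrelation near zero would suggest that the AR(1) prediction offers little over the mean sentence length. A full Box-Jenkins analysis would also consider differencing and moving-average terms, which this sketch omits.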

Original language: English (US)
Pages (from-to): 241-252
Number of pages: 12
Journal: Computers and the Humanities
Issue number: 4
State: Published - Dec 1988

All Science Journal Classification (ASJC) codes

  • Social Sciences (all)


Keywords

  • ARIMA models
  • Box-Jenkins method
  • Ernest Hemingway
  • James Joyce
  • autocorrelation
  • correlation
  • literature
  • sentence length
  • stylistic analysis
