A Comparison of LSA and LDA for the Analysis of Railroad Accident Text

Trefor Williams, John Betak

Research output: Contribution to journal > Conference article > peer-review

30 Scopus citations


Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were used to identify themes in a database of text about railroad equipment accidents maintained by the Federal Railroad Administration in the United States. These text mining techniques use different mechanisms to identify topics. Both LDA and LSA identified switching accidents, hump yard accidents and grade crossing accidents as major accident type topics. LSA also identified accidents with track maintenance equipment as a topic. Both text mining models identified accidents with tractor-trailer highway trucks as a particular problem at grade crossings. The two techniques were found to be complementary: used together, they identified more accident topics than either method alone.

Original language: English (US)
Pages (from-to): 98-102
Number of pages: 5
Journal: Procedia Computer Science
State: Published - 2018
Event: 9th International Conference on Ambient Systems, Networks and Technologies, ANT 2018 - Porto, Portugal
Duration: May 8 2018 to May 11 2018

All Science Journal Classification (ASJC) codes

  • General Computer Science


Keywords

  • accidents
  • railroad
  • text mining

