Exact probability of fixed patterns occurring in a random sequence

Ke Ning Sheng, Joseph I. Naus

Research output: Contribution to journal, Review article, peer-review

Abstract

We derive a procedure to obtain the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure is generalized to find the exact probability of a single fixed pattern, or of a union or intersection of multiple fixed patterns, within a random sequence, for any distribution of the cells in the random sequence. It can also handle patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, etc.). The procedure also finds the probability that a randomly picked pattern will appear in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.
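
As an illustration of the simplest case described in the abstract (one fixed pattern, one random sequence), the following minimal Python sketch computes the exact occurrence probability. It is not the authors' procedure: it assumes independent, identically distributed letters and uses a standard prefix-automaton dynamic program. The function name `pattern_occurrence_probability` and its parameters are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def pattern_occurrence_probability(pattern, n, probs):
    """Exact P(pattern occurs at least once in an i.i.d. sequence of length n).

    Assumption (not from the source): letters are independent draws from
    `probs`, a dict mapping each letter to a Fraction probability.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Failure function (as in Knuth-Morris-Pratt): fail[i] is the length of
    # the longest proper prefix of pattern[:i + 1] that is also its suffix.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state, letter):
        # Length of the longest pattern prefix matched after reading `letter`.
        while state and pattern[state] != letter:
            state = fail[state - 1]
        if pattern[state] == letter:
            state += 1
        return state

    # dist[s]: probability that the pattern has not occurred so far and the
    # last s letters read match the first s letters of the pattern.
    dist = [Fraction(0)] * m
    dist[0] = Fraction(1)
    for _ in range(n):
        new = [Fraction(0)] * m
        for s, p in enumerate(dist):
            if p == 0:
                continue
            for letter, q in probs.items():
                t = step(s, letter)
                if t < m:  # t == m means the pattern occurred; drop that mass
                    new[t] += p * q
        dist = new
    return 1 - sum(dist)


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(pattern_occurrence_probability("HH", 3, fair_coin))  # 3/8
```

Using `Fraction` keeps the result exact rather than a floating-point approximation. The cost grows as the sequence length times the pattern length times the alphabet size.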

Original language: English (US)
Journal: Communications in Statistics: Simulation and Computation
State: Accepted/In press - 2020

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Modeling and Simulation

Keywords

  • Partially occurring group
  • Pattern recognition
  • Recurrence relation
  • Relation equation
