TY - JOUR

T1 - Exact probability of fixed patterns occurring in a random sequence

AU - Sheng, Ke Ning

AU - Naus, Joseph I.

PY - 2020

Y1 - 2020

N2 - We derive a procedure for obtaining the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure is generalized to give the exact probability of a single fixed (specific) pattern, and of a union or intersection of multiple fixed (specific) patterns, within a random sequence, for any distribution of a cell in the random sequence. It can also handle patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, etc.). The procedure also gives the probability that a randomly picked pattern will appear in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.

AB - We derive a procedure for obtaining the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure is generalized to give the exact probability of a single fixed (specific) pattern, and of a union or intersection of multiple fixed (specific) patterns, within a random sequence, for any distribution of a cell in the random sequence. It can also handle patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, etc.). The procedure also gives the probability that a randomly picked pattern will appear in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.

KW - Partially occurring group

KW - Pattern recognition

KW - Recurrence relation

KW - Relation equation

UR - http://www.scopus.com/inward/record.url?scp=85087490746&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85087490746&partnerID=8YFLogxK

U2 - 10.1080/03610918.2020.1766500

DO - 10.1080/03610918.2020.1766500

M3 - Review article

AN - SCOPUS:85087490746

JO - Communications in Statistics Part B: Simulation and Computation

JF - Communications in Statistics Part B: Simulation and Computation

SN - 0361-0918

ER -