TY - GEN
T1 - A novel algorithm for pattern matching with back references
AU - Yang, Liu
AU - Ganapathy, Vinod
AU - Manadhata, Pratyusa
AU - Wu, Ye
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/2/17
Y1 - 2016/2/17
N2 - Modern network security applications, such as network-based intrusion detection systems (NIDS) and firewalls, routinely employ deep packet inspection to identify malicious traffic. In deep packet inspection, the contents of network traffic are matched against patterns of malicious traffic to identify attack-carrying packets. The pattern matching algorithms employed for deep packet inspection must be fast, as the algorithms are often implemented on middle-boxes residing on high-speed gigabits per second links. The majority of patterns employed in network security applications are regular languages. However, regular language-based patterns have limited expressive power and are not capable of describing some complex features in network payload. Back reference is an important feature provided by many pattern matching tools, e.g., PCRE, the regular expression libraries of Java, Perl, and Python. Back references are used to identify repeated patterns in input strings. Patterns containing back-references are non-regular languages. Very little work has been done to improve the time-efficiency of back reference-based pattern matching. The de facto algorithm to implement back reference is recursive backtracking, but it is vulnerable to algorithmic complexity attacks. In this paper, we present a novel approach to implement back references. The basic idea of our approach is to transform a back reference problem to a conditional submatch problem, and represent it with a Non-deterministic Finite Automata (NFA)-like machine subject to some constraints. Our experimental results show that our approach resists known algorithmic complexity attacks, and is faster than PCRE by up to three orders of magnitude for certain types of patterns.
AB - Modern network security applications, such as network-based intrusion detection systems (NIDS) and firewalls, routinely employ deep packet inspection to identify malicious traffic. In deep packet inspection, the contents of network traffic are matched against patterns of malicious traffic to identify attack-carrying packets. The pattern matching algorithms employed for deep packet inspection must be fast, as the algorithms are often implemented on middle-boxes residing on high-speed gigabits per second links. The majority of patterns employed in network security applications are regular languages. However, regular language-based patterns have limited expressive power and are not capable of describing some complex features in network payload. Back reference is an important feature provided by many pattern matching tools, e.g., PCRE, the regular expression libraries of Java, Perl, and Python. Back references are used to identify repeated patterns in input strings. Patterns containing back-references are non-regular languages. Very little work has been done to improve the time-efficiency of back reference-based pattern matching. The de facto algorithm to implement back reference is recursive backtracking, but it is vulnerable to algorithmic complexity attacks. In this paper, we present a novel approach to implement back references. The basic idea of our approach is to transform a back reference problem to a conditional submatch problem, and represent it with a Non-deterministic Finite Automata (NFA)-like machine subject to some constraints. Our experimental results show that our approach resists known algorithmic complexity attacks, and is faster than PCRE by up to three orders of magnitude for certain types of patterns.
KW - Back reference
KW - Finite automaton
KW - Network-based Intrusion Detection System
KW - Pattern matching
UR - http://www.scopus.com/inward/record.url?scp=84969972269&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969972269&partnerID=8YFLogxK
U2 - 10.1109/PCCC.2015.7410264
DO - 10.1109/PCCC.2015.7410264
M3 - Conference contribution
AN - SCOPUS:84969972269
T3 - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015
BT - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE International Performance Computing and Communications Conference, IPCCC 2015
Y2 - 14 December 2015 through 16 December 2015
ER -