Protecting Your Application from Regular Expression Denial-of-Service Attacks
What is ReDoS?
Regular expression denial-of-service (ReDoS) attacks are a type of denial-of-service attack that exploits vulnerabilities in regular expression engines. The goal of a ReDoS attack is to overwhelm application or server resources, making them inaccessible to users. According to a 2019 Snyk report, ReDoS attacks are on the rise, with Node.js apps being among the most affected.
How Do Regular Expressions Work?
To understand how ReDoS attacks work, let’s quickly review how regular expression matching works. Regular expression pattern matching can be done by building a finite state machine. This abstract machine takes a set of inputs and performs operations on that input to produce a specified output. A finite state machine can be in exactly one of a limited number of states at any given time. A transition happens when the machine changes from one state to another.
The Problem with Backtracking
Regular expression engines use two problematic algorithms to determine the next states: backtracking and conversion to deterministic automation. Backtracking tries all possible paths until a match is found or all routes are tried and failed. This can lead to exponential time complexity. Conversion to deterministic automation can also take exponential time to finish, depending on the execution path.
Susceptible Regex Patterns
Certain regular expression patterns are more susceptible to ReDoS attacks. For example, the pattern ^(\w+\s?)*$
takes a group of words with an optional space after each word. This pattern can lead to catastrophic backtracking, causing the engine to take an exponential number of paths and leading to poor performance.
Safeguarding Your Patterns
To protect your regular expression patterns from ReDoS attacks, there are several strategies you can use:
Reduce Combinations
Avoid using nested quantifiers and overlapping clauses, which can lead to exponential time complexity.
Control Backtracking
Use features like atomic groups and lookahead to limit or suppress backtracking. Lookahead assertions can enforce that the quantifier does not backtrack, leading to improved performance.
Rewriting Patterns
Rewrite patterns to avoid backtracking. For example, you can rewrite the pattern ^(\w+\s?)*$
to (?=(\w+))\1
, which uses lookahead to match the longest word at the current position and references it later on.
By understanding how regular expression engines work and taking steps to safeguard your patterns, you can protect your application from ReDoS attacks and ensure better performance and security.