# regexp/no-super-linear-move

disallow quantifiers that cause quadratic moves

# 📖 Rule Details

This rule reports super-linear worst-case runtime caused by a regex being moved across the input string. The reported cases are a problem because the super-linear worst-case runtime can be exploited by attackers in what is called Regular expression Denial of Service - ReDoS (opens new window).

Now loading...

# Background

Regexes are often used to find text of within a string (e.g. /abc/.exec("123 abc def")). The position of the matching text is unknown and has to be determined by the regex engine. In practice, the regex engine will move the regex across the input string character by character. While there are many optimizations to skip parts of the input string, there will still be O(n) possible positions. If there is no text matching the regex in the input string, then all O(n) positions will be checked.

This is not problem in it self, O(n) is expected for linear string searching algorithms.

Problems arise when the regex itself takes more than O(1) steps (on average) to reject any position within the input.

Example: The regex /^a+b/ takes O(n) steps to find no match in an input of n-many as. However, the regex /a+b/ (no assertion) takes O(n) steps to find no match in the same input at position 0. It will take another O(n-1) steps at position 1, another O(n-2) steps at position 2, and so on. In total, /a+b/ will take O(n^2) steps to find no match in an input of n-many as.

If a regex is moved across the string and takes O(n) steps (on average) to reject each of the O(n) possible positions, then it will reject the input string in O(n^2) steps.

# Possible fixes

There are multiple ways to fix this kind of super-linear runtime.

However, there is no one-size-fits-all solution. Adequate testing and code review are necessary to ensure that the fixed regex is still correct.

# Change the quantifier

If the quantifier is preceded by an assertion, the quantifier might be too broad (= accept too many characters). Narrowing down the quantifier might fix the issue.

Example: /^\s*(\w+)\s*[:=]/m

This is a simple regex to find keys in a config file. Keys must be at the start of a line and may be surrounded by whitespace characters.

This rule says that the first \s* causes quadratic runtime for "any attack string /[\n\r\u2028\u2029]+/."

The problem with \s* is that \s also allows line break characters (the characters in the attack string). ^ already ensures the "start of a line" requirement, so there is no reason to allow line breaks after the ^.

The fix is to remove all line break characters from \s. This is difficult, so let's cheat a little and say that only spaces and tabs ([\t ]) are allows to surround the key.

Now loading...

# Limit the quantifier

All quantifiers reported by this rule are unbound (= maximum is infinite). This is because attackers need strings with a lengths >1000 character to exploit the quadratic runtime.

If the quantifier simply stops searching after some maximum number of steps, the quantifier isn't exploitable.

Note: The maximum of the quantifier should be reasonably small (typically <100). Choosing a large maximum (>1000) will cause the quantifier to be exploitable despite the limit.

Example: /((?:\\{2})*)(\\?)\|/g

This regex was used by minimatch (opens new window) to find escaped and unescaped | characters.

This rule says that the (?:\\{2})* causes quadratic runtime for "any attack string /(?:\\{2})+/."

The fix is to limit the (?:\\{2})* quantifier. minimatch limited it to at most 64 repetitions. You read more about this vulnerability here (opens new window).

Now loading...

# Add an assertion

Adding a lookbehind, \b, or ^ assertion at the start of the pattern can solve a lot of issues.

This fix is typically only applicable if the exploitable quantifier can consume exactly one character per iteration. Quantifiers that can consume more than character per iteration (e.g. (?:abc)+, (?:ab?)+) are very difficult to fix with this approach.

Example: /[a-z_][a-z_0-9]*(?=\s*\()/i

This is a simple regex to find the names of functions and function-like macros in a C program.

This rule says that the first [a-z_0-9]* causes quadratic runtime for "any attack string /[A-Z_]+/i."

The problem is that [a-z_][a-z_0-9]* isn't guaranteed to start at the start of the function name, the [a-z_] can also match anywhere inside the function name.

The fix is to add an assertion to make sure that [a-z_] matches the first character of a name. We can use the lookbehind (?<![a-z_0-9]) == (?<!\w). In this case, it's also possible to use the built-in \b assertion.

Note that using (?<![a-z_]) is not enough. (?<![a-z_])[a-z_] can still match in the middle of the name for names with numbers (e.g. str2int). The lookbehind has to disallow the characters of the quantifier [a-z_0-9]*.

Now loading...

# Limitations of this rule

This rule implements a simple detection method. It is unable to find certain cases.

This means that this rule might not be able to verify fixed regexes. This rule might be unable to detect that supposedly fixed regexes are actually still vulnerable.

# 🔧 Options

{
  "regexp/no-super-linear-move": ["error", {
    "report": "certain",
    "ignoreSticky": false
  }]
}

# report: "certain" | "potential"

This option has the same function as the report option of regexp/no-super-linear-backtracking. The default value is "certain".

# ignoreSticky: boolean

By default, this rule ignores regexes with the sticky (y) flag. These regexes do not move across the input string on their own and they are mostly immune to this type of super-linear worst-case because of that. However, some algorithms (and even built-in JavaScript functions) manually move regexes across the string and others change the flags of regexes.

This option determines whether this rule will ignore regexes with sticky (y) flag.

  • ignoreSticky: true (default)

    Regexes with the sticky (y) flag will be ignored.

  • ignoreSticky: false

    All regexes will be analysed.

# ignorePartial: boolean:

Some regexes are used as fragments to build more complex regexes. Example:

const fn = /\w+(?=\s*\()/.source;
const pattern = RegExp(`#\\s*define\\s+${fn}`, "g");

Even if a fragment had exploitable quantifiers, it might not cause super-linear runtime depending on how the fragment is used.

  • ignorePartial: true (default)

    The rule does not check regexes used as a fragment. It assumes that fragments are used in a way such that super-linear runtime caused by moves is prevented.

  • ignorePartial: false

    The rule checks all regexes regardless of how they are used.

# 📚 Further reading

# 🚀 Version

This rule was introduced in eslint-plugin-regexp v0.13.0

# 🔍 Implementation

Last Updated: 6/25/2022, 12:32:38 PM