regexp/unicode-property

🔧 This rule is automatically fixable by the --fix CLI option.

enforce consistent naming of unicode properties

📖 Rule Details

This rule enforces a consistent style and naming of Unicode properties.

There are many ways to express a single Unicode property. E.g. \p{L}, \p{Letter}, \p{gc=L}, \p{gc=Letter}, \p{General_Category=L}, and \p{General_Category=Letter} are all equivalent. This rule can be configured to control exactly which of those variants are allowed. The default configuration is intended to be a good starting point for most users.
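
As a quick check of that equivalence, the following sketch runs all six spellings against a handful of sample characters. The sample list and variable names are chosen for illustration only.

```javascript
// Six spellings of the same Unicode property (General_Category = Letter).
const letterVariants = [
  /^\p{L}$/u,
  /^\p{Letter}$/u,
  /^\p{gc=L}$/u,
  /^\p{gc=Letter}$/u,
  /^\p{General_Category=L}$/u,
  /^\p{General_Category=Letter}$/u,
];

// Letters from several scripts plus a few non-letters (illustrative sample).
const samples = ["a", "Z", "\u00e9", "\u03bb", "\u044f", "\u6f22", "1", "_", " ", "!"];

// For each sample, collect the result of every variant.
const results = samples.map((ch) => letterVariants.map((re) => re.test(ch)));

// True when all six variants agree on every sample.
const allAgree = results.every((row) => row.every((r) => r === row[0]));

console.log(allAgree);
```

Because the variants behave identically at runtime, choosing between them is purely a matter of style, which is what this rule standardizes.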

🔧 Options

json
{
  "regexp/unicode-property": ["error", {
    "generalCategory": "never",
    "key": "ignore",
    "property": {
      "binary": "ignore",
      "generalCategory": "ignore",
      "script": "long"
    }
  }]
}

generalCategory: "never" | "always" | "ignore"

Values from the General_Category property can be expressed in two ways: either without or with the gc= (or General_Category=) prefix. E.g. \p{Letter} or \p{gc=Letter}.

This option controls whether the gc= prefix is required or forbidden.

  • "never" (default): The gc= (or General_Category=) prefix is forbidden.

  • "always": The gc= (or General_Category=) prefix is required.

  • "ignore": Both forms, with and without the prefix, are allowed.
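
A small sketch of what this option distinguishes, assuming the rule behaves as described above. The inline configuration comment is standard ESLint syntax; the array names are illustrative.

```javascript
/* eslint regexp/unicode-property: ["error", { "generalCategory": "never" }] */

// Allowed under "never": no gc= / General_Category= prefix.
const withoutPrefix = [/^\p{Letter}$/u, /^\p{Lu}$/u];

// Reported under "never" (and the required style under "always"): prefixed forms.
const withPrefix = [/^\p{gc=Letter}$/u, /^\p{General_Category=Lu}$/u];

// Both spellings match the same text, so the option only affects style.
console.log(withoutPrefix[0].test("a") === withPrefix[0].test("a"));
console.log(withoutPrefix[1].test("A") === withPrefix[1].test("A"));
```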

key: "short" | "long" | "ignore"

Unicode properties in key-value form (e.g. \p{gc=Letter}, \P{scx=Greek}) have two variants for the key: a short and a long form. E.g. \p{gc=Letter} and \p{General_Category=Letter}.

This option controls whether the short or long form is required.

  • "short": The key must be in short form.

  • "long": The key must be in long form.

  • "ignore" (default): The key can be in either form.
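
The sketch below illustrates the two key forms using the script keys, assuming the rule behaves as described above. Script keys are used because the default generalCategory setting already forbids the gc= key; the array names are illustrative.

```javascript
/* eslint regexp/unicode-property: ["error", { "key": "short" }] */

// Allowed under "short": the sc= and scx= keys.
const shortKeys = [/^\p{sc=Greek}$/u, /^\p{scx=Greek}$/u];

// Reported under "short" (and the required style under "long"): spelled-out keys.
const longKeys = [/^\p{Script=Greek}$/u, /^\p{Script_Extensions=Greek}$/u];

const lambda = "\u03bb"; // Greek small letter lambda

console.log(shortKeys.map((re) => re.test(lambda)));
console.log(longKeys.map((re) => re.test(lambda)));
```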

property: "short" | "long" | "ignore" | object

Similar to key, most property names also have long and short forms. E.g. \p{Letter} and \p{L}.

This option controls whether the short or long form is required. Which form is required can be configured for each property type via an object. The object must have the following type:

ts
{
  binary?: "short" | "long" | "ignore",
  generalCategory?: "short" | "long" | "ignore",
  script?: "short" | "long" | "ignore",
}
  • binary controls the form of Binary Unicode properties. E.g. ASCII, Any, Hex.
  • generalCategory controls the form of values from the General_Category property. E.g. Letter, Ll, P.
  • script controls the form of values from the Script and Script_Extensions properties. E.g. Greek.

If the option is set to a string instead of an object, it will be used for all property types.
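
To make the string shorthand concrete, here is a minimal sketch of how a string value maps onto the object form. The helper expandPropertyOption is hypothetical and written for illustration; it is not the plugin's implementation, and it deliberately leaves object values untouched rather than guessing how omitted keys are filled in.

```javascript
// Hypothetical helper for illustration only; not part of eslint-plugin-regexp.
function expandPropertyOption(property) {
  if (typeof property === "string") {
    // A string applies the same form to every property type.
    return { binary: property, generalCategory: property, script: property };
  }
  // Objects are passed through unchanged in this sketch.
  return { ...property };
}

console.log(expandPropertyOption("long"));
console.log(expandPropertyOption({ binary: "short", script: "long" }));
```

Under this reading, "property": "long" is equivalent to setting binary, generalCategory, and script to "long" individually.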

NOTE: The "short" and "long" options follow the Unicode standard for short and long names. However, short names aren't always shorter than long names. E.g. the short name for \p{sc=Han} is \p{sc=Hani}.

There are also some properties that don't have a short name, such as \p{sc=Thai}, and some that have additional aliases that can be longer than the long name, such as \p{Mark} (long) with its short name \p{M} and alias \p{Combining_Mark}.
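
The naming quirks mentioned in this note can be tried directly in a JavaScript engine that supports Unicode property escapes. The sample characters and variable names below are chosen for illustration.

```javascript
// Script Han: long name "Han", short name "Hani" (the short name is longer).
const hanLong = /^\p{sc=Han}$/u;
const hanShort = /^\p{sc=Hani}$/u;

// General_Category Mark: long "Mark", short "M", extra alias "Combining_Mark".
const markForms = [/^\p{Mark}$/u, /^\p{M}$/u, /^\p{Combining_Mark}$/u];

// Script Thai has no separate short name.
const thai = /^\p{sc=Thai}$/u;

const han = "\u6f22";   // a CJK ideograph
const acute = "\u0301"; // combining acute accent, a Mark
const koKai = "\u0e01"; // Thai character ko kai

console.log(hanLong.test(han), hanShort.test(han));
console.log(markForms.map((re) => re.test(acute)));
console.log(thai.test(koKai));
```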

Examples

All set to "long":
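
An illustrative sketch, assuming the rule behaves as described above and the other options keep their defaults. The array names are illustrative.

```javascript
/* eslint regexp/unicode-property: ["error", { "property": "long" }] */

// Allowed: long property names.
const longForms = [/^\p{Letter}$/u, /^\p{Hex_Digit}$/u, /^\p{sc=Greek}$/u];

// Reported: short names for the same properties.
const shortForms = [/^\p{L}$/u, /^\p{Hex}$/u, /^\p{sc=Grek}$/u];

// One sample character per property: a letter, a hex digit, a Greek letter.
const longSamples = ["a", "f", "\u03bb"];

console.log(longSamples.map((ch, i) => longForms[i].test(ch) === shortForms[i].test(ch)));
```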

All set to "short":
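
An illustrative sketch, assuming the rule behaves as described above and the other options keep their defaults. The array names are illustrative.

```javascript
/* eslint regexp/unicode-property: ["error", { "property": "short" }] */

// Allowed: short property names.
const shortNames = [/^\p{Lu}$/u, /^\p{AHex}$/u, /^\p{sc=Cyrl}$/u];

// Reported: long names for the same properties.
const longNames = [/^\p{Uppercase_Letter}$/u, /^\p{ASCII_Hex_Digit}$/u, /^\p{sc=Cyrillic}$/u];

// One sample character per property: an uppercase letter, a hex digit, a Cyrillic letter.
const shortSamples = ["A", "9", "\u044f"];

console.log(shortSamples.map((ch, i) => shortNames[i].test(ch) === longNames[i].test(ch)));
```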

Binary properties and values of the General_Category property set to "short" and values of the Script property set to "long":
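
An illustrative sketch of this mixed configuration, assuming the rule behaves as described above and the other options keep their defaults. The array names are illustrative.

```javascript
/* eslint regexp/unicode-property: ["error", { "property": { "binary": "short", "generalCategory": "short", "script": "long" } }] */

// Allowed: short binary and General_Category names, long script names.
const allowedForms = [/^\p{Hex}$/u, /^\p{L}$/u, /^\p{sc=Greek}$/u];

// Reported: long binary and General_Category names, short script names.
const reportedForms = [/^\p{Hex_Digit}$/u, /^\p{Letter}$/u, /^\p{sc=Grek}$/u];

// One sample character per property: a hex digit, a letter, a Greek letter.
const mixedSamples = ["f", "a", "\u03bb"];

console.log(mixedSamples.map((ch, i) => allowedForms[i].test(ch) === reportedForms[i].test(ch)));
```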

📚 Further reading

🚀 Version

This rule was introduced in eslint-plugin-regexp v2.5.0

🔍 Implementation