# Regular expressions

You can use regular expressions to filter data by URL.

The expression is processed according to RE2 syntax and the following rules:

- The regular expression is applied to the page's full URL, including the protocol and domain. For example, you can use the regular expression `^http://`.
- The regular expression is applied twice: first to the original URL, and then to the URL with or without the `www` prefix. This means that the results do not depend on whether the `www` prefix is included in the domain.
- The regular expression is applied to the decoded URL, in which URL escape codes (`%` sequences) are replaced with the decoded characters. Exception: the codes for `/`, `&`, `=`, `?`, and `#` are not replaced; for example, `%2F` is not replaced with `/`. Keep in mind that the plus sign (`+`) is replaced with a space when decoding. For example, the regular expression `text=elephant` will work, but `text=%D1%81%D0%BB%D0%BE%D0%BD` and `text=%\w\w` will not.
- Punycode is not applied to Cyrillic URLs. For example, the regular expression `^http://ввв\.сайт\.рф/` will work, but `^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/` will not.
- Before regular expressions are checked, certain characters are removed from the end of the URL: `?`, `#`, `&`, and the dot (`.`). For example, for the URLs `http://example.com/?`, `http://example.com/#`, and `http://example.com/?var=1&`, the comparison is made with `http://example.com/`, `http://example.com/`, and `http://example.com/?var=1`, respectively. If the user enters the URL `http://example.com./`, the regular expression `\./$` will not work.
- URL characters are case-sensitive when regular expressions are checked.
- Quantifiers are greedy: they match the longest possible string when expressions are checked.
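The rules above can be approximated locally to check a pattern before using it. The following is a minimal sketch in Go, whose `regexp` package follows RE2 syntax. It is not the service's actual implementation: the helper names `decodeURL`, `wwwVariant`, and `matchesURL` are hypothetical, and the order of the steps (decode, then trim, then try the `www` variant) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// decodeURL replaces %XX escapes with the decoded bytes and "+" with a space.
// Escapes for / & = ? # are left encoded, as the rules describe.
func decodeURL(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c == '+' {
			b.WriteByte(' ')
			continue
		}
		if c == '%' && i+2 < len(raw) {
			if v, err := strconv.ParseUint(raw[i+1:i+3], 16, 8); err == nil {
				if strings.IndexByte("/&=?#", byte(v)) >= 0 {
					b.WriteString(raw[i : i+3]) // keep reserved escapes encoded
				} else {
					b.WriteByte(byte(v))
				}
				i += 2
				continue
			}
		}
		b.WriteByte(c)
	}
	return b.String()
}

// wwwVariant adds the "www." prefix to the host if it is missing,
// or removes it if it is present (assumed interpretation of the rule).
func wwwVariant(u string) string {
	i := strings.Index(u, "://")
	if i < 0 {
		return u
	}
	scheme, rest := u[:i+3], u[i+3:]
	if strings.HasPrefix(rest, "www.") {
		return scheme + strings.TrimPrefix(rest, "www.")
	}
	return scheme + "www." + rest
}

// matchesURL decodes the URL, trims trailing ? # & . characters,
// and tries the pattern on the URL and on its www variant.
func matchesURL(pattern, rawURL string) bool {
	re := regexp.MustCompile(pattern)
	u := strings.TrimRight(decodeURL(rawURL), "?#&.")
	return re.MatchString(u) || re.MatchString(wwwVariant(u))
}

func main() {
	encoded := "http://example.com/?text=%D1%81%D0%BB%D0%BE%D0%BD"
	fmt.Println(matchesURL(`^http://www\.example\.com/$`, "http://example.com/?")) // true
	fmt.Println(matchesURL(`text=слон`, encoded))                                  // true
	fmt.Println(matchesURL(`text=%\w\w`, encoded))                                 // false
	fmt.Println(decodeURL("q=a%2Fb+c"))                                            // q=a%2Fb c
}
```

In this sketch a pattern written against the percent-encoded form never matches, because the `%` sequences are gone by the time the pattern is checked.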

## Instructions on regular expressions

In the table below, `a`, `b`, `c`, `d`, and `e` are any characters, and `n` and `m` are positive integers.

| Alternative variants | |
|---|---|
| `abc\|de` | Matches one of the variants: `abc` or `de`. |

| Character classes | |
|---|---|
| `[abc]` or `[a-c]` | Matches any (one) character from those listed (or from the range). |
| `[^abc]` or `[^a-c]` | Matches any (one) character that is not listed (or does not fall within the range). |
| `\d` | Matches a digit. Equivalent to `[0-9]`. |
| `\D` | Matches a non-digit. Equivalent to `[^0-9]`. |
| `\s` | Matches a whitespace character. Equivalent to `[\t\n\f\r ]`. |
| `\S` | Matches any character that is not whitespace. Equivalent to `[^\t\n\f\r ]`. |
| `\pL` | Matches any Unicode letter. |
| `\w` | Matches an uppercase or lowercase Latin letter, a digit, or an underscore. Equivalent to `[0-9A-Za-z_]` in RE2. When working with Unicode characters, use the `\pL` class. |
| `\W` | Matches any character that `\w` does not match. Equivalent to `[^0-9A-Za-z_]` in RE2. When working with Unicode characters, use the `\PL` class. |

| Number of occurrences (quantifiers) | |
|---|---|
| `a*` | Matches the character `a` repeated 0 or more times (the longest possible sequence is selected). |
| `a+` | Matches the character `a` repeated 1 or more times (the longest possible sequence is selected). |
| `a?` | Matches the character `a` repeated 0 or 1 time (priority is given to the character's occurrence). |
| `a{n,m}` | Matches the character `a` repeated no fewer than `n` and no more than `m` times (the longest possible sequence is selected). |
| `a{n,}` | Matches the character `a` repeated no fewer than `n` times (the longest possible sequence is selected). |
| `a{n}` | Matches the character `a` repeated exactly `n` times. |
| `a*?` | Matches the character `a` repeated 0 or more times (the shortest possible sequence is selected). |
| `a+?` | Matches the character `a` repeated 1 or more times (the shortest possible sequence is selected). |
| `a??` | Matches the character `a` repeated 0 or 1 time (priority is given to the character's absence). |
| `a{n,m}?` | Matches the character `a` repeated no fewer than `n` and no more than `m` times (the shortest possible sequence is selected). |
| `a{n,}?` | Matches the character `a` repeated no fewer than `n` times (the shortest possible sequence is selected). |

| Position within the string | |
|---|---|
| `^` | Matches the beginning of the string. |
| `$` | Matches the end of the string. |
| `\b` | Matches a word boundary: the position between an alphanumeric character (`\w`) and a non-alphanumeric character (`\W`), or between an alphanumeric character and the beginning or end of the string. |
| `\B` | Matches the absence of a word boundary. Defined through the classes `\w` and `\W`. |

| Escape sequences | |
|---|---|
| `\` | A backslash before one of the special characters makes that character match literally. Example: `\.` matches a dot. |
| `\Q...\E` | All special characters in the interval between `\Q` and `\E` are interpreted as regular characters. |
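To make the difference between greedy and non-greedy quantifiers, and between `\w` and `\pL`, more concrete, here is a small sketch in Go, whose `regexp` package follows RE2 syntax. The helper `firstMatch` and the sample strings are illustrative assumptions, not part of the service.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s, or "" if there is none.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	path := "/catalog/shoes/red"

	// Greedy: the longest possible sequence between slashes.
	fmt.Println(firstMatch(`/.*/`, path)) // /catalog/shoes/
	// Non-greedy: the shortest possible sequence.
	fmt.Println(firstMatch(`/.*?/`, path)) // /catalog/

	// Bounded repetition, greedy and non-greedy.
	fmt.Println(firstMatch(`a{2,3}`, "aaaa"))  // aaa
	fmt.Println(firstMatch(`a{2,3}?`, "aaaa")) // aa

	// \w covers only Latin letters, digits, and the underscore;
	// \pL covers letters of any script.
	fmt.Println(firstMatch(`\w+`, "сайт-site"))  // site
	fmt.Println(firstMatch(`\pL+`, "сайт-site")) // сайт
}
```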
