Regular expressions
Regular expressions can be used to filter URL data in Yandex Webmaster.
Expressions are parsed according to the RE2 syntax and the following rules:
- The regular expression is applied to the entire URL of the page, including the protocol and domain. For example, you can use the regular expression ^http://.
- The regular expression is applied twice: to the URL with the www prefix and to the URL without it. The presence of the www prefix in the domain doesn't affect the result of the check.
- The regular expression is applied to the decoded URL, in which URL codes (% sequences) are replaced with the characters they encode. Exception: the codes for the /, &, =, ?, and # characters aren't replaced. For example, %2F isn't replaced with /. Note that the + character is replaced with a space. For example, the regular expression text=слон matches the decoded URL, but text=%D1%81%D0%BB%D0%BE%D0%BD and text=%\w\w don't.
- Cyrillic URLs aren't converted to Punycode. For example, the regular expression ^http://ввв\.сайт\.рф/ matches, but ^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/ doesn't.
- The following characters are removed from the end of the URL before the regular expression is checked: ?, #, &, and the period (.). For example, the URLs http://example.com/?, http://example.com/#, and http://example.com/?var=1& are compared as http://example.com/, http://example.com/, and http://example.com/?var=1 respectively. If the user enters the URL http://example.com./, the regular expression \./$ isn't processed.
- Quantifiers are greedy by default: they match as many characters as possible.
- The URL characters are case-sensitive.
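The rules above can be approximated in a short Python sketch. This is an illustration only, not the service's actual implementation: the helper names (normalize, www_variants, url_matches) are hypothetical, the order of the steps is an assumption, and Python's re module stands in for RE2, which it resembles but does not fully match. Punycode handling is not covered.

```python
import re

# Percent-codes that stay encoded per the rules above: / & = ? #
KEEP_ENCODED = {"2F", "26", "3D", "3F", "23"}


def _decode_run(match):
    """Decode one run of consecutive %XX codes, leaving reserved codes as-is."""
    out = bytearray()
    for code in re.findall(r"%([0-9A-Fa-f]{2})", match.group(0)):
        if code.upper() in KEEP_ENCODED:
            out += b"%" + code.encode("ascii")  # keep e.g. %2F untouched
        else:
            out.append(int(code, 16))           # raw byte, decoded as UTF-8 below
    return out.decode("utf-8", errors="replace")


def normalize(url):
    """Hypothetical helper: approximate the URL form the filter is checked against."""
    url = url.rstrip("?#&.")     # drop trailing ?, #, & and period (assumed to happen first)
    url = url.replace("+", " ")  # '+' is treated as a space
    return re.sub(r"(?:%[0-9A-Fa-f]{2})+", _decode_run, url)


def www_variants(url):
    """Return the URL without and with the www prefix."""
    m = re.match(r"^(\w+://)(?:www\.)?(.*)$", url)
    if not m:
        return [url]
    scheme, rest = m.groups()
    return [scheme + rest, scheme + "www." + rest]


def url_matches(pattern, url):
    """True if the pattern is found in either www variant of the normalized URL."""
    return any(re.search(pattern, v) for v in www_variants(normalize(url)))
```

For instance, under these assumptions url_matches(r"^http://www\.example\.com/", "http://example.com/page") returns True, because the pattern is also tried against the www variant of the URL.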
Regular expressions memo
In the table below, a, b, c, d, and e are any characters, and n and m are positive numbers.
Possible options | |
abc|de | Matches one of the options: abc or de. |
Classes of characters | |
[abc] or [a-c] | Matches any (one) character of the list (or from the range). |
[^abc] or [^a-c] | Matches any (one) character except those listed (or those from the range). |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\s | Matches a white-space character. Equivalent to [\t\n\f\r ]. |
\S | Matches a non-white-space character. Equivalent to [^\t\n\f\r ]. |
\pL | Matches any Unicode letter. |
\w | Matches any Latin letter of any case, a digit, or the underscore character. When working with Unicode characters, use the \pL class. |
\W | Matches any character other than a Latin letter of any case, a digit, or an underscore. When working with Unicode characters, use the \PL class. |
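Number of occurrences (quantifiers) | |
a* | Matches the a character repeated 0 or more times (the longest sequence is preferred). |
a+ | Matches the a character repeated 1 or more times (the longest sequence is preferred). |
a? | Matches the a character repeated 0 or 1 time (the presence of the character is preferred). |
a{n,m} | Matches the a character repeated from n to m times (the longest sequence is preferred). |
a{n,} | Matches the a character repeated n or more times (the longest sequence is preferred). |
a{n} | Matches the a character repeated exactly n times. |
a*? | Matches the a character repeated 0 or more times (the shortest sequence is preferred). |
a+? | Matches the a character repeated 1 or more times (the shortest sequence is preferred). |
a?? | Matches the a character repeated 0 or 1 time (the absence of the character is preferred). |
a{n,m}? | Matches the a character repeated from n to m times (the shortest sequence is preferred). |
a{n,}? | Matches the a character repeated n or more times (the shortest sequence is preferred). |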
Position in the line | |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
\b | Matches a word boundary: the position between an alphanumeric character (\w) and a non-alphanumeric character (\W). |
\B | Matches a non-word boundary. Defined through the \w and \W classes. |
Escaping | |
\ | A backslash before a [ ] \ ^ $ . | ? * + ( ) { } special character means that this character is not special and should be interpreted literally. Example: \. matches a period. |
\Q...\E | All special characters between \Q and \E are interpreted literally. |
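A few of the table entries can be tried out with the short Python sketch below. It assumes Python's re module as a stand-in for RE2: the constructs shown behave the same way in both, but re has no \pL class and no \Q...\E, so re.escape is used here as the closest equivalent for literal text. The first_match helper and the sample URL are hypothetical.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text of the first match, or None if the pattern doesn't match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


URL = "http://example.com/catalog/item/42"

# Greedy quantifier: takes as many characters as possible.
greedy = first_match(r"com/.*/", URL)     # "com/catalog/item/"
# Lazy quantifier: takes as few characters as possible.
lazy = first_match(r"com/.*?/", URL)      # "com/catalog/"

# a{n,m} prefers the longest run, a{n,m}? the shortest.
longest = first_match(r"\d{2,3}", "id=12345")    # "123"
shortest = first_match(r"\d{2,3}?", "id=12345")  # "12"

# a? prefers the character to be present, a?? prefers it to be absent.
present = first_match(r"^a?", "aa")   # "a"
absent = first_match(r"^a??", "aa")   # ""

# Escaping: an unescaped period matches any character, an escaped one only a period.
loose = first_match(r"example.com", "http://exampleXcom/")   # "exampleXcom"
strict = first_match(r"example\.com", "http://exampleXcom/")  # None

# re.escape escapes every special character in a literal string.
literal = first_match(re.escape("item/42?x=1"), "http://example.com/item/42?x=1")
```

Lazy quantifiers are mostly useful when a filter should stop at the first delimiter, such as the first slash after the domain, rather than the last one.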