Regular expressions
Regular expressions can be used to filter the URL data in Yandex.Webmaster.
Expressions are parsed according to the RE2 syntax and the following rules:
- The regular expression is applied to the entire URL of the page, including the protocol and domain. For example, you can use the following regular expression: `^http://`.
- A regular expression is applied twice: to the URL with the `www` prefix and to the URL without it. The presence of the `www` prefix in the domain doesn't affect the result of expression validation.
- The regular expression is applied to the decoded URL, where the URL codes (% sequences) are replaced with decoded characters. Exception: the codes for the `/`, `&`, `=`, `?`, and `#` characters aren't replaced. For example, `%2F` isn't replaced with `/`. Note that the `+` character is replaced with a space. For example, the regular expression `text=elephant` will match, but `text=%D1%81%D0%BB%D0%BE%D0%BD` (the percent-encoded form of `слон`, Russian for "elephant") and `text=%\w\w` won't.
- Cyrillic URLs are matched in their Unicode form, not in Punycode. For example, the regular expression `^http://ввв\.сайт\.рф/` will match, but `^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/` won't.
- Some characters are removed from the end of the URL before the regular expression is checked: `?`, `#`, `&`, and the period (`.`). For example, the URLs `http://example.com/?`, `http://example.com/#`, and `http://example.com/?var=1&` are compared as `http://example.com/`, `http://example.com/`, and `http://example.com/?var=1` respectively. If the user enters the URL `http://example.com./`, the regular expression `\./$` won't match.
- Quantifiers in the checked regular expressions are greedy by default: they match as many characters as possible.
- The URL characters are case-sensitive.
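The rules above can be approximated in a few lines of Python. This is only a sketch for testing patterns locally, not the Yandex.Webmaster implementation: the function names (`decode_url`, `strip_trailing`, `www_variants`, `url_matches`) are hypothetical, Python's `re` module stands in for RE2 (the two differ in places, so only syntax common to both is used), and it assumes that every trailing `?`, `#`, `&`, or `.` is stripped from the end of the URL. The trailing period after the host name in `http://example.com./` is not handled here.

```python
import re

# Percent-codes that stay encoded, per the rules above: / & = ? #
KEEP_ENCODED = {"2F", "26", "3D", "3F", "23"}
_ESCAPE = re.compile(rb"%[0-9A-Fa-f]{2}")


def decode_url(url: str) -> str:
    """Replace + with a space and decode % sequences, except / & = ? #."""
    data = url.replace("+", " ").encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        match = _ESCAPE.match(data, i)
        if match:
            code = match.group()[1:].decode("ascii").upper()
            if code in KEEP_ENCODED:
                out += match.group()        # leave the sequence as written
            else:
                out.append(int(code, 16))   # decode to the raw byte
            i += 3
        else:
            out.append(data[i])
            i += 1
    # Decoded bytes are assumed to be UTF-8; invalid bytes become U+FFFD.
    return out.decode("utf-8", errors="replace")


def strip_trailing(url: str) -> str:
    """Drop trailing ?, #, & and . characters (assumption: all of them)."""
    return url.rstrip("?#&.")


def www_variants(url: str) -> list[str]:
    """Return the URL without and with the www prefix."""
    match = re.match(r"(https?://)(?:www\.)?(.*)", url, flags=re.DOTALL)
    if not match:
        return [url]
    scheme, rest = match.groups()
    return [scheme + rest, scheme + "www." + rest]


def url_matches(pattern: str, url: str) -> bool:
    """Case-sensitive search of the pattern in both www variants of the URL."""
    normalized = strip_trailing(decode_url(url))
    regex = re.compile(pattern)
    return any(regex.search(variant) for variant in www_variants(normalized))


encoded = "http://example.com/?text=%D1%81%D0%BB%D0%BE%D0%BD"
print(url_matches(r"text=слон", encoded))                              # True
print(url_matches(r"text=%\w\w", encoded))                             # False
print(url_matches(r"^http://www\.example\.com/$", "http://example.com/?"))  # True
```

Normalizing the URL once and then testing both `www` variants keeps the pattern itself unchanged, which mirrors how the rules describe the check.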
Regular expressions memo
In the table below, `a`, `b`, `c`, `d`, `e` are any characters, and `n`, `m` are positive numbers.
Possible options | |
---|---|
`abc\|de` | Matches one of the options: `abc` or `de`. |
Classes of characters | |
`[abc]` or `[a-c]` | Matches any (one) character of the list (or from the range). |
`[^abc]` or `[^a-c]` | Matches any (one) character except those listed (or those from the range). |
`\d` | Matches a digit character. Equivalent to `[0-9]`. |
`\D` | Matches a non-digit character. Equivalent to `[^0-9]`. |
`\s` | Matches a whitespace character. Equivalent to `[\t\n\f\r ]`. |
`\S` | Matches a non-whitespace character. Equivalent to `[^\t\n\f\r ]`. |
`\pL` | Matches any Unicode letter. |
`\w` | Matches any Latin letter of any case, a digit, or the underscore character. When working with Unicode characters, use `\pL`. |
`\W` | Matches any character other than a Latin letter of any case, a digit, or an underscore. When working with Unicode characters, use `\PL`. |
Number of occurrences (quantifiers) | |
`a*` | Matches the `a` character repeated 0 or more times (the longest possible sequence). |
`a+` | Matches the `a` character repeated 1 or more times (the longest possible sequence). |
`a?` | Matches the `a` character repeated 0 or 1 time (the presence of the character is a priority). |
`a{n,m}` | Matches the `a` character repeated at least `n` times and not more than `m` times (the longest possible sequence). |
`a{n,}` | Matches the `a` character repeated at least `n` times (the longest possible sequence). |
`a{n}` | Matches the `a` character repeated exactly `n` times. |
`a*?` | Matches the `a` character repeated 0 or more times (the shortest possible sequence). |
`a+?` | Matches the `a` character repeated 1 or more times (the shortest possible sequence). |
`a??` | Matches the `a` character repeated 0 or 1 time (the absence of the character is a priority). |
`a{n,m}?` | Matches the `a` character repeated at least `n` times and not more than `m` times (the shortest possible sequence). |
`a{n,}?` | Matches the `a` character repeated at least `n` times (the shortest possible sequence). |
Position in the line | |
`^` | Matches the beginning of a string. |
`$` | Matches the end of a string. |
`\b` | Matches a word boundary: the position between an alphanumeric character (`\w`) and a non-alphanumeric character (`\W`). |
`\B` | Matches a non-word boundary, that is, any position where `\b` doesn't match. |
Escaping | |
`\` | A backslash before a special character makes it match literally. Example: `\.` matches a period. |
`\Q...\E` | All special characters between `\Q` and `\E` are interpreted as ordinary characters. |
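
The difference between the greedy and lazy quantifiers in the table is easiest to see on a short input. The sketch below uses Python's `re` module as a stand-in for RE2 (an assumption for local experimentation; the constructs shown behave the same way in both), and `first_match` is a hypothetical helper that returns the text of the first match.

```python
import re


def first_match(pattern: str, text: str) -> str:
    """Return the text of the first match, or an empty string if none."""
    match = re.search(pattern, text)
    return match.group() if match else ""


# Greedy forms take the longest possible sequence.
print(first_match(r"ba*", "baaa"))        # baaa
print(first_match(r"ba?", "baaa"))        # ba
print(first_match(r"a{2,3}", "aaaa"))     # aaa

# Lazy forms (with a trailing ?) take the shortest possible sequence.
print(first_match(r"ba*?", "baaa"))       # b
print(first_match(r"ba+?", "baaa"))       # ba
print(first_match(r"ba??", "baaa"))       # b
print(first_match(r"a{2,3}?", "aaaa"))    # aa
print(first_match(r"a{2,}?", "aaaa"))     # aa

# The same effect on a URL path segment.
url = "http://example.com/2024/05/"
print(first_match(r"/\d+", url))          # /2024
print(first_match(r"/\d+?", url))         # /2
```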