Regular expressions
Regular expressions can be used to filter URL data in Yandex Webmaster.
Expressions are parsed according to the RE2 syntax and the following rules:
- The regular expression is applied to the entire URL of the page, including the protocol and domain. For example, you can use the regular expression ^http://.
- The regular expression is applied twice: once to the URL with the www prefix and once to the URL without it. The presence of the www prefix in the domain doesn't affect the result of the check.
- The regular expression is applied to the decoded URL, in which the URL codes (% sequences) are replaced with the decoded characters. Exception: the codes for the /, &, =, ?, and # characters aren't replaced. For example, %2F isn't replaced with /. Note that the + character is replaced with a space. For example, the regular expression text=слон will match, but text=%D1%81%D0%BB%D0%BE%D0%BD and text=%\w\w won't.
- Cyrillic URLs aren't converted to Punycode. For example, the regular expression ^http://ввв\.сайт\.рф/ will match, but ^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/ won't.
- Some characters are removed from the end of the URL before it is checked against the regular expression: ?, #, &, and the period (.). For example, the URLs http://example.com/?, http://example.com/#, and http://example.com/?var=1& are compared as http://example.com/, http://example.com/, and http://example.com/?var=1 respectively. If the user enters the URL http://example.com./, the regular expression \./$ won't match.
- Quantifiers in the checked regular expressions are greedy by default: they match as many characters as possible.
- The URL characters are case-sensitive.
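The rules above can be approximated locally to check how an expression might behave before entering it. The following is a minimal sketch, not the service's actual implementation: the helper names normalize, www_variants, and url_matches are hypothetical, Python's re module stands in for RE2 (the two differ in some syntax), and the order of the decoding and trimming steps is an assumption. The trailing period in a host name such as http://example.com./ is not modeled.

```python
import re
from urllib.parse import unquote_plus

# Codes for / & = ? # must stay encoded. Escaping their "%" as "%25"
# makes the decoder turn them back into the original code.
_RESERVED = re.compile(r"%(2F|26|3D|3F|23)", re.IGNORECASE)


def normalize(url: str) -> str:
    """Approximate the URL form that the expression is checked against."""
    protected = _RESERVED.sub(r"%25\1", url)
    decoded = unquote_plus(protected)  # decodes % sequences, turns + into a space
    return decoded.rstrip("?#&.")      # drops trailing ? # & . characters


def www_variants(url: str) -> list:
    """Return the URL without and with the www prefix (assumed behavior)."""
    m = re.match(r"(https?://)(?:www\.)?(.*)", url, re.DOTALL)
    if m is None:
        return [url]
    scheme, rest = m.groups()
    return [scheme + rest, scheme + "www." + rest]


def url_matches(pattern: str, url: str) -> bool:
    """Case-sensitive search of the pattern in both www variants of the URL."""
    compiled = re.compile(pattern)
    return any(compiled.search(v) for v in www_variants(normalize(url)))
```

For example, url_matches(r"^http://www\.example\.com/", "http://example.com/page") returns True under these assumptions, because the variant with the www prefix is also checked.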
Regular expressions memo
In the table below, a, b, c, d, and e stand for any characters, and n and m are positive integers.
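Possible options
| abc|de | Matches one of the options: abc or de. |

Classes of characters
| [abc] or [a-c] | Matches any (one) character of the list (or from the range). |
| [^abc] or [^a-c] | Matches any (one) character except those listed (or those from the range). |
| \d | Matches a digit character. Equivalent to [0-9]. |
| \D | Matches a non-digit character. Equivalent to [^0-9]. |
| \s | Matches a whitespace character. Equivalent to [\t\n\f\r ]. |
| \S | Matches a non-whitespace character. Equivalent to [^\t\n\f\r ]. |
| \pL | Matches any Unicode letter. |
| \w | Matches any Latin letter of any case, a digit, or the underscore character. When working with Unicode characters, use the \pL class. |
| \W | Matches any character other than a Latin letter of any case, a digit, or an underscore. When working with Unicode characters, use the \PL class (the negation of \pL). |

Number of occurrences (quantifiers)
| a* | Matches the a character repeated 0 or more times (more repetitions are preferred). |
| a+ | Matches the a character repeated 1 or more times (more repetitions are preferred). |
| a? | Matches the a character repeated 0 or 1 time (the presence of the character is preferred). |
| a{n,m} | Matches the a character repeated from n to m times (more repetitions are preferred). |
| a{n,} | Matches the a character repeated n or more times (more repetitions are preferred). |
| a{n} | Matches the a character repeated exactly n times. |
| a*? | Matches the a character repeated 0 or more times (fewer repetitions are preferred). |
| a+? | Matches the a character repeated 1 or more times (fewer repetitions are preferred). |
| a?? | Matches the a character repeated 0 or 1 time (the absence of the character is preferred). |
| a{n,m}? | Matches the a character repeated from n to m times (fewer repetitions are preferred). |
| a{n,}? | Matches the a character repeated n or more times (fewer repetitions are preferred). |

Position in the line
| ^ | Matches the beginning of a string. |
| $ | Matches the end of a string. |
| \b | Matches a word boundary: the position between an alphanumeric character (\w) and a non-alphanumeric character (\W), or between an alphanumeric character and the beginning or end of the string. |
| \B | Matches a non-word boundary, that is, any position where \b doesn't match. |

Escaping
| \ | A backslash before a [ ] \ ^ $ . | ? * + ( ) { } special character means that this character is not special and should be interpreted literally. Example: \. matches a period. |
| \Q...\E | All special characters between \Q and \E are interpreted literally. |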
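A few entries from the memo can be tried out as follows. This is an illustrative sketch that uses Python's re module as a stand-in for RE2: the URLs and variable names are made up for the example, and some RE2 constructs are not available in Python's re (it supports neither \pL nor \Q...\E, so re.escape() is used here as the closest substitute for the latter).

```python
import re

url = "http://example.com/catalog/item/42"

# Greedy quantifier: .* takes as many characters as possible,
# so the match extends to the last "/" in the URL.
greedy = re.search(r"^http://.*/", url).group(0)

# Lazy quantifier: .*? takes as few characters as possible,
# so the match stops at the first "/" after the protocol.
lazy = re.search(r"^http://.*?/", url).group(0)

# Alternation combined with the end-of-string anchor.
is_page = re.search(r"\.(html|php)$", "http://example.com/index.php") is not None

# Escaping: an unescaped period matches any character, an escaped one only ".".
loose = re.search(r"example.com", "http://exampleXcom/") is not None
strict = re.search(r"example\.com", "http://exampleXcom/") is not None

# Word boundary: "cat" as a whole path segment, but not as part of "catalog".
whole_word = re.search(r"\bcat\b", "http://example.com/cat/") is not None
inside_word = re.search(r"\bcat\b", url) is not None

# Counted repetition: a four-digit year followed by a two-digit month.
dated = re.search(r"/\d{4}/\d{2}/", "http://example.com/2024/05/post") is not None

# Literal text: re.escape() plays the role of \Q...\E in this sketch.
literal = re.search(re.escape("?id=1"), "http://example.com/?id=1") is not None
```

Here greedy holds http://example.com/catalog/item/ while lazy holds only http://example.com/, which shows the difference between the a* and a*? forms in the quantifier rows.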