Disallow and Allow directives
- Disallow
- Allow
- Combining directives
- Allow and Disallow directives without parameters
- Using the special characters * and $
- Examples of how directives are interpreted
Disallow
Use this directive to prohibit crawling site sections or individual pages, for example:
- Pages that contain confidential data.
- Pages with site search results.
- Site traffic statistics.
- Duplicate pages.
- Various logs.
- Database service pages.
Examples:
User-agent: Yandex
Disallow: / # prohibits crawling for the entire site
User-agent: Yandex
Disallow: /catalogue # prohibits crawling the pages that start with /catalogue
User-agent: Yandex
Disallow: /page? # prohibits crawling the pages with a URL that contains parameters
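To sanity-check simple prefix rules like these, you can use Python's standard urllib.robotparser module. A quick sketch (the example.com URLs are placeholders); note that this parser follows the original robots.txt specification and doesn't support the * and $ characters described later, so it only suits plain prefix rules:
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Yandex
Disallow: /catalogue
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Pages under the disallowed prefix are blocked; everything else is allowed.
print(parser.can_fetch("Yandex", "https://example.com/catalogue/page1"))  # False
print(parser.can_fetch("Yandex", "https://example.com/about"))            # True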
Allow
This directive allows crawling site sections or individual pages.
Examples:
User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'
User-agent: Yandex
Allow: /file.xml
# allows downloading the file.xml file
Note: Empty line breaks aren't allowed between the User-agent, Disallow and Allow directives.
Combining directives
The Allow and Disallow directives from the corresponding User-agent block are sorted according to URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list. This way, the order of directives in the robots.txt file doesn't affect the way they are used by the robot.
Note: If there is a conflict between two directives with prefixes of the same length, the Allow directive takes precedence.
Example:
# Source robots.txt:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog
# Sorted robots.txt:
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
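This merging rule is compact enough to express in code. Below is a minimal Python sketch of the behavior described above (the is_allowed helper is made up for illustration and ignores the * and $ characters): sort by prefix length, let the last matching directive win, and break length ties in favor of Allow:
def is_allowed(path, rules):
    # rules: (directive, prefix) pairs, e.g. ("Disallow", "/catalog").
    # Sort from shortest to longest prefix; for equal lengths, Allow
    # sorts after Disallow so that it wins the tie.
    ordered = sorted(rules, key=lambda r: (len(r[1]), r[0] == "Allow"))
    allowed = True  # a page matched by no directive is allowed
    for directive, prefix in ordered:
        if path.startswith(prefix):
            allowed = directive == "Allow"  # last match in sorted order wins
    return allowed

rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(is_allowed("/catalog/page1", rules))       # False
print(is_allowed("/catalog/auto/page1", rules))  # True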
Common example:
User-agent: Yandex
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited
User-agent: Yandex
Allow: /obsolete/private/*.html$ # allows HTML files
# in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the website
Disallow: /*/private/ # prohibits all subpaths containing
# '/private/', but the Allow above negates
# a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files that contain
# '/old/' in the path
User-agent: Yandex
Disallow: /add.php?*user=
# prohibits all 'add.php?' scripts with the 'user' option
Allow and Disallow directives without parameters
If directives don't contain parameters, the robot handles the data as follows:
User-agent: Yandex
Disallow: # same as Allow: /
User-agent: Yandex
Allow: # isn't taken into account by the robot
Using the special characters * and $
You can use the special characters * and $ when specifying the paths of the Allow and Disallow directives to set certain regular expressions.
The * character indicates any sequence of characters (or none). Examples:
User-agent: Yandex
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
# and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
# and '/cgi-bin/private'
By default, the * character is appended to the end of every rule described in the robots.txt file. Example:
User-agent: Yandex
Disallow: /cgi-bin* # blocks access to pages
# starting with '/cgi-bin'
Disallow: /cgi-bin # the same
To cancel * at the end of the rule, use the $ character, for example:
User-agent: Yandex
Disallow: /example$ # prohibits '/example',
# but allows '/example.html'
User-agent: Yandex
Disallow: /example # prohibits both '/example'
# and '/example.html'
The $ character doesn't cancel an explicit * at the end of the rule, that is:
User-agent: Yandex
Disallow: /example$ # only prohibits '/example'
Disallow: /example*$ # same as 'Disallow: /example'
# prohibits both '/example.html' and '/example'
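One way to see how these rules behave is to translate them into regular expressions. The following Python sketch (the rule_to_regex helper is made up for illustration) mirrors the behavior described above: each * becomes '.*', an explicit trailing $ anchors the match, and an implicit * is appended otherwise:
import re

def rule_to_regex(rule):
    # Translate each character: '*' becomes '.*', everything else is literal.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # an explicit trailing '$' anchors the rule
    else:
        pattern += ".*"               # otherwise an implicit '*' is appended
    return re.compile(pattern)

print(bool(rule_to_regex("/example$").match("/example")))       # True
print(bool(rule_to_regex("/example$").match("/example.html")))  # False
print(bool(rule_to_regex("/cgi-bin/*.aspx").match("/cgi-bin/private/test.aspx")))  # True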
Examples of how directives are interpreted
User-agent: Yandex
Allow: /
Disallow: /
# everything is allowed
User-agent: Yandex
Allow: /$
Disallow: /
# everything is prohibited except the main page
User-agent: Yandex
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.
User-agent: Yandex
Disallow: /private$
# prohibits only '/private'
User-agent: *
Disallow: /
User-agent: Yandex
Allow: /
# since the Yandex robot selects the entries
# that include 'User-agent: Yandex',
# everything is allowed
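Python's urllib.robotparser implements the same user-agent selection: a block that names the robot takes precedence over the 'User-agent: *' block. A quick check of the last example (the example.com URL is a placeholder):
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /

User-agent: Yandex
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Yandex uses its own block; other robots fall back to 'User-agent: *'.
print(parser.can_fetch("Yandex", "https://example.com/page"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/page"))  # False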