Using robots.txt
Robots.txt is a text file that contains site indexing parameters for the search engine robots.
Yandex supports the Robots Exclusion Protocol with advanced features.
When crawling a site, the Yandex robot loads the robots.txt file. If the latest request to the file shows that a site page or section is prohibited, the robot won't index them.
Requirements to the robots.txt file
The file size doesn't exceed 500 KB.
- It is a TXT file named "robots", robots.txt.
- The file is located in the root directory of the site.
The file is available for robots: the server that hosts the site responds with an HTTP code with the status 200 OK. Check the server response
If the file doesn't meet the requirements, the site is considered open for indexing.
Yandex supports redirection from the robots.txt file located on one site to the file located on another site. In this case, the directives in the target file are taken into account. This redirect can be useful when moving the site.
Recommendations on the content of the file
Yandex supports the following directives:
Directive | What it does |
---|---|
User-agent * | Indicates the robot to which the rules listed in robots.txt apply. |
Disallow | Prohibits indexing site sections or individual pages. |
Sitemap | Specifies the path to the Sitemap file that is posted on the site. |
Clean-param | Indicates to the robot that the page URL contains parameters (like UTM tags) that should be ignored when indexing it. |
Allow | Allows indexing site sections or individual pages. |
Crawl-delay | Specifies the minimum interval (in seconds) for the search robot to wait after loading one page, before starting to load another. We recommend using the crawl speed setting in Yandex.Webmaster instead of the directive. |
Directive | What it does |
---|---|
User-agent * | Indicates the robot to which the rules listed in robots.txt apply. |
Disallow | Prohibits indexing site sections or individual pages. |
Sitemap | Specifies the path to the Sitemap file that is posted on the site. |
Clean-param | Indicates to the robot that the page URL contains parameters (like UTM tags) that should be ignored when indexing it. |
Allow | Allows indexing site sections or individual pages. |
Crawl-delay | Specifies the minimum interval (in seconds) for the search robot to wait after loading one page, before starting to load another. We recommend using the crawl speed setting in Yandex.Webmaster instead of the directive. |
* Mandatory directive.
You'll most often need the Disallow, Sitemap, and Clean-param directives. Examples:
User-agent: * #specifies the robots that the directives are set for
Disallow: /bin/ # prohibits links from the Shopping Cart.
Disallow: /search/ # prohibits page links of the search embedded on the website
Disallow: /admin/ # prohibits links from the admin panel
Sitemap: http://example.com/sitemap # specifies the path to the website's Sitemap file for the robot
Clean-param: ref /some_dir/get_book.pl
Robots from other search engines and services may interpret the directives in a different way.
Using Cyrillic characters
The use of the Cyrillic alphabet is not allowed in the robots.txt file and server HTTP headers.
For domain names, use Punycode. For page addresses, use the same encoding as that of the current site structure.
Example of the robots.txt file:
#Incorrect:
User-agent: Yandex
Disallow: /корзина
Sitemap: сайт.рф/sitemap.xml
#Correct:
User-agent: Yandex
Disallow: /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
Sitemap: http://xn--80aswg.xn--p1ai/sitemap.xml
How do I create robots.txt?
- In the text editor, create a file named robots.txt and add the directives you need in it.
- Check the file in Yandex.Webmaster.
- Place the file to your site's root directory.
Sample file. This file allows indexing of the entire site for all search engines.
FAQ
As a rule, after you set a ban on indexing using any of the available methods, pages are excluded from the search results within two weeks. You can speed up this process.
For the robots.txt file to be taken into account by the robot, it must be located in the root directory of the site and respond with HTTP 200 code. The indexing robot doesn't support the use of files hosted on other sites.
To check availability of the robots.txt file for the robot, check the server response.
If your robots.txt redirects to another robots.txt file (for example, when moving a site), Yandex takes into account the target robots.txt. Make sure that the correct directives are specified in this file. To check the file, add the target site in Yandex.Webmaster and verify your site management rights.